sipcapture / heplify

Portable and Lightweight HEP Capture Agent for HOMER
https://sipcapture.org
GNU Affero General Public License v3.0
185 stars 65 forks source link

tcp reassembly / sip-stream following buggy, misses messages #168

Closed xhantu closed 3 years ago

xhantu commented 4 years ago

Hi,

I noticed that captures of sip over tcp will sometimes miss messages, even with tcp assembly switched on. When looking through the sources I noticed that the sip extraction from the tcp stream seems to be extremely simplified and relying on some assumptions that are not always true. I see the following problems:

If a sip/http/websocket message is detected, all data is consumed, not just the first message. This assumes that the collected data contains exactly one sip message, nothing more.

A sip message is only detected if the collected data contains exactly one sip message. This is only the case if a large enough pause between sending messages is used, so that reading from the stream will return after a message.

Both together can be problematic if the sender combines parts of multiple (provisional) messages into one packet or the goroutine scheduling does not schedule the stream reader after the message was finished and before the new one starts. If the stream reading only once misses the border between messages, it will never detect a message again, but will collect everything in its collection buffer until the stream is closed.

Reading sip, http and websocket is mixed together, but changing between protocols in a stream is limited.

I propose the following changes:

I had started a try to do this with a state machine, but have currently no time to pursue this further.

lmangani commented 4 years ago

Are you using af_packet or pcap sockets?

xhantu commented 4 years ago

Currently af_packet is used.

negbie commented 4 years ago

@xhantu you have good eyes and a sharp mind ;) All of your points are correct! There are a LOT of shortcomings in the current implementation. I very quickly made it without heavy testing and there is even a race condition inside it currently. Besides that there are still some known issues inside the gopacket lib for tcpassembly aswell. It would take too long to explain them in depth but you can check it under the gopacket repo. Your suggested changes would help and would be a good starting point but it would still need more work to detect some corner cases where some client's don't behave correctly.

If I will find some time I will check this but I don't think that it will be this year ;)

xhantu commented 4 years ago

@negbie thanks for the praise :) If you know some corner cases that should be considered feel free to describe them in this issue. Maybe I will find some time too fix at least some of the problems. Working tcp capturing is one of the things I would like to have around end of the year. I do not need SIP over websockets, but as somebody probably uses that feature I would not want to break it more than it is already.

negbie commented 4 years ago

One corner case is that you need to make sure that RST packets close the stream. You can fix this by calling FlushAll more often but this is error prone and could create race conditions depending on the implementation. Check gopacket issue 425. Another corner case is around selective ACKs. Check gopacket issue 81. Some corner cases are around malfromed SIP messages itself. It's questionable to handle them but as you might know in SIP everything is possible :D

negbie commented 4 years ago

@xhantu I added PACKET_FANOUT_HASH with the DEFRAG flag to af_packet. Instead of David here: https://github.com/david415/HoneyBadger/issues/64 I didn't do it because of performance. Performance with -fg and -fw set should be even a bit worse but it should make your life easier to deal with TCP reassembly.

With this each TCP flow is enqueued onto only one socket in the group. You can specify the count with the -fw flag and the group ID with the -fg flag.

The DEFRAG options is needed because when a TCP/UDP packet gets fragmented, only the first packet has the 'original' header in it. This breaks a 5-tuple hashing function, assuming the IP reassembly happens after the hashing. The reassembly is happening within the context of the decode thread, so the subsequent reassembled packets are going to go to a different thread than the first one and incrementing this counter.

There are some af_packet gotchas you should know about: http://forums.trisul.org/d/27-af-packet-gotchas

lmangani commented 3 years ago

Closing. Please reopen if anyone still are experiencing this issue with the latest available versions.