sipcapture / heplify

Portable and Lightweight HEP Capture Agent for HOMER
https://sipcapture.org
GNU Affero General Public License v3.0

how to drop invalid traffic #155

Closed kpribic closed 4 years ago

kpribic commented 4 years ago

Can heplify simply drop content it can't parse instead of forwarding it as HEP payload?

TL;DR explanation of why I'm asking:

I have a Kamailio cluster using the jsonrpc server module. Unfortunately that module, if you want to use HTTP, runs on the same port as Kamailio itself: 5060 TCP. This is really annoying.

With the default configuration heplify (1.55) will forward the fragmented jsonrpc messages along with SIP; heplify-server will complain that it can't parse the jsonrpc packets, but it will generally work. However, after a while heplify-server will conclude that a packet is a TLS (?) packet and reset the connection. This appears to happen when a packet is large (~8 kbyte), but not always. The packet is basically a HEP3 TCP packet with a ton of JSON and appears valid otherwise.

Heplify-server sends an RST packet and the connection gets stuck in CLOSE_WAIT state according to netstat on the heplify side. Since the connection is stuck, tracing obviously won't work for a while. Eventually it retries and reconnects.

heplify log:

2020/02/19 10:02:26.866824 hep.go:95: ERR write tcp 6.6.6.6:47290->5.5.5.5:9060: write: connection timed out
2020/02/19 10:02:27.292024 hep.go:95: ERR write tcp 6.6.6.6:48318->5.5.5.5:9060: write: connection reset by peer
2020/02/19 10:02:27.714622 hep.go:95: ERR write tcp 6.6.6.6:48320->5.5.5.5:9060: write: broken pipe

Heplify-server log:

2020-02-19T10:11:54Z WARN unexpected packet, did you send TLS into plain TCP input?
2020-02-19T10:11:54Z INFO closing TCP connection from 6.6.6.6:37022
2020-02-19T10:11:54Z INFO new TCP connection 6.6.6.6:37030 -> 5.5.5.5:9060

I tried to avoid this issue:

Attempt 1: -fi SIP/2.0. This works well for eliminating jsonrpc, but for some reason it drops part of the SIP traffic; for example, I'm missing some INVITE messages. I can't figure out why.

Attempt 2: tcpassembly and -di jsonrpc. This works well. TCP assembly is required because otherwise I only drop the first part of any jsonrpc message larger than 8 kbyte, and after a while the trailing part causes the same issue. However, this introduces a slight delay in TCP processing, and when I have ingress TCP but egress UDP it results in the wrong ordering of messages: TCP messages appear to come after UDP ones, while it's actually the other way around. I suppose upgrading everything to TCP would help, but that's not an option for me right now.

negbie commented 4 years ago

Hi Kosta, in heplify-server I use a fixed const maxPktLen = 8192 for packets. Maybe I should make this configurable.

Maybe you can use a different filter string like:

-fi CSeq

Implementing a HEP type filter shouldn't be a problem either. For example, just pass HEP types 1 and 5 to let SIP and RTCP go through.

kpribic commented 4 years ago

Hey Eugen, thanks for the fast response! :-)

Increasing maxPktLen might help, but I don't know how big those responses can get, so that probably isn't practical for this particular issue.

I tried CSeq, but of course it is mentioned somewhere in the responses, so it doesn't help.

A HEP type filter would probably help. The Wireshark dissector says "Protocol Type: Unknown" for the packet that broke the connection, while SIP packets have "Protocol Type: SIP":

[screenshot: Wireshark dissection of the offending HEP3 packet]

negbie commented 4 years ago

Or maybe I can just ignore it on heplify-server side without sending an RST packet.

kpribic commented 4 years ago

That also makes sense; heplify-server shouldn't reject a connection because of one bad payload. If it were UDP, it couldn't drop the connection anyway. For a completely malformed packet (invalid HEP header) it would make sense to drop the connection.

negbie commented 4 years ago

Released a new heplify-server version which won't drop the TCP connection for these packets.

https://github.com/sipcapture/heplify-server/releases

kpribic commented 4 years ago

Something weird happened with heplify-server when I upgraded to the newest version. This happened to me earlier this week, and a restart of heplify-server helped. Now restarting isn't helping.

Heplify-server is rolling through logs at a rate of about one logfile per second, that's ~100,000 error lines per second:

# ls -l --time-style=full-iso | grep heplify-server.log
-rw-------  1 root root  6376950 2020-02-20 08:00:35.928041957 +0000 heplify-server.log
-rw-------  1 root root 10485975 2020-02-20 08:00:35.152026812 +0000 heplify-server.log.1
-rw-------  1 root root 10485675 2020-02-20 08:00:33.818000777 +0000 heplify-server.log.2
-rw-------  1 root root 10485825 2020-02-20 08:00:32.440973902 +0000 heplify-server.log.3
-rw-------  1 root root 10485825 2020-02-20 08:00:31.088947516 +0000 heplify-server.log.4
-rw-------  1 root root 10485900 2020-02-20 08:00:29.747921353 +0000 heplify-server.log.5
-rw-------  1 root root 10485750 2020-02-20 08:00:28.352894145 +0000 heplify-server.log.6
# cat heplify-server.log* | wc -l
936962

And everything is full of:

# head heplify-server.log          
2020-02-20T07:56:51Z WARN wrong or too big HEP packet size with 8243 bytes
2020-02-20T07:56:51Z WARN wrong or too big HEP packet size with 8241 bytes
2020-02-20T07:56:51Z WARN wrong or too big HEP packet size with 8243 bytes
2020-02-20T07:56:51Z WARN wrong or too big HEP packet size with 8243 bytes
2020-02-20T07:56:51Z WARN wrong or too big HEP packet size with 8243 bytes
2020-02-20T07:56:51Z WARN wrong or too big HEP packet size with 8241 bytes
2020-02-20T07:56:51Z WARN wrong or too big HEP packet size with 8241 bytes
2020-02-20T07:56:51Z WARN wrong or too big HEP packet size with 8241 bytes
2020-02-20T07:56:51Z WARN wrong or too big HEP packet size with 8241 bytes
2020-02-20T07:56:51Z WARN wrong or too big HEP packet size with 8241 bytes

However, there simply aren't enough long packets to cause this behavior. Here is a 15-second trace of all TCP traffic on that machine, filtered for tcp.len > 8200:

[screenshot: 15-second packet capture filtered for tcp.len > 8200]

(I checked later just in case: there is no UDP traffic on port 9060 at all.)

Total traffic flow on that machine is about 1456 packets; that simply isn't enough to generate that amount of errors at that rate, even if EVERY packet were wrong.

# time tcpdump -i any -w /tmp/heplify-trace2.pcap
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
^C8505 packets captured
13112 packets received by filter
394 packets dropped by kernel

real    0m9.005s
user    0m0.025s
sys     0m0.577s

If I filter for tcp.len > 4000 there are only 20 packets more.

This looks like some kind of bug but I can't figure out what is triggering it.

negbie commented 4 years ago

Okay let me check. In the meantime I could create a new branch and rename it to diskeraser :D

negbie commented 4 years ago

Ok, please check out the latest heplify-server release. I did not reset the underlying buffer, which triggered a tight loop.

kpribic commented 4 years ago

Similar behavior: it again rolls through the logs, but with a different error:

# ./heplify-server --version
VERSION: heplify-server 1.17
# head heplify-server.log.4
2020-02-20T09:43:40Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:43:40Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:43:40Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:43:40Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:43:40Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:43:40Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:43:40Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:43:40Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:43:40Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:43:40Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated

kpribic commented 4 years ago

This is what it looked like before it went completely haywire:

[...]
2020-02-20T09:42:54Z WARN wrong or too big HEP packet size with 8241 bytes
2020-02-20T09:42:54Z WARN wrong or too big HEP packet size with 8804 bytes
2020-02-20T09:42:54Z WARN wrong or too big HEP packet size with 14641 bytes
[...]
2020-02-20T09:42:56Z WARN parseStartLine err: received err while parsing start line: parseStartLineRequest err: could not get "/" pos in parts[2]
    [ edit: bunch of json here]
nodeID: 2002, protoType: 1, version: 2, protocol: 6, length: 7240, flow: 5.5.5.5:5060->6.6.6.6:53006

2020-02-20T09:42:56Z WARN ParseMsg: err parsing msg no SIP eof found
   [ edit: bunch of json here]
nodeID: 2002, protoType: 1, version: 2, protocol: 6, length: 3763, flow: 5.5.5.5:5060->6.6.6.6:53006
[...]
2020-02-20T09:42:56Z WARN wrong or too big HEP packet size with 8243 bytes
2020-02-20T09:42:56Z WARN wrong or too big HEP packet size with 8819 bytes
2020-02-20T09:42:56Z WARN wrong or too big HEP packet size with 29507 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 8243 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 14391 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 25970 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 12599 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 25189 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 26469 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 12846 bytes
2020-02-20T09:42:57Z WARN malformed packet with length 2654 which is neither hep nor protobuf encapsulated
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 13875 bytes
2020-02-20T09:42:57Z WARN malformed packet with length 2067 which is neither hep nor protobuf encapsulated
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 26979 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 25142 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 12846 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 24192 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 29793 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 12336 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 12336 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 15931 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 16128 bytes
2020-02-20T09:42:57Z WARN malformed packet with length 2048 which is neither hep nor protobuf encapsulated
2020-02-20T09:42:57Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:42:57Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:42:57Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:42:57Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:42:57Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:42:57Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:42:57Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated

negbie commented 4 years ago

Hmm, I tested it quickly and it looked good with my test cases; maybe I'm missing one. Maybe I'll revert the change for now till I have more time for testing, and change heplify in a way that it won't send out unknown HEP types.

kpribic commented 4 years ago

That's probably the safe route :-) For backward compatibility that should probably be activated via a flag. Also, did you notice that heplify doesn't have a --version flag?

negbie commented 4 years ago

Ok, let's go this way, because spending too much time on this isn't possible for me atm. And ok, let me add this flag.

negbie commented 4 years ago

Ok, added a new heplify release with a version flag. For now I did not add a flag for the unknown HEP packets, because they won't be handled in heplify-server anyway. Maybe next time. Keep in mind that the heplify client won't parse the full SIP packet, so some packets may still be tagged as SIP because they have a CSeq string inside.

kpribic commented 4 years ago

Unfortunately this isn't fixing my specific problem, because that jsonrpc output for some reason sometimes contains CSeq. I'm starting to think this is the wrong approach for this particular issue; it's getting very complicated. I'll try to separate this somehow on the Kamailio side.

Anyways, thanks for the time and effort, Eugen!

negbie commented 4 years ago

Ok ;) TBH it isn't really complicated, but since TCP is involved it needs more careful thinking and tests, which require some time. Maybe we can come back to this during Kamailio World. I hope to see you there!

kpribic commented 4 years ago

TCP looks simple at first, but the edge cases somehow multiply when you start poking around :-) I plan to come to Kamailio World this year as well, so it would be really nice to meet up!

kpribic commented 4 years ago

Sorry, all the TCP traces I have, specifically the json parts, are full of private data, so I really can't send them nor anonymize them in any practical way. I'll see if I can reproduce the case with some dummy data, but this will take a while to set up.

negbie commented 4 years ago

I fully understand. Yeah, that would be helpful, and I think I will make it mandatory for every new decoding-related issue.