kpribic closed this issue 4 years ago
Hi Kosta, in heplify-server I use a fixed const maxPktLen = 8192 for packets. Maybe I should make this configurable.
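The check such a constant drives could be sketched roughly like this (hypothetical names, not heplify-server's actual code), with the limit lifted into a variable so it could be wired to a flag or config option:

```go
package main

import "fmt"

// maxPktLen mirrors the fixed limit mentioned above; as a variable
// it could be set from a flag or config file instead of a const.
var maxPktLen = 8192

// plausibleHEPLen is a hypothetical helper: a declared packet length
// must at least cover the 6-byte HEP3 preamble ("HEP3" magic plus a
// 2-byte total length) and must not exceed the configured limit.
func plausibleHEPLen(n int) bool {
	return n >= 6 && n <= maxPktLen
}

func main() {
	fmt.Println(plausibleHEPLen(8243)) // just over the 8192 limit -> false
	fmt.Println(plausibleHEPLen(512))  // within limit -> true
}
```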
Maybe you can use a different filter string like:
-fi CSeq
Implementing a HEP type filter shouldn't be a problem either. For example, just pass HEP types 1 and 5 to let SIP and RTCP through.
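A minimal sketch of such a type filter (illustrative names, not heplify-server's API) could keep a whitelist of HEP protocol types and drop everything else:

```go
package main

import "fmt"

// allowedHEPTypes whitelists the HEP payload types to let through,
// e.g. 1 (SIP) and 5 (RTCP) as suggested above.
var allowedHEPTypes = map[uint8]bool{1: true, 5: true}

// passHEPType reports whether a packet with the given HEP protocol
// type should be processed or silently dropped.
func passHEPType(protoType uint8) bool {
	return allowedHEPTypes[protoType]
}

func main() {
	fmt.Println(passHEPType(1))   // SIP -> true
	fmt.Println(passHEPType(100)) // unknown payload type -> false
}
```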
Hey Eugen, thanks for the fast response! :-)
Increasing maxPktLen might help, but I don't know how big those responses can get, so it may not be practical for this particular issue.
I tried CSeq and of course it is mentioned somewhere in the responses so it doesn't help.
HEP filter would probably help. Wireshark dissector says "Protocol Type: Unknown" on the packet that broke the connection, while SIP packets have "Protocol Type: SIP"
Or maybe I can just ignore it on heplify-server side without sending an RST packet.
That also makes sense: heplify-server shouldn't reject a connection because of one bad payload. If it were UDP, it couldn't drop the connection anyway. For a completely malformed packet (invalid HEP header) it would make sense to drop the connection.
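The policy being agreed on here boils down to a small decision table (a sketch of the behavior, not actual heplify-server code): a bad payload means log and skip, while a broken HEP header means close, since the TCP stream can no longer be framed reliably.

```go
package main

import "fmt"

// frameAction is a hypothetical helper deciding what to do with one
// incoming frame on a TCP connection, per the policy discussed above.
func frameAction(validHeader, validPayload bool) string {
	switch {
	case !validHeader:
		// framing is lost; the next packet boundary can't be found
		return "close connection"
	case !validPayload:
		// one bad payload: warn and keep the connection alive
		return "skip packet"
	default:
		return "process packet"
	}
}

func main() {
	fmt.Println(frameAction(true, false))  // bad payload only
	fmt.Println(frameAction(false, false)) // invalid HEP header
}
```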
Released a new heplify-server version which won't drop TCP conn for these packets.
Something weird happened with heplify-server when I upgraded to the newest version. This happened to me earlier this week, and a restart of heplify-server helped; now restarting isn't helping.
Heplify-server is rolling through logs at a rate of about one logfile per second, that's ~100,000 error lines per second:
# ls -l --time-style=full-iso | grep heplify-server.log
-rw------- 1 root root 6376950 2020-02-20 08:00:35.928041957 +0000 heplify-server.log
-rw------- 1 root root 10485975 2020-02-20 08:00:35.152026812 +0000 heplify-server.log.1
-rw------- 1 root root 10485675 2020-02-20 08:00:33.818000777 +0000 heplify-server.log.2
-rw------- 1 root root 10485825 2020-02-20 08:00:32.440973902 +0000 heplify-server.log.3
-rw------- 1 root root 10485825 2020-02-20 08:00:31.088947516 +0000 heplify-server.log.4
-rw------- 1 root root 10485900 2020-02-20 08:00:29.747921353 +0000 heplify-server.log.5
-rw------- 1 root root 10485750 2020-02-20 08:00:28.352894145 +0000 heplify-server.log.6
# cat heplify-server.log* | wc -l
936962
And everything is full of:
# head heplify-server.log
2020-02-20T07:56:51Z WARN wrong or too big HEP packet size with 8243 bytes
2020-02-20T07:56:51Z WARN wrong or too big HEP packet size with 8241 bytes
2020-02-20T07:56:51Z WARN wrong or too big HEP packet size with 8243 bytes
2020-02-20T07:56:51Z WARN wrong or too big HEP packet size with 8243 bytes
2020-02-20T07:56:51Z WARN wrong or too big HEP packet size with 8243 bytes
2020-02-20T07:56:51Z WARN wrong or too big HEP packet size with 8241 bytes
2020-02-20T07:56:51Z WARN wrong or too big HEP packet size with 8241 bytes
2020-02-20T07:56:51Z WARN wrong or too big HEP packet size with 8241 bytes
2020-02-20T07:56:51Z WARN wrong or too big HEP packet size with 8241 bytes
2020-02-20T07:56:51Z WARN wrong or too big HEP packet size with 8241 bytes
However, there simply aren't enough long packets to cause this behavior. Here is a 15 sec trace of all TCP traffic on that machine, filtered for TCP size over 8200:
(I checked later just in case - there is no UDP traffic on port 9060 at all)
The total traffic flow on that machine is about 1456 packets; even if EVERY packet were wrong, that simply isn't enough to generate errors at that rate.
# time tcpdump -i any -w /tmp/heplify-trace2.pcap
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
^C8505 packets captured
13112 packets received by filter
394 packets dropped by kernel
real 0m9.005s
user 0m0.025s
sys 0m0.577s
If I filter for tcp.len > 4000, there are only 20 more packets.
This looks like some kind of bug but I can't figure out what is triggering it.
Okay let me check. In the meantime I could create a new branch and rename it to diskeraser :D
Ok, please check out the latest heplify-server release. I did not reset the underlying buffer, which triggered a tight loop.
Similar behavior: it again rolls through logs, but with a different error:
# ./heplify-server --version
VERSION: heplify-server 1.17
# head heplify-server.log.4
2020-02-20T09:43:40Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:43:40Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:43:40Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:43:40Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:43:40Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:43:40Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:43:40Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:43:40Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:43:40Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:43:40Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
This is what it looked like before it went completely haywire:
[...]
2020-02-20T09:42:54Z WARN wrong or too big HEP packet size with 8241 bytes
2020-02-20T09:42:54Z WARN wrong or too big HEP packet size with 8804 bytes
2020-02-20T09:42:54Z WARN wrong or too big HEP packet size with 14641 bytes
[...]
2020-02-20T09:42:56Z WARN parseStartLine err: received err while parsing start line: parseStartLineRequest err: could not get "/" pos in parts[2]
[ edit: bunch of json here]
nodeID: 2002, protoType: 1, version: 2, protocol: 6, length: 7240, flow: 5.5.5.5:5060->6.6.6.6:53006
2020-02-20T09:42:56Z WARN ParseMsg: err parsing msg no SIP eof found
[ edit: bunch of json here]
nodeID: 2002, protoType: 1, version: 2, protocol: 6, length: 3763, flow: 5.5.5.5:5060->6.6.6.6:53006
[...]
2020-02-20T09:42:56Z WARN wrong or too big HEP packet size with 8243 bytes
2020-02-20T09:42:56Z WARN wrong or too big HEP packet size with 8819 bytes
2020-02-20T09:42:56Z WARN wrong or too big HEP packet size with 29507 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 8243 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 14391 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 25970 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 12599 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 25189 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 26469 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 12846 bytes
2020-02-20T09:42:57Z WARN malformed packet with length 2654 which is neither hep nor protobuf encapsulated
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 13875 bytes
2020-02-20T09:42:57Z WARN malformed packet with length 2067 which is neither hep nor protobuf encapsulated
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 26979 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 25142 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 12846 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 24192 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 29793 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 12336 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 12336 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 15931 bytes
2020-02-20T09:42:57Z WARN wrong or too big HEP packet size with 16128 bytes
2020-02-20T09:42:57Z WARN malformed packet with length 2048 which is neither hep nor protobuf encapsulated
2020-02-20T09:42:57Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:42:57Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:42:57Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:42:57Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:42:57Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:42:57Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
2020-02-20T09:42:57Z WARN malformed packet with length 0 which is neither hep nor protobuf encapsulated
Hmm, I tested it quickly and it looked good with my test cases; maybe I'm missing one. Maybe I'll revert the change for now until I have more time for testing, and change heplify so that it won't send out unknown HEP types.
That's probably the safe route :-) For backward compatibility that should probably be activated via a flag. Also, did you notice that heplify doesn't have a --version flag?
Ok, let's go this way, because I can't spend too much time on this at the moment. Let me add that flag.
Ok, added a new heplify release with a version flag. For now I did not add a flag for the unknown HEP packets, because they won't be handled in heplify-server anyway; maybe next time. Keep in mind that the heplify client won't parse the full SIP packet, so some packets may still be tagged as SIP because they have a CSeq string inside.
Unfortunately this isn't fixing my specific problem, because that jsonrpc traffic for some reason sometimes contains CSeq. I'm starting to think this is the wrong approach for this particular issue; it's getting very complicated. I'll try to separate this somehow on the Kamailio side.
Anyways, thanks for the time&effort Eugen!
Ok ;) TBH it isn't really complicated, but since TCP is involved it needs more careful thinking and tests, which requires some time. Maybe we can come back to this during Kamailio World. I hope to see you there!
TCP looks simple at first, but the edge cases somehow multiply once you start poking around :-) I do plan to come to Kamailio World this year as well, so it would be really nice to meet up!
Sorry, all the TCP traces I have (specifically the JSON part) are full of private data, so I really can't send them or anonymize them in any practical way. I'll see if I can reproduce the case with some dummy data, but that will take a while to set up.
I fully understand. Yeah, that would be helpful, and I think I will make it mandatory for every new decoding-related issue.
Can heplify simply drop content it can't parse instead of forwarding it as HEP payload?
TL;DR explanation of why I'm asking:
I have a Kamailio cluster using the jsonrpc server module. Unfortunately that module, if you want to use HTTP, runs on the same port as Kamailio: TCP 5060. This is really annoying.
With the default configuration, heplify (1.55) will forward fragmented jsonrpc messages along with SIP; heplify-server will complain that it can't parse the jsonrpc packets, but it generally works. However, after a while heplify-server concludes that a packet is a TLS (?) packet and resets the connection. This seems to happen when the packet is large, ~8 kbyte, but not always. It's basically a HEP3 TCP packet with a ton of JSON, and it appears valid otherwise.
Heplify-server sends an RST packet and the connection gets stuck in CLOSE_WAIT state according to netstat on the heplify side. Since the connection is stuck, tracing obviously doesn't work for a while; eventually heplify retries and reconnects.
heplify log:
Heplify-server log:
I tried to avoid this issue:
Attempt 1:
-fi SIP/2.0
This works well for eliminating jsonrpc, but for some reason it drops part of the SIP traffic; for example, I'm missing some INVITE messages. I can't figure out why.
Attempt 2:
tcpassembly
and -di jsonrpc
This works well. TCP assembly is required because otherwise I drop only the first part of a jsonrpc message larger than 8 kbyte, and after a while the trailing part causes the same issue. However, this introduces a slight delay in TCP processing, and when I have ingress TCP but egress UDP it results in the wrong order of messages: TCP messages appear to come after UDP, while it's actually the other way around. I suppose upgrading everything to TCP would help, but that's not an option for me right now.
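Conceptually, the discard filter used in attempt 2 amounts to a substring match on the captured payload before HEP encapsulation; a toy sketch of that idea (not heplify's implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// shouldDiscard is an illustrative stand-in for a payload filter:
// drop any captured payload that contains the given marker string.
func shouldDiscard(payload, marker string) bool {
	return strings.Contains(payload, marker)
}

func main() {
	rpc := `{"jsonrpc":"2.0","method":"dlg.list","id":1}`
	sip := "INVITE sip:alice@example.com SIP/2.0"
	fmt.Println(shouldDiscard(rpc, "jsonrpc")) // true -> dropped
	fmt.Println(shouldDiscard(sip, "jsonrpc")) // false -> forwarded
}
```

The downside the comment above describes still applies: filtering whole jsonrpc messages reliably requires reassembling the TCP stream first, which is what introduces the reordering against UDP traffic.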