Closed picatz closed 6 years ago
@picatz what is this \xD1 at the end of the first string? Should it be there?
Another point: when parsing, PacketGen forces binary encoding. If you expect UTF-8, you may force the encoding of the TCP body:
tcp_pkt_body.force_encoding('UTF-8')
But I am doubtful this will solve your problem, as there is only one byte that is invalid in that encoding.
@sdaubert The \xD1 is actually from a real-world TCP packet body I sniffed using PacketGen.
I've been trying to collect examples of failed packets I've noticed while capturing, and the example came up during that collection.
As for #force_encoding, that method doesn't seem to solve the problem -- even though I wish it did.
tcp_pkt_body.force_encoding('UTF-8').valid_encoding?
# => false
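A minimal sketch of why this returns false, assuming a made-up body string (the bytes below are illustrative, not the captured packet): #force_encoding only changes the encoding label on the string and leaves the bytes alone, so a lone \xD1 is still invalid once the string is treated as UTF-8.

```ruby
# Hypothetical body: ASCII text followed by a stray 0xD1 byte, tagged as binary.
body = "HTTP/1.1 200 OK\xD1".b

# Relabel a copy as UTF-8; no bytes are converted or removed.
utf8 = body.dup.force_encoding('UTF-8')

same_bytes = (utf8.bytes == body.bytes)
# 0xD1 is the lead byte of a two-byte UTF-8 sequence, but no continuation byte follows.
valid = utf8.valid_encoding?
```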
🤷♂️ But, as I showed before, this works:
tcp_pkt_body.chars.select(&:valid_encoding?).join.valid_encoding?
# => true
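A small sketch of why the filter works, again on an invented string rather than the real capture: String#chars hands back each undecodable byte as its own one-byte "character", so selecting on #valid_encoding? drops only the stray byte and keeps legitimate multibyte characters.

```ruby
# Hypothetical body: a valid two-byte "é" (C3 A9) plus one stray 0xD1 byte.
body = "caf\xC3\xA9 \xD1 ok".b.force_encoding('UTF-8')

chars = body.chars
# Only the stray byte fails the per-character check.
bad = chars.reject(&:valid_encoding?)
# Same filter as above: keep the valid characters and rejoin them.
cleaned = chars.select(&:valid_encoding?).join
```

Note that the stray byte is removed silently, so the cleaned body is shorter than what was on the wire.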
For whatever reason, I ended up writing my own HTTP parser. So, of course, there would be bugs. 😂
I need to submit a fix to handle these kinds of cases. It could look something like this:
Unless @sdaubert has a better way of handling this invalid encoding problem.
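One possible alternative, offered only as a sketch: Ruby's built-in String#scrub (Ruby 2.1 and later) replaces invalid byte sequences, and passing an empty string as the replacement removes them. The helper name utf8_body is hypothetical and is not part of PacketGen or of the parser discussed here.

```ruby
# Hypothetical helper, not part of PacketGen: return a valid UTF-8 copy of a raw body.
def utf8_body(raw)
  text = raw.dup.force_encoding(Encoding::UTF_8)
  # Only scrub when needed, so bodies that are already valid are returned unchanged.
  text.valid_encoding? ? text : text.scrub('')
end

# Illustrative input: a binary body with one stray byte at the end.
clean = utf8_body("HTTP/1.1 200 OK\xD1".b)
```

Calling scrub with no argument would instead substitute U+FFFD for each invalid sequence, which keeps a visible marker where bytes were dropped.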