lemontree55 / packetgen

Ruby library to easily generate and capture network packets
MIT License

http parser can't handle invalid byte sequences #78

Closed by picatz 6 years ago

picatz commented 6 years ago

For whatever reason, I ended up writing my own HTTP parser. So, of course, there would be bugs. 😂

I need to submit a fix to handle these kinds of cases. It could look something like this:

tcp_pkt_body = "GET /wiggle HTTP/1.1\r\nHost: ocsp.comodoca.com\r\nConnection: close\r\nUser-Agent: wiggle\r\n\r\n\r\xD1"

# if I do this
/(CONNECT|DELETE|GET|HEAD|OPTIONS|PATCH|POST|PUT)/ =~ tcp_pkt_body
# raises error: ArgumentError: invalid byte sequence in UTF-8

# so I need to cleanse it
tcp_pkt_body = tcp_pkt_body.chars.select(&:valid_encoding?).join
# => "GET /wiggle HTTP/1.1\r\nHost: ocsp.comodoca.com\r\nConnection: close\r\nUser-Agent: wiggle\r\n\r\n\r"

# and now I can regex it to get a match
/(CONNECT|DELETE|GET|HEAD|OPTIONS|PATCH|POST|PUT)/ =~ tcp_pkt_body
# => 0

Unless @sdaubert has a better way of handling this invalid encoding problem.
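One alternative worth considering is Ruby's built-in `String#scrub` (available since Ruby 2.1), which drops or replaces invalid byte sequences in a single call. The sketch below is not taken from PacketGen: `HTTP_VERBS` and `http_verb_offset` are hypothetical names used only for illustration, and the sample body is the one from this comment.

```ruby
# Hypothetical helper, not part of PacketGen's API.
HTTP_VERBS = /(CONNECT|DELETE|GET|HEAD|OPTIONS|PATCH|POST|PUT)/

# Returns the offset of the first HTTP verb, or nil when none is found.
def http_verb_offset(body)
  # scrub('') returns a copy with every invalid byte sequence removed,
  # so the regex engine never sees malformed UTF-8.
  HTTP_VERBS =~ body.scrub('')
end

tcp_pkt_body = "GET /wiggle HTTP/1.1\r\nHost: ocsp.comodoca.com\r\nConnection: close\r\nUser-Agent: wiggle\r\n\r\n\r\xD1"

cleaned = tcp_pkt_body.scrub('')
puts cleaned.valid_encoding?                 # true
puts http_verb_offset(tcp_pkt_body).inspect  # 0
```

Calling `scrub` with no argument would substitute U+FFFD for each invalid sequence instead of removing it, which may be preferable when byte offsets into the cleaned string matter less than seeing where data was lost.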

sdaubert commented 6 years ago

@picatz what is this \xD1 at the end of the first string? Should it be there?

Another point: when parsing, packetgen forces binary encoding. If you expect UTF-8, you can force the encoding of the TCP body:

tcp_pkt_body.force_encoding('UTF-8')

But I am doubtful this will solve your problem, as there is only one byte out of encoding.
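To illustrate the point about binary encoding: an ASCII-only pattern can be matched against a binary (ASCII-8BIT) string without any UTF-8 validation, because every byte sequence is valid in that encoding. This is a minimal sketch, assuming only the HTTP verb needs to be found. The variable names and the shortened request are illustrative, and nothing here is PacketGen's actual parser code.

```ruby
raw = "GET /wiggle HTTP/1.1\r\nHost: ocsp.comodoca.com\r\n\r\n\r\xD1"

# String#b returns a copy tagged as binary (ASCII-8BIT); the bytes are unchanged.
binary_body = raw.b

puts binary_body.encoding == Encoding::BINARY  # true
puts binary_body.valid_encoding?               # true

# The pattern contains only ASCII characters, so it is compatible with a
# binary string, and the stray 0xD1 byte is treated as just another byte.
offset = /(CONNECT|DELETE|GET|HEAD|OPTIONS|PATCH|POST|PUT)/ =~ binary_body
puts offset.inspect                            # 0
```

The trade-off is that any header or body text containing multi-byte UTF-8 characters stays as raw bytes until it is explicitly re-tagged or transcoded.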

picatz commented 6 years ago

@sdaubert The \xD1 is actually from a real-world TCP packet body I sniffed using PacketGen.

I've been trying to collect examples of failed packets I've noticed while capturing, and the example came up during that collection.

As for #force_encoding, that method doesn't seem to solve the problem -- even though I wish it did.

tcp_pkt_body.force_encoding('UTF-8').valid_encoding?
# => false

🤷‍♂️ But, as I showed before, this works:

tcp_pkt_body.chars.select(&:valid_encoding?).join.valid_encoding?
# => true
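The difference between the two results above comes down to what each call does to the bytes: `force_encoding` only changes the encoding label on the string, while filtering `chars` actually drops the offending byte. A short sketch to make that visible, using a shortened, made-up request body rather than the captured packet, and assuming the payload starts out binary-encoded as described earlier in the thread:

```ruby
# Shortened, illustrative payload ending in the same stray 0xD1 byte.
body = "GET / HTTP/1.1\r\n\r\n\r\xD1".b

# force_encoding relabels the string; it does not touch the bytes.
relabeled = body.dup.force_encoding('UTF-8')
puts relabeled.bytes == body.bytes   # true
puts relabeled.valid_encoding?       # false: 0xD1 opens a 2-byte sequence that never completes

# chars yields each invalid byte as its own one-byte "character".
bad = relabeled.chars.reject(&:valid_encoding?)
puts bad.map(&:bytes).inspect        # [[209]]

# Filtering removes that byte, so the result is one byte shorter and valid.
filtered = relabeled.chars.select(&:valid_encoding?).join
puts filtered.bytesize == body.bytesize - 1  # true
puts filtered.valid_encoding?                # true
```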