rsmmr / hilti

**NOTE**: This is outdated and no longer maintained. There's a new version at https://github.com/zeek/spicy.

Bug in SPICY match() Method :: Does Not Match Extended ASCII Characters #41

Open mfrndz opened 7 years ago

mfrndz commented 7 years ago

I am developing a protocol analyzer using Spicy, using the BRO/HILTI/SPICY docker image posted at the following URL: https://hub.docker.com/r/rsmmr/hilti/

In my Spicy protocol analyzer, I am trying to match a pattern. My data type is ‘bytes’, and I am using the ‘match()’ method. If the pattern includes an extended ASCII character (range from 0x80 to 0xFF), then the pattern fails to find a match. However, if I replace the extended ASCII character with a wildcard, then it finds a match. While I cannot share source code from my original project, I created a sample project to demonstrate the bug.

I will upload the following sample files so that you may attempt to reproduce the bug: (1) regex_test.spicy (2) regex_test.evt (3) regex_test.bro* (4) smb-browser-elections.pcap

NOTE : There might be CR/LF issues because I saved these files with Notepad on a Windows box. Also, I had to give each file a .txt extension, in order for the file uploader to accept it.

As my sample data, I downloaded an SMB pcap file from wireshark.org. The regular expression patterns below are based on Frame #3, NetBIOS/SMB datagram, in the SMB pcap file 'smb-browser-elections.pcap' downloaded from the wireshark website, at the following URL: https://wiki.wireshark.org/SampleCaptures?action=AttachFile&do=get&target=smb-browser-elections.pcapng

Here are my sample regex patterns**:

//# Appears at offset 0x2A in Frame 3
//# or offset 0x00 within UDP payload
const SmbRegEx_1a = /^\x11\x02.\x16/;
const SmbRegEx_1b = /^\x11\x02\x82\x16/;

//# Appears at offset 0x2D in Frame 3
//# or offset 0x03 within UDP payload
const SmbRegEx_2a = /\x16..\x7B/;
const SmbRegEx_2b = /\x16\xC0\xA8\x7B/;

Patterns _1a and _2a match successfully, because they include the wildcard in place of the extended ASCII character(s).

Patterns _1b and _2b do not match, because they contain offending character(s) in the extended ASCII range.

NOTE **: my Spicy source code contains a third regex pattern, shown below:

//# Appears at offset 0x78 in Frame 3
//# or offset 0x4E within UDP payload
const SmbRegEx_3a = /\x43\x41\x42\x00.\x53\x4D\x42/;
const SmbRegEx_3b = /\x43\x41\x42\x00\xFF\x53\x4D\x42/;

Interestingly, this pattern fails to match for both _3a and _3b. I would expect _3a to match because it contains the wildcard. I am not sure what is going wrong with this pattern. Is there a certain depth/limit at which the match() method will stop searching?

Cheers! Mark

regex_test.bro.txt regex_test.evt.txt regex_test.spicy.txt smb-browser-elections.pcap.txt

kmcmahon1959 commented 6 years ago

I locally updated jrx.c and jrx.h (see attached) to change a few of the buffer variables to unsigned, and this resolves the issue. I'm not entirely sure why (I spent a bit of time trying to narrow this down), but the match-test.spicy file (also attached) works with this change. Note that the build now warns about converting between pointers of different signedness (I tried to resolve this as well, but it requires massive changes to buffer pointers throughout the code).

jrx.c.txt jrx.h.txt match-test.spicy.txt
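One plausible explanation for why switching the buffers to unsigned helps is sign extension: when a byte at or above 0x80 is read through a signed `char` type and widened to `int`, it becomes negative and no longer compares equal to a pattern value such as 0x82. The sketch below illustrates only that general C behavior. The function names `in_range_signed` and `in_range_unsigned` are hypothetical and are not taken from jrx.c or jrx.h, and the thread does not confirm that this is the exact code path involved.

```c
#include <stddef.h>

/* Hypothetical helper, NOT from jrx: checks whether the byte at buf[i]
 * lies in the inclusive range [lo, hi] when the buffer element type is
 * signed. A byte such as 0x82 is widened to a negative int (-126 on
 * two's-complement platforms), so it falls outside [0x82, 0x82]. */
int in_range_signed(const signed char *buf, size_t i, int lo, int hi)
{
    int cp = buf[i];
    return cp >= lo && cp <= hi;
}

/* Same check with an unsigned element type: 0x82 is widened to 130,
 * so the comparison against [0x82, 0x82] succeeds. */
int in_range_unsigned(const unsigned char *buf, size_t i, int lo, int hi)
{
    int cp = buf[i];
    return cp >= lo && cp <= hi;
}
```

With the first four payload bytes from Frame 3 (0x11, 0x02, 0x82, 0x16), the unsigned variant reports a match for the third byte against 0x82 while the signed variant does not. Bytes below 0x80 behave identically in both, which is consistent with the wildcard patterns _1a and _2a matching.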

kmcmahon1959 commented 6 years ago

jrx.c-diff.txt jrx.h-diff.txt

Diffs for jrx.c and jrx.h attached.