mfrndz opened 7 years ago
I updated jrx.c and jrx.h locally (see attached) to change a few of the buffer variables to unsigned, and that resolves this issue. I'm not entirely sure why (I spent some time trying to narrow it down), but the match-test.spicy file (also attached) works with this change. Note that the build now emits warnings about conversions between pointers that differ in signedness. I tried to resolve those as well, but doing so requires extensive changes to buffer pointers throughout the code. Attachments: jrx.c.txt, jrx.h.txt, match-test.spicy.txt
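One common reason a switch from `char` to `unsigned char` fixes matching of bytes 0x80 to 0xFF is sign extension. Where plain `char` is signed (as on typical x86 toolchains), such a byte reads back as a negative value, so it never falls inside a range like 0x80 to 0xFF and can index a lookup table out of bounds. The sketch below illustrates that effect under this assumption only; it is not the jrx code, and `byte_in_range`, `check_signed`, and `check_unsigned` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical range test, similar to a regex engine checking whether an
 * input byte falls inside a character-class range. The byte arrives as int. */
static bool byte_in_range(int b, int lo, int hi) {
    return b >= lo && b <= hi;
}

/* Reading through a signed char pointer: the byte 0xC0 becomes -64 on
 * two's-complement targets, so it is never inside [0xC0, 0xC0]. */
static bool check_signed(const signed char *p, int lo, int hi) {
    return byte_in_range(*p, lo, hi);
}

/* Reading through an unsigned char pointer keeps 0xC0 as 192. */
static bool check_unsigned(const unsigned char *p, int lo, int hi) {
    return byte_in_range(*p, lo, hi);
}
```

For example, with the bytes of pattern _2b (0x16 0xC0 0xA8 0x7B) in an `unsigned char` array, `check_unsigned` accepts 0xC0 at index 1 while `check_signed` on the same byte rejects it; bytes below 0x80 behave the same either way, which fits the observation that only extended ASCII characters fail.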
Diffs for jrx.c and jrx.h attached.
I am developing a protocol analyzer in Spicy, using the Bro/HILTI/Spicy Docker image available at the following URL: https://hub.docker.com/r/rsmmr/hilti/
In my Spicy protocol analyzer, I am trying to match a pattern. My data type is ‘bytes’, and I am using the ‘match()’ method. If the pattern includes an extended ASCII character (0x80 to 0xFF), the pattern fails to find a match. However, if I replace the extended ASCII character with a wildcard, it does find a match. While I cannot share source code from my original project, I created a sample project that demonstrates the bug.
I will upload the following sample files so that you may attempt to reproduce the bug: (1) regex_test.spicy (2) regex_test.evt (3) regex_test.bro* (4) smb-browser-elections.pcap
*NOTE: There might be CR/LF issues because I saved these files with Notepad on a Windows box. Also, I had to give each file a .txt extension so that the file uploader would accept it.
As my sample data, I downloaded an SMB pcap file from wireshark.org. The regular expression patterns below are based on Frame #3 (a NetBIOS/SMB datagram) of the SMB capture 'smb-browser-elections.pcap', downloaded from the Wireshark wiki at the following URL: https://wiki.wireshark.org/SampleCaptures?action=AttachFile&do=get&target=smb-browser-elections.pcapng
Here are my sample regex patterns**:
# Appears at offset 0x2A in Frame 3
# or offset 0x00 within UDP payload
const SmbRegEx_1a = /^\x11\x02.\x16/;
const SmbRegEx_1b = /^\x11\x02\x82\x16/;

# Appears at offset 0x2D in Frame 3
# or offset 0x03 within UDP payload
const SmbRegEx_2a = /\x16..\x7B/;
const SmbRegEx_2b = /\x16\xC0\xA8\x7B/;
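The frame offsets and UDP payload offsets in these comments differ by a constant 0x2A (42) bytes. That is consistent with 14 bytes of Ethernet II header, 20 bytes of IPv4 header without options, and 8 bytes of UDP header. The capture's encapsulation is not stated in the issue, so the sketch below assumes it; `frame_offset` is a hypothetical helper. The same relationship also holds for the 0x78/0x4E pair used by the third pattern.

```c
#include <stddef.h>

/* Assumed encapsulation of Frame 3: Ethernet II without a VLAN tag,
 * IPv4 without options, then UDP. */
enum {
    ETH_HDR_LEN  = 14,
    IPV4_HDR_LEN = 20, /* minimum IPv4 header, no options */
    UDP_HDR_LEN  = 8
};

/* Convert an offset within the UDP payload to an offset within the frame. */
static size_t frame_offset(size_t payload_offset) {
    return ETH_HDR_LEN + IPV4_HDR_LEN + UDP_HDR_LEN + payload_offset;
}
```

Under this assumption, `frame_offset(0x00)` is 0x2A, `frame_offset(0x03)` is 0x2D, and `frame_offset(0x4E)` is 0x78, so the payload offsets are the ones a Spicy unit parsing the UDP payload would see.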
Patterns _1a and _2a match successfully because they use a wildcard in place of the extended ASCII character(s).
Patterns _1b and _2b do not match because they contain the offending character(s) in the extended ASCII range.
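For comparison, a byte-oriented matcher that reads every input byte as `unsigned char` should accept both the wildcard and the literal variants. The sketch below is a hypothetical reference, not Spicy's or jrx's implementation: patterns are `int` arrays in which `ANY` (-1) plays the role of `.`, and the input is the seven UDP payload bytes implied by patterns _1b and _2b (0x11 0x02 0x82 0x16 0xC0 0xA8 0x7B). `match_at`, `find`, and `ANY` are illustrative names.

```c
#include <stdbool.h>
#include <stddef.h>

#define ANY (-1) /* hypothetical wildcard marker standing in for '.' */

/* Anchored check: does pat (pat_len elements) match data at position pos? */
static bool match_at(const unsigned char *data, size_t len, size_t pos,
                     const int *pat, size_t pat_len) {
    if (pos > len || pat_len > len - pos)
        return false;
    for (size_t i = 0; i < pat_len; i++) {
        /* data[] is unsigned char, so 0x82 compares equal to 0x82. */
        if (pat[i] != ANY && pat[i] != data[pos + i])
            return false;
    }
    return true;
}

/* Unanchored search: try every start offset, return the first match or -1. */
static long find(const unsigned char *data, size_t len,
                 const int *pat, size_t pat_len) {
    for (size_t pos = 0; pos + pat_len <= len; pos++)
        if (match_at(data, len, pos, pat, pat_len))
            return (long)pos;
    return -1;
}
```

With those seven bytes, `match_at(payload, 7, 0, p1b, 4)` holds for the literal bytes of _1b, and `find` returns 3 for both _2a and _2b, so with unsigned comparisons the literal forms behave exactly like the wildcard forms.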
**NOTE: My Spicy source code also contains a third regex pattern, shown below:
# Appears at offset 0x78 in Frame 3
# or offset 0x4E within UDP payload
const SmbRegEx_3a = /\x43\x41\x42\x00.\x53\x4D\x42/;
const SmbRegEx_3b = /\x43\x41\x42\x00\xFF\x53\x4D\x42/;
Interestingly, this pattern fails to match in both forms, _3a and _3b. I would expect _3a to match because it contains the wildcard. I'm not sure what is going wrong with this pattern. Is there a certain depth or length limit beyond which the match() method stops searching?
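One possibility worth ruling out (an assumption, not verified against the Spicy or jrx source) is that some step in the matching path treats the input as a NUL-terminated C string, which would stop the search at the first 0x00 byte. Payload offsets 0x00 to 0x06 contain no zero bytes, so _1a and _2a would be unaffected, while a match at offset 0x4E could be cut off by a zero byte earlier in the payload; _3a also contains a literal \x00 itself. The sketch contrasts a length-bounded search with `strstr`, using a synthetic buffer rather than the real frame data; `find_bytes`, `found_by_strstr`, and `synthetic` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Length-bounded search: scans all hay_len bytes, including 0x00 and
 * 0x80-0xFF. memcmp compares bytes as unsigned char. Returns the offset
 * of the first match or -1. */
static long find_bytes(const unsigned char *hay, size_t hay_len,
                       const unsigned char *needle, size_t needle_len) {
    if (needle_len > hay_len)
        return -1;
    for (size_t i = 0; i + needle_len <= hay_len; i++)
        if (memcmp(hay + i, needle, needle_len) == 0)
            return (long)i;
    return -1;
}

/* strstr treats its arguments as C strings and stops at the first 0x00. */
static int found_by_strstr(const unsigned char *hay, const char *needle) {
    return strstr((const char *)hay, needle) != NULL;
}

/* Synthetic data: a zero byte at offset 1, then the _3b bytes at offset 4. */
static const unsigned char synthetic[] = {
    'x', 0x00, 'y', 'z', 0x43, 0x41, 0x42, 0x00, 0xFF, 0x53, 0x4D, 0x42
};

/* The literal bytes of SmbRegEx_3b: "CAB", 0x00, 0xFF, "SMB". */
static const unsigned char smb_3b[] = {
    0x43, 0x41, 0x42, 0x00, 0xFF, 0x53, 0x4D, 0x42
};
```

With this buffer, `find_bytes` reports the _3b sequence at offset 4, while `found_by_strstr(synthetic, "SMB")` is false because the haystack ends at the zero byte at offset 1.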
Cheers! Mark
Attachments: regex_test.bro.txt, regex_test.evt.txt, regex_test.spicy.txt, smb-browser-elections.pcap.txt