Closed AdamTheAnalyst closed 7 years ago
That's really strange; I would have expected it to work. @chisholm would be better than me at understanding the verbose output and tracking down what's going on.
Try replacing "i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net" in your dict with u"i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net", i.e. make it a unicode string. The pattern matcher treats plain str values as binary, in python2. So the pattern matcher thinks you're attempting a regex match (or like) on binary data. Binary regex matching is in spec, but isn't implemented yet. I hope to have it in soon. LIKE on binary data isn't in spec at all.
The reason it works when invoked from the commandline is because the content/pattern files are opened in text mode with a declared encoding, so the data comes out as text.
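As a rough illustration of that difference (a sketch, not the matcher's actual code), reading JSON through a file opened in text mode with an explicit encoding yields text values on both python2 and python3. The helper name load_observation and the sample file are made up for this example.

```python
import io
import json
import os
import tempfile


def load_observation(path, encoding="utf-8"):
    """Hypothetical helper: read a JSON file in text mode with a declared encoding."""
    # io.open decodes to text on both Python 2 and Python 3.
    with io.open(path, "r", encoding=encoding) as f:
        return json.load(f)


# Write a small sample file so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "obs.json")
with io.open(sample_path, "w", encoding="utf-8") as f:
    f.write(u'{"type": "domain-name", "value": "example.com"}')

loaded = load_observation(sample_path)

# unicode on Python 2, str on Python 3
text_type = type(u"")
print(isinstance(loaded["value"], text_type))
```

Values loaded this way are already text, so a dict built by hand with plain str literals in python2 is the odd one out.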
There is another open issue which has the same confusion. This confusion starts to seem like a recurring problem. But I think the real root of it is python2's conflation of text and binary data. I think they really do need to be treated as different types. That language design flaw is addressed in python3; I wish everyone could switch to that!
In either case, in Python when you json.load JSON from a file, you get text (unicode in python2). When ANTLR parses a pattern, the content of tokens is also text (unicode in python2). So down in the libraries, they're doing it correctly. We just need to get people who are still on python2 to also do it correctly! str in python2 is binary and unicode is text!
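If an observation dict has already been built with plain str values, one possible workaround is to decode everything to text before handing it to the matcher. This is only a sketch: to_text is a hypothetical helper (not part of stix2matcher), and it assumes the binary values are UTF-8 encoded.

```python
def to_text(obj, encoding="utf-8"):
    """Hypothetical helper: recursively decode binary strings to text."""
    # On Python 2, bytes is an alias for str, so this catches plain str values.
    if isinstance(obj, bytes):
        return obj.decode(encoding)
    if isinstance(obj, dict):
        return {to_text(k, encoding): to_text(v, encoding) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Tuples come back as lists, which is fine for JSON-like data.
        return [to_text(item, encoding) for item in obj]
    # Numbers, booleans, None and existing text pass through unchanged.
    return obj


raw = {
    b"type": b"domain-name",
    b"value": b"i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net",
    b"number_observed": 1,
}
clean = to_text(raw)
```

Here clean contains only text keys and values, which is the same shape json.loads would have produced.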
That works! Thanks, I haven't yet made the shift to python 3, more out of stubbornness and legacy code than anything, so looks like I will take the plunge some time soon.
For anyone else hitting this issue, the following code now works:
from stix2matcher import matcher
import json
obs = json.loads("""{
"created": "2017-08-29T08:26:22.135105Z",
"first_observed": "2017-08-29T08:25:44Z",
"id": "observed-data--43d8af75-216d-414a-a156-50eeba54e4c5",
"last_observed": "2017-08-29T08:25:44Z",
"modified": "2017-08-29T08:26:22.135105Z",
"number_observed": 1,
"objects": {
"0": {
"type": "domain-name",
"value": "i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net"
}
},
"type": "observed-data",
"@version": "1",
"@timestamp": "2017-08-29T08:26:22.706Z"
}""")
# Works
print matcher.match("[domain-name:value LIKE '%cedexis%']",[obs,])
print matcher.match("[domain-name:value MATCHES '.*cedexis.*']",[obs,])
print matcher.match("[domain-name:value MATCHES '^.*cedexis.*%']",[obs,])
print matcher.match("[domain-name:value = 'i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net']",[obs,])
Thanks to both of you for your help! I'll be sure to buy you a beer if we ever run into each other at a conference; this is about the third issue you guys have helped me with in the STIX2 repos.
Adam
Happy to help, @AdamTheAnalyst (though @chisholm did all the real work).
I'm going to close this, since I think it's resolved, but if not, feel free to re-open it.
Hi There,
I can't seem to get wildcards working as per the 2.0 spec when I import the library into my code (both LIKE and MATCHES not working for me):
Example Code:
Verbose Output:
These patterns match when called from the command line, so I imagine it's something to do with how I'm importing the library.
Patterns.txt:
obs.json:
Result:
Verbose:
What is the best approach to achieve this?
Cheers
Adam