oasis-open / cti-pattern-matcher

OASIS TC Open Repository: Match STIX content against STIX patterns
https://github.com/oasis-open/cti-pattern-matcher
BSD 3-Clause "New" or "Revised" License

Issue With Wildcards / Importing Library #43

Closed AdamTheAnalyst closed 7 years ago

AdamTheAnalyst commented 7 years ago

Hi There,

I can't seem to get wildcards working as per the 2.0 spec when I import the library into my code (both LIKE and MATCHES not working for me):

Example Code:

from stix2matcher import matcher

obs = {  
   "created":"2017-08-29T08:26:22.135105Z",
   "first_observed":"2017-08-29T08:25:44Z",
   "id":"observed-data--43d8af75-216d-414a-a156-50eeba54e4c5",
   "last_observed":"2017-08-29T08:25:44Z",
   "modified":"2017-08-29T08:26:22.135105Z",
   "number_observed":1,
   "objects":{  
      "0":{  
         "type":"domain-name",
         "value":"i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net"
      }
   },
   "type":"observed-data",
   "@version":"1",
   "@timestamp":"2017-08-29T08:26:22.706Z"
}

# Don't work
print matcher.match("[domain-name:value LIKE '%%cedexis%%']",[obs,])
print matcher.match("[domain-name:value LIKE '%cedexis%']",[obs,])
print matcher.match("[domain-name:value MATCHES '.*cedexis.*']",[obs,])
print matcher.match("[domain-name:value MATCHES '^.*cedexis.*$']",[obs,])

# Works
print matcher.match("[domain-name:value = 'i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net']",[obs,])

Verbose Output:

exitObjectType (domain-name): push {0: {'0': [{'type': 'domain-name',
            'value': 'i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net'}]}}
exitFirstPathComponent (value): pop {0: {'0': [{'type': 'domain-name',
            'value': 'i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net'}]}}
exitFirstPathComponent (value): push {0: {'0': ['i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net']}}
exitPropTestLike (%%cedexis%%): pop {0: {'0': ['i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net']}}
exitPropTestLike (%%cedexis%%): push {}
exitObservationExpression (simple): pop {}
exitObservationExpression (simple): push []
[]
exitObjectType (domain-name): push {0: {'0': [{'type': 'domain-name',
            'value': 'i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net'}]}}
exitFirstPathComponent (value): pop {0: {'0': [{'type': 'domain-name',
            'value': 'i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net'}]}}
exitFirstPathComponent (value): push {0: {'0': ['i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net']}}
exitPropTestLike (%cedexis%): pop {0: {'0': ['i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net']}}
exitPropTestLike (%cedexis%): push {}
exitObservationExpression (simple): pop {}
exitObservationExpression (simple): push []
[]
exitObjectType (domain-name): push {0: {'0': [{'type': 'domain-name',
            'value': 'i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net'}]}}
exitFirstPathComponent (value): pop {0: {'0': [{'type': 'domain-name',
            'value': 'i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net'}]}}
exitFirstPathComponent (value): push {0: {'0': ['i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net']}}
exitPropTestRegex ('.*cedexis.*'): pop {0: {'0': ['i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net']}}
exitPropTestRegex ('.*cedexis.*'): push {}
exitObservationExpression (simple): pop {}
exitObservationExpression (simple): push []
[]
exitObjectType (domain-name): push {0: {'0': [{'type': 'domain-name',
            'value': 'i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net'}]}}
exitFirstPathComponent (value): pop {0: {'0': [{'type': 'domain-name',
            'value': 'i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net'}]}}
exitFirstPathComponent (value): push {0: {'0': ['i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net']}}
exitPropTestRegex ('^.*cedexis.*$'): pop {0: {'0': ['i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net']}}
exitPropTestRegex ('^.*cedexis.*$'): push {}
exitObservationExpression (simple): pop {}
exitObservationExpression (simple): push []

These patterns match when called from the command line, so I imagine it's something to do with how I'm importing the library.

Patterns.txt:

[domain-name:value LIKE '%cedexis%']
[domain-name:value MATCHES '.*cedexis.*']
[domain-name:value MATCHES '^.*cedexis.*$']

obs.json:

[{  
   "created":"2017-08-29T08:26:22.135105Z",
   "first_observed":"2017-08-29T08:25:44Z",
   "id":"observed-data--43d8af75-216d-414a-a156-50eeba54e4c5",
   "last_observed":"2017-08-29T08:25:44Z",
   "modified":"2017-08-29T08:26:22.135105Z",
   "number_observed":1,
   "objects":{  
      "0":{  
         "type":"domain-name",
         "value":"i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net"
      }
   },
   "type":"observed-data",
   "@version":"1",
   "@timestamp":"2017-08-29T08:26:22.706Z"
}]

Result:

$> stix2-matcher -p patterns.txt -f obs.json
MATCH:  [domain-name:value LIKE '%cedexis%']
MATCH:  [domain-name:value MATCHES '.*cedexis.*']
MATCH:  [domain-name:value MATCHES '^.*cedexis.*$']

Verbose:

exitObjectType (domain-name): push {0: {u'0': [{u'type': u'domain-name',
             u'value': u'i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net'}]}}
exitFirstPathComponent (value): pop {0: {u'0': [{u'type': u'domain-name',
             u'value': u'i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net'}]}}
exitFirstPathComponent (value): push {0: {u'0': [u'i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net']}}
exitPropTestLike (%cedexis%): pop {0: {u'0': [u'i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net']}}
exitPropTestLike (%cedexis%): push {0: set([u'0'])}
exitObservationExpression (simple): pop {0: set([u'0'])}
exitObservationExpression (simple): push [(0,)]

MATCH:  [domain-name:value LIKE '%cedexis%']
exitObjectType (domain-name): push {0: {u'0': [{u'type': u'domain-name',
             u'value': u'i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net'}]}}
exitFirstPathComponent (value): pop {0: {u'0': [{u'type': u'domain-name',
             u'value': u'i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net'}]}}
exitFirstPathComponent (value): push {0: {u'0': [u'i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net']}}
exitPropTestRegex ('.*cedexis.*'): pop {0: {u'0': [u'i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net']}}
exitPropTestRegex ('.*cedexis.*'): push {0: set([u'0'])}
exitObservationExpression (simple): pop {0: set([u'0'])}
exitObservationExpression (simple): push [(0,)]

MATCH:  [domain-name:value MATCHES '.*cedexis.*']
exitObjectType (domain-name): push {0: {u'0': [{u'type': u'domain-name',
             u'value': u'i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net'}]}}
exitFirstPathComponent (value): pop {0: {u'0': [{u'type': u'domain-name',
             u'value': u'i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net'}]}}
exitFirstPathComponent (value): push {0: {u'0': [u'i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net']}}
exitPropTestRegex ('^.*cedexis.*$'): pop {0: {u'0': [u'i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net']}}
exitPropTestRegex ('^.*cedexis.*$'): push {0: set([u'0'])}
exitObservationExpression (simple): pop {0: set([u'0'])}
exitObservationExpression (simple): push [(0,)]

MATCH:  [domain-name:value MATCHES '^.*cedexis.*$']

What is the best approach to achieve this?

Cheers

Adam

gtback commented 7 years ago

That's really strange; I would have expected it to work. @chisholm would be better than me at understanding the verbose output and tracking down what's going on.

chisholm commented 7 years ago

Try replacing "i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net" in your dict with u"i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net", i.e. make it a unicode string. In Python 2, the pattern matcher treats plain str values as binary, so it thinks you're attempting a regex match (or LIKE) on binary data. Binary regex matching is in the spec but isn't implemented yet; I hope to have it in soon. LIKE on binary data isn't in the spec at all.
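If the observation dict is built by hand and there are too many literals to prefix with u, one option is to decode every byte string before handing the dict to the matcher. The sketch below assumes the values are UTF-8 and that none of them are meant to be genuine binary data; to_text is a hypothetical helper, not part of stix2matcher.

```python
# Hypothetical helper (not part of stix2matcher): recursively decode byte
# strings so that every string in the observation is text.
def to_text(value, encoding="utf-8"):
    if isinstance(value, bytes):  # on Python 2, `bytes` is the plain `str` type
        return value.decode(encoding)
    if isinstance(value, dict):
        return {to_text(k, encoding): to_text(v, encoding)
                for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        # Tuples come back as lists, which is fine for JSON-like data.
        return [to_text(item, encoding) for item in value]
    return value  # numbers, booleans, None and text pass through unchanged


# The b"" prefixes make the byte strings explicit on both Python 2 and 3.
obs = {
    b"type": b"observed-data",
    b"number_observed": 1,
    b"objects": {
        b"0": {
            b"type": b"domain-name",
            b"value": b"i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net",
        }
    },
}

text_obs = to_text(obs)
```

The decoded text_obs can then be passed to matcher.match in place of the original dict. Anything that really is binary would need to be excluded from the conversion.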

The reason it works when invoked from the commandline is because the content/pattern files are opened in text mode with a declared encoding, so the data comes out as text.
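A rough illustration of that difference, assuming UTF-8 and using a temporary file as a stand-in for obs.json. This is not the command-line tool's actual code, only a sketch of the same idea: open the file in text mode with a declared encoding, and the strings that come out of json.load are text.

```python
import io
import json
import os
import tempfile

# Write a small observation file, standing in for obs.json.
fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u'[{"type": "observed-data", "objects": {"0": '
            u'{"type": "domain-name", "value": "init.cedexis-radar.net"}}}]')

# Text mode with a declared encoding: json.load yields text values.
with io.open(path, "r", encoding="utf-8") as f:
    observations = json.load(f)
os.remove(path)

value = observations[0]["objects"]["0"]["value"]
text_type = type(u"")  # `unicode` on Python 2, `str` on Python 3
is_text = isinstance(value, text_type)
```

Here is_text is true on either Python version, which is why content loaded this way matches while a hand-written Python 2 dict with plain str literals does not.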

There is another open issue which has the same confusion. This confusion starts to seem like a recurring problem. But I think the real root of it is python2's conflation of text and binary data. I think they really do need to be treated as different types. That language design flaw is addressed in python3; I wish everyone could switch to that!

In either case, in Python when you json.load JSON from a file, you get text (unicode in python2). When ANTLR parses a pattern, the content of tokens is also text (unicode in python2). So down in the libraries, they're doing it correctly. We just need to get people who are still on python2 to also do it correctly! str in python2 is binary and unicode is text!

AdamTheAnalyst commented 7 years ago

That works! Thanks. I haven't yet made the shift to Python 3, more out of stubbornness and legacy code than anything, so it looks like I will take the plunge some time soon.

For anyone else hitting this issue, the following code now works:

from stix2matcher import matcher
import json
obs = json.loads("""{
    "created": "2017-08-29T08:26:22.135105Z",
    "first_observed": "2017-08-29T08:25:44Z",
    "id": "observed-data--43d8af75-216d-414a-a156-50eeba54e4c5",
    "last_observed": "2017-08-29T08:25:44Z",
    "modified": "2017-08-29T08:26:22.135105Z",
    "number_observed": 1,
    "objects": {
        "0": {
            "type": "domain-name",
            "value": "i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net"
        }
    },
    "type": "observed-data",
    "@version": "1",
    "@timestamp": "2017-08-29T08:26:22.706Z"
}""")

# Works
print matcher.match("[domain-name:value LIKE '%cedexis%']",[obs,])
print matcher.match("[domain-name:value MATCHES '.*cedexis.*']",[obs,])
print matcher.match("[domain-name:value MATCHES '^.*cedexis.*$']",[obs,])
print matcher.match("[domain-name:value = 'i2-ymeamdnxvpofijvkluyxccakvwpnua.init.cedexis-radar.net']",[obs,])

Thanks to both of you for your help! I'll be sure to buy you a beer if we ever run into each other at a conference; this is about the third issue you guys have helped me with in the STIX2 repos.

Adam

gtback commented 7 years ago

Happy to help, @AdamTheAnalyst (though @chisholm did all the real work).

I'm going to close this, since I think it's resolved, but if not, feel free to re-open it.