oasis-open / cti-pattern-matcher

OASIS TC Open Repository: Match STIX content against STIX patterns
https://github.com/oasis-open/cti-pattern-matcher
BSD 3-Clause "New" or "Revised" License

Accept single Observed Data SDO in input file. #50

Closed wutdequack closed 6 years ago

wutdequack commented 6 years ago

Hi,

Just a curious user of this library. I used a file with this Observed Data SDO (this is an excerpt of the example given in matcher.py):

{
    "type": "observed-data",
    "id": "observed-data--b67d30ff-02ac-498a-92f9-32f845f448cf",
    "created": "2016-04-06T19:58:16.000Z",
    "modified": "2016-04-06T19:58:16.000Z",
    "first_observed": "2015-12-21T19:00:00Z",
    "last_observed": "2015-12-21T19:00:00Z",
    "number_observed": 2,
    "objects": {
        "0": {
            "type": "file",
            "hashes": {
                "SHA-256": "d774bda24d8c30be97ecb6d3d8133ed83d6ddfa202f1e466b8e62da7684480f3"
            }
        },
        "1": {
            "type": "file",
            "mime_type": "application/zip",
            "hashes": { "MD5": "22A0FB8F3879FB569F8A3FF65850A82E" }
        }
    }
}

And this is the pattern that I used: [file:mime_type = 'application/zip' AND file:hashes.'MD5' = '22A0FB8F3879FB569F8A3FF65850A82E']

Running this through the stix2-matcher.exe gave this result:

Traceback (most recent call last):
  File "build\bdist.win32\egg\stix2matcher\matcher.py", line 2195, in main
  File "build\bdist.win32\egg\stix2matcher\matcher.py", line 2154, in match
  File "build\bdist.win32\egg\stix2matcher\matcher.py", line 2124, in match
  File "build\bdist.win32\egg\stix2matcher\matcher.py", line 941, in __init__
TypeError: string indices must be integers

Tracing down the error to the source code, it was due to this line: number_observed = sdo["number_observed"]

It is possible that it couldn't recognize my "number_observed" as an integer, which seems very weird to me. It is likely an error on my end, but I am not sure how to prevent this error. I have validated both the pattern and the SDO with the various libraries provided by OASIS.

It would be awesome if I could get a hint of what is going on so I can investigate further on my own. I am still exploring STIX and may be wrong that this is a bug, but I do want to learn what I might have misunderstood.

gtback commented 6 years ago

Hi @wutdequack, thanks for the question. I'm not running on Windows, but this works for me:

$ cat gh-50.patterns
[file:mime_type = 'application/zip' AND file:hashes.'MD5' = '22A0FB8F3879FB569F8A3FF65850A82E']
$ cat gh-50.json
[
    {
        "type": "observed-data",
        "id": "observed-data--b67d30ff-02ac-498a-92f9-32f845f448cf",
        "created": "2016-04-06T19:58:16.000Z",
        "modified": "2016-04-06T19:58:16.000Z",
        "first_observed": "2015-12-21T19:00:00Z",
        "last_observed": "2015-12-21T19:00:00Z" ,
        "number_observed": 2,
        "objects": {
            "0": {
                "type": "file",
                "hashes": {
                    "SHA-256": "d774bda24d8c30be97ecb6d3d8133ed83d6ddfa202f1e466b8e62da7684480f3"
                }
            },
            "1": {
                "type": "file",
                "mime_type": "application/zip",
                "hashes": { "MD5": "22A0FB8F3879FB569F8A3FF65850A82E" }
            }
        }
    }
]
$ stix2-matcher -p gh-50.patterns -f gh-50.json

MATCH:  [file:mime_type = 'application/zip' AND file:hashes.'MD5' = '22A0FB8F3879FB569F8A3FF65850A82E']

You might be able to get some more information by running stix2-matcher.exe with the -v option.

If it's still not working, can you share the output of pip freeze, and more information about the version of Python and operating system you are using?

gtback commented 6 years ago

Ahhh, may have just figured it out. The -f parameter expects a list of ObservedData SDOs. When I ran the same script without the outer [] in gh-50.json, I got the same error as you.

I agree this could be improved!
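
For anyone hitting the same traceback, here is a minimal sketch of why the missing brackets produce this particular error. It assumes the matcher loops over the parsed JSON and indexes each element the way the line quoted earlier does (sdo["number_observed"]); the read_counts function is a hypothetical stand-in for illustration, not the matcher's actual code.

```python
import json

# A trimmed-down Observed Data SDO, parsed from JSON without the outer [].
single_sdo = json.loads(
    '{"type": "observed-data", "number_observed": 2, "objects": {}}'
)


def read_counts(observed_data_sdos):
    # Hypothetical stand-in for the matcher's loop: it expects a list of
    # SDO dicts and reads "number_observed" from each one.
    return [sdo["number_observed"] for sdo in observed_data_sdos]


# With a list, each element is a dict, so the lookup works.
print(read_counts([single_sdo]))  # [2]

# With a bare dict, iteration yields its keys (strings such as "type"),
# and indexing a string with "number_observed" raises TypeError.
try:
    read_counts(single_sdo)
except TypeError as exc:
    print("TypeError:", exc)
```

So the value of "number_observed" is fine; the lookup is simply being applied to a key string rather than to the SDO itself.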

wutdequack commented 6 years ago

Ahhh I see, now I understand its intended purpose. Thanks for the clarification @gtback. But just wanted to check, what is the purpose behind designing gh-50.json to be a list of observed data instead of just an individual SDO itself?

Based on my understanding, additional objects can be added to the "objects" field of an observed-data SDO without needing to create a new observed-data object. My understanding here might not match yours, so I would like to be re-aligned with the common understanding of the observed-data SDO. Thanks so much for the help!

gtback commented 6 years ago

Some patterns require multiple ObservedData instances (for example, those with REPEATS or FOLLOWED BY, or any pattern with multiple "Observation Expressions"), so the match function takes a list of SDOs and returns a single match: either one SDO, or a set of SDOs if the match requires several. If several individual SDOs each match a pattern with one Observation Expression, only one is returned.

I think it would make sense for the matcher to accept an input file with a single Observed Data instance, and transform it into a list before passing it to match(). I'll rename this issue to address that.
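
A rough sketch of what that transformation could look like, assuming the input file has already been read into a string. The helper name load_observed_data is hypothetical and not part of the matcher; it only illustrates the wrap-in-a-list idea.

```python
import json


def load_observed_data(text):
    """Hypothetical helper: parse JSON text and always return a list of SDOs."""
    data = json.loads(text)
    # A single Observed Data SDO parses to a dict; wrap it so callers
    # can always treat the result as a list.
    if isinstance(data, dict):
        return [data]
    return data


single = '{"type": "observed-data", "number_observed": 2, "objects": {}}'
wrapped = "[" + single + "]"

# Both forms of input yield the same one-element list.
print(load_observed_data(single) == load_observed_data(wrapped))  # True
```

The resulting list is what would then be handed to match(), so files with and without the outer [] would behave the same.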

wutdequack commented 6 years ago

Ahhh... I see. Thanks for the clarification @gtback! Looking forward to seeing the variety of options for the cti-pattern-matcher tool!