Open dynamic-modeller opened 2 years ago
When considering the example threat reports included on your examples page, the parser can successfully import poisonivy.json, but fails on APT1.json and IMDDOS.json. All files were retrieved from the links on your website. These threat reports should be able to be parsed, but fail because of a problem with the codec
APT1 and Poison Ivy work for me:
>>> with open("apt1.json", encoding="utf-8") as fp:
... bundle = stix2.parse(fp)
...
>>> len(bundle.objects)
76
>>> with open("poisonivy.json", encoding="utf-8") as fp:
... bundle = stix2.parse(fp)
...
>>> len(bundle.objects)
155
IMDDOS may have a problem: it uses a STIX 2.0 bundle which declares its spec version as "2.0", but includes a location
object, which was new in STIX 2.1. Or, maybe they consider it custom with respect to STIX 2.0? Not sure about that. You can get it to parse if you allow custom content:
>>> with open("imddos.json", encoding="utf-8") as fp:
... bundle = stix2.parse(fp, allow_custom=True)
...
>>> len(bundle.objects)
15
The location object will come out as a plain dict.
When considering the Mitre Attack example, it is acknowledged that there are additional properties included,however the content is still in Stix format. None of the 3 bundles at the Mitre github can be parsed, and all fail with an error [ Unexpected properties for Bundle: (spec_version)..]
These are kind of oddball STIX 2.0 bundles: they actually declare their spec_version as "2.1". I am not sure that's non-compliant, since the STIX 2.0 spec says "The value of this property MUST be 2.0 for bundles containing STIX Objects defined in this specification", which leaves open the possibility that you could use other values for objects defined in other specs, e.g. "2.1" for objects from the STIX 2.1 spec. But regardless, the stix2 library does not support objects from other STIX versions in a STIX 2.0 bundle (those bundles all seem to have STIX 2.1 objects). That's a limitation of the library.
The attack-stix-data repo bills itself as containing STIX 2.1 content. So I think the real question here, is why are they using STIX 2.0 bundles?
If I remove the "spec_version" property from one of those bundles, which effectively converts it to a STIX 2.1 bundle, then it parses:
>>> with open("ics-attack.json", encoding="utf-8") as fp:
... bundle = stix2.parse(fp, allow_custom=True)
...
>>> len(bundle.objects)
830
the behaviour one expects is: a. all valid stix files can be imported, particularly the demos from your site b. stix files with extensions should still be imported, with either the additional fields in place, or removed, depending on some setting
Well, it seems not all of that content is valid. :\ Or at least, may not be as its creator intended it to be.
I believe unregistered STIX 2.1 extensions should be accepted (if that's what you mean). If you find an error regarding extensions, please create an issue and include some details.
if the parser was more robust, it could easily form the core of a really useful stix importer, and the existing stix objects could then be expanded to report their own typeql import statements (e.g. develop a to_typedb function for every object).
However the parser is currently not robust to even the slightest variations,a nd thereby we might have to write our own parser, which seems a waste. Can you advise please?
My impression from looking at this, is that these are pretty much all content issues.
I did find a potential small issue with STIX spec version detection with bundles: presence of spec_version
in the bundle should indicate version "2.0", regardless of its value (in STIX 2.0, the value indicates the spec version of the content of the bundle, not the bundle itself). I'll create a separate issue.
Hey Chisholm,
Thanks for your speedy reply. Your were so right, and my opinion of the parser has switched 180 degrees. Its exactly what i was hoping for, I just needed two small changes.
Firstly, i had to set the encoding in the file open, as some of the files came from non windows source. Trap for young players, indeed. Instead of
for file in file_paths: with open(file) as f: json_text = json.load(f)
I have changed it to specifiy encoding, which fixes up heaps of problems
for file in file_paths: with open(file, encoding='utf-8') as f: json_text = json.load(f)
Further, I have found out more about the wonderful parsing flag to allow custom, like
local_obj = parse(obj, allow_custom=True)
This then made the parser amazingly robust. It even handled the MITRE ATT&ACK stuff. On reading the docs its apparent that it even makes new properties for you, and one presumes even new classes. So, now i think your parser is amang, and of huge benefit for the typedb open source stix integration we are planning. Great Stuff.
thanks for that, more on my typedb integration into the default stix database soon
Hi,
I have been playing with your library and found that the parser is not robust, and seems to break easily in two sitatuions:
In both situations, the parsing function breaks but for two different reasons.
When considering the example threat reports included on your examples page, the parser can successfully import poisonivy.json, but fails on APT1.json and IMDDOS.json. All files were retrieved from the links on your website. These threat reports should be able to be parsed, but fail because of a problem with the codec
When considering the Mitre Attack example, it is acknowledged that there are additional properties included,however the content is still in Stix format. None of the 3 bundles at the Mitre github can be parsed, and all fail with an error [ Unexpected properties for Bundle: (spec_version)..]
the behaviour one expects is: a. all valid stix files can be imported, particularly the demos from your site b. stix files with extensions should still be imported, with either the additional fields in place, or removed, depending on some setting
if the parser was more robust, it could easily form the core of a really useful stix importer, and the existing stix objects could then be expanded to report their own typeql import statements (e.g. develop a to_typedb function for every object).
However the parser is currently not robust to even the slightest variations,a nd thereby we might have to write our own parser, which seems a waste. Can you advise please?
Thanks