oasis-open / cti-python-stix2

OASIS TC Open Repository: Python APIs for STIX 2
https://stix2.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
356 stars 113 forks

Work with Stix2 and ATT&CK #543

Open dynamic-modeller opened 2 years ago

dynamic-modeller commented 2 years ago

Hi,

I am enjoying working with your library, thanks for writing it. However, I want to clarify how to work with ATT&CK.

One great feature is that if I import STIX 2.1 data (using Python's json.load()) I can ensure that I have only STIX-compliant data, by parsing the bundle and each object without any flags set. In this case, any non-standard properties or objects cause the parsing operation to fail. If I want to allow custom properties, then setting the flag allow_custom=True:

  1. enables parsing of bundles with custom objects and properties
  2. creates those custom classes as part of the parsing operation

But what if I want to load and work with ATT&CK? At the moment there is no flag to enable ATT&CK-only objects and properties, so one assumes that I would need to use the allow_custom=True flag; as a result, ATT&CK data cannot be easily checked for consistency/compatibility with the standard.

However, when one checks the ATT&CK STIX 2 documentation, this flag is never used (https://github.com/mitre-attack/attack-stix-data/blob/master/USAGE.md#accessing-attck-data-in-python). Instead they use the MemoryStore and its load_from_file method:

from stix2 import MemoryStore

src = MemoryStore()
src.load_from_file("enterprise-attack/enterprise-attack.json")

All of the code examples they then show access the variables and custom objects without a problem.

How does this work? I thought the intent of the library was to ensure STIX-standard data only, which could be extended using the allow_custom flag? Yet this example shows I do not need to use that flag to import ATT&CK data, which is mightily confusing.

Can you advise the best-practice way to load ATT&CK data, make sure it is correct, and then query it? Is it only the parsing library that requires the allow_custom=True flag? If I implement a data source/sink with a load_from_file method, should it be able to import either STIX or ATT&CK data? What about other custom variants? How should I best set up parsing and a TypeDB data source/sink to handle both STIX 2.1 and ATT&CK?

Can you advise please, thanks

clenk commented 2 years ago

Hi @dynamic-modeller, I understand the confusion. The datastores that come built into python-stix2 (MemoryStore, FileSystemStore, and TAXIICollectionStore) set allow_custom at the datastore level rather than requiring it to be passed in each function call, and they default allow_custom to True.

I would recommend setting allow_custom=True by default on any new datastores you write.

If you look at MemorySource as an example, it stores allow_custom when it gets initialized. load_from_file() calls another function to add objects to the source and passes in its allow_custom value, which in turn passes that allow_custom on to parse().

If the MemoryStore was instead created with

src = MemoryStore(allow_custom=False)

then you'd have to pass in allow_custom=True to every function call.

chisholm commented 2 years ago

> Hi,
>
> I am enjoying working with your library, thanks for writing it. However, I want to clarify how to work with ATT&CK.
>
> One great feature is that if I import STIX 2.1 data (using Python's json.load()) I can ensure that I have only STIX-compliant data, by parsing the bundle and each object without any flags set. In this case, any non-standard properties or objects cause the parsing operation to fail. If I want to allow custom properties, then setting the flag allow_custom=True:
>
> 1. enables parsing of bundles with custom objects and properties
>
> 2. creates those custom classes as part of the parsing operation

The second item here is not true. Registration of classes for custom STIX types must be done manually. If allow_custom=True and you parse content corresponding to an unregistered STIX type (e.g. dict, string, file-like object), you will get a plain dict back.

> But what if I want to load and work with ATT&CK? At the moment there is no flag to enable ATT&CK-only objects and properties, so one assumes that I would need to use the allow_custom=True flag; as a result, ATT&CK data cannot be easily checked for consistency/compatibility with the standard.
>
> However, when one checks the ATT&CK STIX 2 documentation, this flag is never used (https://github.com/mitre-attack/attack-stix-data/blob/master/USAGE.md#accessing-attck-data-in-python). Instead they use the MemoryStore and its load_from_file method.

In addition to what Chris explained, note that the default value for allow_custom in a MemoryStore is True, so if you don't explicitly pass a value, custom content is allowed automatically. You could set it to False if you wanted to.

> Can you advise the best-practice way to load ATT&CK data, make sure it is correct, and then query it? Is it only the parsing library that requires the allow_custom=True flag? If I implement a data source/sink with a load_from_file method, should it be able to import either STIX or ATT&CK data? What about other custom variants? How should I best set up parsing and a TypeDB data source/sink to handle both STIX 2.1 and ATT&CK?
>
> Can you advise please, thanks

The characterization "either STIX or ATT&CK data" seems a little imprecise. The attack-stix-data repo is intended to house STIX content; there is no separate variant of the STIX spec for ATT&CK data. Even "custom" content has a specific meaning and rules one must follow, which are laid out in section 11 of the STIX spec. I think it would be more correct to say "ATT&CK data which is STIX data", at least as ATT&CK content is represented in the attack-stix-data repo.

When one uses section-11-style custom content, one could still decide that the custom types and properties should be used in particular ways, but those conventions would lie outside the STIX spec. The only documented way the library has to check those kinds of conventions is via registered classes for custom STIX types. A custom class can define properties in the same way as the classes built into the library, e.g. defining which are required or optional, what types of values they should have, etc. One could certainly imagine defining and registering a set of classes corresponding to MITRE's custom types, e.g. x-mitre-tactic, but it looks like they have not done that (I am not too familiar with their tooling). Note that this would not address custom properties on spec-defined STIX types: for section-11-style custom properties on types built into the library, the library does not provide a way to validate them automatically.

The "section-11-style" qualifier in the above discussion is important, because that style of custom content is deprecated in STIX 2.1. It was not removed from the spec though, so the library continues to support it. The new mechanism for customization in STIX 2.1 is the extension, described in spec section 7.3. The library supports that style too, and it would be a way to get improved validation of custom content, even for spec-defined STIX types built into the library.
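For illustration, a section-7.3-style property extension on an SDO looks roughly like this (the ids and the rank/toxicity properties are modeled on the spec's own example; a corresponding extension-definition object would declare the extension's schema):

```json
{
  "type": "indicator",
  "spec_version": "2.1",
  "id": "indicator--e97bfccf-8970-4a3c-9cd1-5b5b97ed5d0c",
  "created": "2023-01-01T00:00:00.000Z",
  "modified": "2023-01-01T00:00:00.000Z",
  "pattern": "[ipv4-addr:value = '198.51.100.1']",
  "pattern_type": "stix",
  "valid_from": "2023-01-01T00:00:00Z",
  "extensions": {
    "extension-definition--d83fce45-ef58-4c6c-a3f4-1fbc32e98c6e": {
      "extension_type": "property-extension",
      "rank": 5,
      "toxicity": 8
    }
  }
}
```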

All of this is to say that there are various mechanisms in the library and STIX 2.1 spec for validating customizations, but it seems like their content doesn't use the latest extension mechanism, and their Python recipes don't seem to include more advanced techniques (e.g. registering custom classes). So what you have is a pretty simple and lax system: pretty much any STIX type goes and any custom property goes, as long as you use allow_custom=True.

If you want some improved validation, you could try creating and registering custom classes for the custom ATT&CK STIX types yourself. I think that might get you part of the way there, but as mentioned above, you would still not get validation of custom properties on built-in classes. That might be the best you can do for now.

dynamic-modeller commented 2 years ago

Hi Chris and Chisholm,

Thanks very much for your excellent answers. It is apparent that the extension/customisation model for Stix 2.1 is based on the concept of ad-hoc extensions (e.g. company-specific or device-specific extensions) rather than dialect extensions (e.g. a standardised, versioned extension).

The section-7.3-style extension method, where one has to carry the definitions everywhere with the data, would seem a bit clumsy for a dialect that is standardised, like ATT&CK.

I really appreciate your explanation of allow_custom=True, and the ability to validate the STIX 2.1 standard objects. I really like that capability, and I wish there were an official ATT&CK extension that also supported it for their dialect. At the upcoming ATT&CKcon I might see if I can convince someone to build an extension for your stix2 Python library.

I still think the standard needs some modification to support this standardised-dialect approach, though. It would be nice if I could validate ATT&CK data through the stix2 library, as well as plain STIX 2.1 data.

With regard to the TypeDB data store/sink, unfortunately I will not be allowing the allow_custom=True flag initially. This is because TypeDB is strongly typed, so the schema is fixed and must be defined before loading the data. Initially, my plan is to set it up so it can take in either STIX 2.1 or the ATT&CK dialect of STIX 2.1. This will be set up using a flag in the datastore's init, where the type is either "stix" or "mitre".

Based on this flag I will initialise the datastore with either the standard STIX schema or the ATT&CK-extended schema. Then one can use the standard stix2 commands and load either type of dataset, ignoring all other data. It's possible in the future to create a system that dynamically updates the schema, but I may put this off for a later revision. I also ignore the extensions field for the moment.

On datastore init, I presently load the four TLP markings, so they can always be matched when needed. I will probably also set up a datastore.connect method to attach to an existing datastore.

Anyway, I am making good progress and I look forward to demonstrating my library add-on soon.

Thanks guys

rpiazza commented 2 years ago

@dynamic-modeller,

A few comments...

First, we don't envision sending the extension-definition object when sharing content. The extension-definitions would reside in a common repository - probably https://github.com/oasis-open/cti-stix-common-objects

The ATT&CK team may not be ready to define their content using an extension-definition, which should contain a json-schema - but hopefully they are including that in their future planning. I will try to get some information on that.

chisholm commented 2 years ago

Echoing Rich, we actually had issue #535 not too long ago, which discussed this exact topic. You probably wouldn't want to include the extension-definition SMO alongside every usage.

dynamic-modeller commented 2 years ago

Hey Rich and Chisholm,

Great feedback on the extension definition, thank you. I think initially I am going to get out a standard STIX version, and one that supports the current MITRE ATT&CK definition (because I have users that want this).

I probably will not support extensions (through the extensions field) initially, as there are some consequences that are unclear and I don't have any potential users for this as yet.

However, based on your feedback, I can see this capability being brought in later on, where we pull the definitions from GitHub and then dynamically adjust the schema before loading the data.

Anyway, your excellent feedback has enabled me to set the course for the initial release, and I look forward to demonstrating this to you guys soon-ish.

Thanks a lot!!!

rpiazza commented 2 years ago

I spoke to the ATT&CK team. They do not have converting the STIX ATT&CK content to use extensions on their planning agenda.

dynamic-modeller commented 2 years ago

Hi Rich and Chisholm,

That's sad on the STIX/MITRE front. Now that I have spent all of this time on it, the STIX standard is increasingly looking like a clever way to store things. Of course it would have been much cooler/more useful if it were based on a hypergraph rather than a property-graph model, but even so it still has some pretty nice features. I hope the MITRE group ends up translating everything into STIX (e.g. D3FEND etc.). I plan to push STIX as far as we can; there's some nice work in there, well done.

I am modifying my previous statements on supporting extensions: I will support both pre-defined extensions (e.g. SCO observable extensions) and externally defined extensions (e.g. MITRE ATT&CK), but not on-the-fly extensions. I have harvested every JSON snippet on the STIX 2.1 standard's HTML page and am using these as test cases, so for my first release the immediate aim is to have the capability to import every part of the STIX standard document and the MITRE ATT&CK STIX content, and leave dynamic extensions and other pieces till later.

Outside of STIX relationships (SROs), I have neatened up the other relations, and the SCO extensions model, by making some simple assumptions:

  1. All relations outside of STIX relationships are embedded, so they are directed single edges pointing from self to some external object. A large majority of these are built in and point to an id field (e.g. created_by_ref) that refers to another STIX object.
  2. STIX is full of sub-objects, and sub-objects with sub-objects, and this intricacy can be enormously simplified by formalising classes for the sub-objects, which are then connected by their own embedded relation back to the parent. This seemingly subtle distinction creates a massive simplification for the software parsing the JSON data for import.
  3. Extension entities can be the parent or child of another extension, and in that case another embedded relation lies between them.
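The flattening idea in point 2 can be sketched in plain Python (the helper and field names are hypothetical, not from the library):

```python
def flatten(obj, obj_id, records):
    """Hypothetical sketch: promote each dict-valued sub-object to its own
    flat record, with an embedded relation (_parent) back to its parent."""
    flat = {"_id": obj_id}
    for key, value in obj.items():
        if isinstance(value, dict):
            child_id = f"{obj_id}/{key}"
            flatten(value, child_id, records)   # recurse into the sub-object
            flat[key] = child_id                # embedded relation to the child
        else:
            flat[key] = value
    flat["_parent"] = obj_id.rsplit("/", 1)[0] if "/" in obj_id else None
    records.append(flat)
    return records

email = {
    "type": "email-message",
    "body_multipart_meta": {"content_type": "text/plain", "hashes": {"sha": "ab"}},
}
records = flatten(email, "email-message--1", [])
```

Each nested dict becomes its own flat record, so the importer only ever deals with one level of keys at a time.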

In short, my import code is massively simplified, like:

# defined once, at module level (duplicates removed)
EMBEDDED_REF_FIELDS = {
    "object_refs", "created_by_ref", "object_marking_refs",
    "sample_refs", "host_vm_ref", "operating_system_ref",
    "installed_software_refs", "analysis_sco_refs", "sample_ref",
    "contains_refs", "resolves_to_refs", "belongs_to_ref",
    "belongs_to_refs", "raw_email_ref", "from_ref", "sender_ref",
    "to_refs", "cc_refs", "bcc_refs", "body_raw_ref", "content_ref",
    "parent_directory_ref", "src_ref", "dst_ref", "src_payload_ref",
    "dst_payload_ref", "encapsulates_refs", "encapsulated_by_ref",
    "message_body_data_ref", "opened_connection_refs",
    "creator_user_ref", "image_ref", "parent_ref", "child_refs",
    "service_dll_refs",
}

elif rel in EMBEDDED_REF_FIELDS:
    match, insert = embedded_relation(rel, obj[rel], obj_var)

The reason the above list contains both plural (list) and singular variants is that TypeDB is polymorphic, so data attributes owned by another entity, relation or attribute are considered to contain one or more values (an unordered list).

In terms of the "match" and "insert" returned by the function: this is the series of match statements made to fetch and apply variable names to data in existing records, which are then mixed with the new facts and relations in the insert statements. For example, matching an existing identity SDO, to combine with the insert of some new fact created by that author and the relation that joins them.
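A hypothetical sketch of what such an embedded_relation helper could return (the TypeQL fragments and attribute names are illustrative only, not a real schema):

```python
def embedded_relation(rel, target_ids, obj_var):
    """Hypothetical sketch: build TypeQL match/insert fragments for an
    embedded relation from obj_var to each referenced object."""
    if isinstance(target_ids, str):          # singular *_ref fields
        target_ids = [target_ids]
    match_parts, insert_parts = [], []
    for i, stix_id in enumerate(target_ids):
        tvar = f"${rel}{i}"
        # Match the already-loaded target object by its STIX id...
        match_parts.append(f'{tvar} isa stix-object, has stix-id "{stix_id}";')
        # ...then insert the directed embedded edge from self to the target.
        insert_parts.append(
            f'(source: {obj_var}, target: {tvar}) isa embedded, has rel-name "{rel}";'
        )
    return " ".join(match_parts), " ".join(insert_parts)
```

Calling it with a singular field like created_by_ref yields one match/insert pair; a plural field like to_refs yields one pair per referenced id.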

Arguably, under this rationale, the email-mime-part should also be an SCO extension, as its attachment behaviour is the same as the general extension case. Anyway, interesting stuff.