oasis-tcs / cti-stix2

OASIS CTI TC: Provides issue tracking and wiki pages for the STIX 2.x Work Products
https://github.com/oasis-tcs/cti-stix2

How to add extension to SCO (sanely)? #273

Open jmgnc opened 3 years ago

jmgnc commented 3 years ago

Trying to convert some of the SEPs over, and it looks like the only way to do the conversion is to make SCO "extensions" use the full UUID, which makes understanding WHAT extensions you're accessing in an indicator pattern very difficult, unless you're good at memorizing ALL the UUIDs.

There is no way that I can see to use the extension definition in 7.3 and readable names. Which also means that they lose the ability to provide JSON schema, etc. That is because the extension enumeration type is limited, and the only one that applies is property-extension, but that requires the UUID to be fully spelled out.
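For concreteness, here is a minimal sketch of what a section 7.3 property-extension looks like on an SCO, written as a plain Python dict. The UUID is the one from the pattern example in this thread; the `foobar` property, the placeholder object `id`, and the omission of other network-traffic properties are illustrative assumptions, so this is not a complete or validated object.

```python
import json

# Extension-definition ID taken from the pattern example in this thread.
EXT_ID = "extension-definition--ae313a69-cabe-44f0-b1d2-ba8c6e93fe25"

# Deliberately minimal SCO: the id is a placeholder and other
# network-traffic properties are left out for brevity.
network_traffic = {
    "type": "network-traffic",
    "spec_version": "2.1",
    "id": "network-traffic--00000000-0000-4000-8000-000000000000",  # placeholder
    "protocols": ["tcp"],
    "extensions": {
        # The key is the full extension-definition ID, not a readable name.
        EXT_ID: {
            "extension_type": "property-extension",
            "foobar": "somevalue",  # hypothetical extension property
        }
    },
}

serialized = json.dumps(network_traffic, indent=2)
print(serialized)
```

The only thing identifying the extension in the serialized object is the UUID-based key, which is what then has to be spelled out in a pattern.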

rpiazza commented 3 years ago

@jmgnc - Yes, I noticed this earlier. It is undesirable, but might not be as bad as you think. The assumption would be that the producer of the pattern would use the correct UUID, as would the producer of the content. They wouldn't pick an arbitrary UUID :-). The question is how to find what the correct extension-definition (and UUID) is. I have made a suggestion to store extension-definitions (and their related jsonschemas, etc) in the common STIX object repo (see https://github.com/oasis-open/cti-stix-common-objects).

I looked at the extension-definition spec, and noticed the definition of the name property (which is required) is:

A name used for display purposes during execution, development, or debugging.

Maybe in the future, the name could be used in the object path in the patterns. I'm not sure if implementations could currently make use of this idea given the current spec.

maybe-sybr commented 3 years ago

I anticipated that the use of a UUID key in the extensions mapping was required to enforce differentiation of versions of well-defined extensions. e.g. for some hypothetical extension-definition with name = "foo-data" and an identifier = "extension-definition--<some_uuid>" [fn0], if a new version is defined with some change (minor or major structural), it would be important to start using a new, unambiguously identifiable key in the extensions mapping. Since there are versioned schemas involved, and assuming that there isn't a well-defined way to do semver-like discovery of schema versions, differentiating the extension definitions by UUID is probably the best mechanism from a data-model perspective.

This has the potential to blow out the size of JSON schemas for validation from the top level but since you'd have to define schemas for the extensions themselves as part of the extension-definition, I would imagine using schema references should make authoring top level schemas relatively DRY. I say this without having authored any myself (yet) though so I'm not sure about how easy it will be in reality. Specifically, I would expect that a schema for the extensions key would need to be aware of all of the known extension-definitions and their allocated UUIDs and use a $ref to that definition's own schema. Perhaps validation would also need to be more dynamic though in order to make it easier for downstream STIX consumers to develop their own top-level schemas (or modify/plug into canonical ones) to add their own extension-definition--<uuid> keys with schema $refs to the top level schema?
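A rough sketch of the "`$ref` per known extension-definition" idea, assuming each extension publishes its own schema at some URL. The function name and the example.com schema URL are hypothetical, and this only builds the schema document; it does not run a validator.

```python
def build_extensions_schema(known_schemas):
    """Build a JSON Schema fragment for the `extensions` mapping.

    known_schemas maps an extension-definition ID to the URL of that
    extension's own schema (both supplied by the caller).
    """
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        # One property per known extension-definition ID, each deferring
        # to the schema shipped with that extension definition.
        "properties": {
            ext_id: {"$ref": url} for ext_id, url in sorted(known_schemas.items())
        },
        # Unknown extensions are allowed through unvalidated.
        "additionalProperties": True,
    }


# Hypothetical registration of one extension and its schema location.
known = {
    "extension-definition--ae313a69-cabe-44f0-b1d2-ba8c6e93fe25":
        "https://example.com/schemas/foo-data.json",
}
schema = build_extensions_schema(known)
```

A downstream consumer would add its own entries to `known` rather than editing the top-level schema by hand, which is roughly the "more dynamic" validation mentioned above.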

What I see as the real pain in the butt is going to be using the data model as a human/code author, rather than a schema author. Having to do stuff like create accessors for data stored in an extension-definition key block with some UUID which varies based on the version of the extension-definition-- you have used for some document in your corpus sounds like it'll be kind of disgusting. However, I do see that as something that the STIX2 library should be capable of doing for you, i.e. it should be possible to "register" a bunch of extension-definitions with the library, which would then be capable of implementing human-friendly accessors for data stored in extension-definition keys for non-top-level-property extension-definitions. I have an implementation of something a bit like this using STIX2.1 section 11.3 style custom extensions to provide versioning for SCOs (to track a SCO as I add more ID contributing properties from various observable sources) for the cti-python-stix2 library. I intend to port this to section 7.3 style extension definitions (and maintain compatibility with my legacy documents 😬) so I can report back on how I go with that if it's useful to others as an implementation anecdote.
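To illustrate the "register" idea, here is a minimal sketch over plain dicts. `ExtensionRegistry` is a hypothetical helper, not part of cti-python-stix2, and the second UUID is a placeholder standing in for an older version of the same extension.

```python
class ExtensionRegistry:
    """Maps a human-friendly name to one or more extension-definition IDs."""

    def __init__(self):
        self._ids_by_name = {}

    def register(self, name, ext_id):
        # Several IDs may share a name, e.g. successive versions.
        self._ids_by_name.setdefault(name, []).append(ext_id)

    def get(self, sco, name, prop, default=None):
        """Return `prop` from whichever registered version is present on `sco`."""
        extensions = sco.get("extensions", {})
        for ext_id in self._ids_by_name.get(name, []):
            block = extensions.get(ext_id)
            if block is not None and prop in block:
                return block[prop]
        return default


registry = ExtensionRegistry()
registry.register("foo-data", "extension-definition--ae313a69-cabe-44f0-b1d2-ba8c6e93fe25")
registry.register("foo-data", "extension-definition--00000000-0000-4000-8000-000000000001")  # placeholder

# A document written against the placeholder (older) version.
old_doc = {
    "type": "network-traffic",
    "extensions": {
        "extension-definition--00000000-0000-4000-8000-000000000001": {
            "extension_type": "property-extension",
            "foobar": "legacy",
        }
    },
}
value = registry.get(old_doc, "foo-data", "foobar")
```

Calling code asks for `foo-data` and never handles a UUID directly, whichever version a given document was written against.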

[fn0] where some_uuid is allocated by the author, probably aiming to use a UUID4 or a UUID5 based on some vendor namespace UUID, the name of the extension definition and the version
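A minimal sketch of the UUID5 option from the footnote, using Python's standard `uuid` module. The vendor namespace UUID and the `name:version` string layout are assumptions chosen for illustration; nothing here is mandated by the spec.

```python
import uuid

# Hypothetical vendor namespace; a real producer would pick and publish its own.
VENDOR_NAMESPACE = uuid.UUID("00000000-0000-4000-8000-000000000000")


def extension_definition_id(name, version, namespace=VENDOR_NAMESPACE):
    """Derive a deterministic extension-definition ID from name and version."""
    derived = uuid.uuid5(namespace, f"{name}:{version}")
    return f"extension-definition--{derived}"


v1 = extension_definition_id("foo-data", "1.0.0")
v2 = extension_definition_id("foo-data", "2.0.0")
```

The same inputs always give the same ID, and a version bump gives a different, unambiguous key for the extensions mapping.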

jmgnc commented 3 years ago

> @jmgnc - Yes, I noticed this earlier. It is undesirable, but might not be as bad as you think. The assumption would be that the producer of the pattern would use the correct UUID, as would the producer of the content. They wouldn't pick an arbitrary UUID :-). The question is how to find what the correct extension-definition (and UUID) is. I have made a suggestion to store extension-definitions (and their related jsonschemas, etc) in the common STIX object repo (see https://github.com/oasis-open/cti-stix-common-objects).

I don't have a problem w/ UUID use, the issue is that it makes the pattern EXTREMELY unwieldy to use:

[ network-traffic:'extension-definition--ae313a69-cabe-44f0-b1d2-ba8c6e93fe25'.foobar = 'somevalue' ]

and as an author, I'd have to keep a mapping of what that uuid means, and on top of that, anyone reading that pattern wouldn't know what it means w/o access to it.

> I looked at the extension-definition spec, and noticed the definition of the name property (which is required) is:
>
> A name used for display purposes during execution, development, or debugging.
>
> Maybe in the future, the name could be used in the object path in the patterns. I'm not sure if implementations could currently make use of this idea given the current spec.

Yeah, the biggest issue is that name isn't guaranteed to be unique. Another option would be to provide a namespace mapping, so you can say, in this pattern, these names map to these extension definitions, but that doesn't solve 2.1.

As it stands, I'm going to have to recommend using custom properties for this, because they will work as part of the object's own property space. We also don't have a way to specify in an extension that this field is an ipv4 address, which means that ISSUBSET/ISSUPERSET won't work in an independent manner w/o the tools understanding the extension anyways, so simply making extensions cover SCOs doesn't solve all the problems.
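To make the ISSUBSET point concrete, here is a small sketch using Python's standard `ipaddress` module. The function name is hypothetical and the addresses come from documentation ranges. The comparison only works because the code knows both operands are IPv4 values, which is exactly the type information an extension currently has no way to declare.

```python
import ipaddress


def issubset_ipv4(value, cidr):
    """Evaluate an ISSUBSET-style test, assuming both operands are IPv4."""
    candidate = ipaddress.ip_network(value, strict=False)
    container = ipaddress.ip_network(cidr, strict=False)
    return candidate.subnet_of(container)


# With type knowledge the test can be evaluated.
typed_result = issubset_ipv4("198.51.100.7", "198.51.100.0/24")

# A tool that treats an extension property as an opaque string can only
# compare text, which tells it nothing about containment.
opaque_result = "198.51.100.7" == "198.51.100.0/24"
```

Without the type declared somewhere, a generic pattern matcher is stuck with the second form for any property that lives inside an extension.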

jmgnc commented 3 years ago

> This has the potential to blow out the size of JSON schemas for validation from the top level but since you'd have to define schemas for the extensions themselves as part of the extension-definition, I would imagine using schema references should make authoring top level schemas relatively DRY. I say this without having authored any myself (yet) though so I'm not sure about how easy it will be in reality. Specifically, I would expect that a schema for the extensions key would need to be aware of all of the known extension-definitions and their allocated UUIDs and use a $ref to that definition's own schema. Perhaps validation would also need to be more dynamic though in order to make it easier for downstream STIX consumers to develop their own top-level schemas (or modify/plug into canonical ones) to add their own extension-definition--<uuid> keys with schema $refs to the top level schema?

Yeah, as I mentioned in my previous reply, the schema doesn't contain all the info necessary to make a SCO extension work completely w/ patterning.

> What I see as the real pain in the butt is going to be using the data model as a human/code author, rather than a schema author. Having to do stuff like create accessors for data stored in an extension-definition key block with some UUID which varies based on the version of the extension-definition-- you have used for some document in your corpus sounds like it'll be kind of disgusting. However, I do see that as something that the STIX2 library should be capable of doing for you, i.e. it should be possible to "register" a bunch of extension-definitions with the library, which would then be capable of implementing human-friendly accessors for data stored in extension-definition keys for non-top-level-property extension-definitions. I have an implementation of something a bit like this using STIX2.1 section 11.3 style custom extensions to provide versioning for SCOs (to track a SCO as I add more ID contributing properties from various observable sources) for the cti-python-stix2 library. I intend to port this to section 7.3 style extension definitions (and maintain compatibility with my legacy documents 😬) so I can report back on how I go with that if it's useful to others as an implementation anecdote.

Yeah, if we need to use tools to author patterns, then we should have just defined the pattern as an AST and required a tool to compose it. Then all of the operator precedence and other issues we had would have gone away.
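As a rough illustration of the AST alternative, here is a sketch in which a pattern is composed from nodes and every boolean node is rendered with explicit parentheses, so the author never relies on operator precedence. The class names are hypothetical and only string comparisons are covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    path: str   # object path, e.g. "ipv4-addr:value"
    op: str     # comparison operator, e.g. "="
    value: str  # string literal, unescaped

    def render(self):
        # Escape backslashes and single quotes inside the string literal.
        escaped = self.value.replace("\\", "\\\\").replace("'", "\\'")
        return f"{self.path} {self.op} '{escaped}'"


@dataclass(frozen=True)
class BoolExpr:
    op: str       # "AND" or "OR"
    left: object  # Comparison or BoolExpr
    right: object

    def render(self):
        # Always parenthesize, so grouping is explicit in the output.
        return f"({self.left.render()} {self.op} {self.right.render()})"


def observation(expr):
    """Wrap a comparison expression in an observation expression."""
    return f"[{expr.render()}]"


tree = BoolExpr(
    "OR",
    Comparison("ipv4-addr:value", "=", "198.51.100.1"),
    BoolExpr(
        "AND",
        Comparison("domain-name:value", "=", "example.com"),
        Comparison("url:value", "=", "http://example.com/"),
    ),
)
pattern = observation(tree)
```

The grouping lives in the tree rather than in the text, which is the property being argued for here.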

As I said, I'm not a fan of requiring full UUIDs in a pattern. And I'd prefer to do this as part of the patterning language than having an external tool, something like:

mappings: extname = 'extension-definition--uuid'
[ network-traffic:extname.foobar = 'something' ]
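A minimal sketch of how such a mapping could be expanded before the pattern reaches a standard matcher. The `mappings:` syntax above is only a proposal, so this takes the mapping as a plain dict; `expand_aliases` is a hypothetical helper, and the regex is deliberately naive (it does not skip string literals).

```python
import re

# A bare identifier directly after "<object-type>:" and before "." is
# treated as a candidate alias.
_ALIAS = re.compile(r"([a-z0-9-]+:)([A-Za-z_][A-Za-z0-9_]*)\.")


def expand_aliases(pattern, mappings):
    """Replace alias names in object paths with quoted extension-definition IDs."""

    def replace(match):
        alias = match.group(2)
        if alias not in mappings:
            # Ordinary property name (e.g. "hashes"): leave it untouched.
            return match.group(0)
        return f"{match.group(1)}'{mappings[alias]}'."

    return _ALIAS.sub(replace, pattern)


mappings = {"extname": "extension-definition--ae313a69-cabe-44f0-b1d2-ba8c6e93fe25"}
expanded = expand_aliases("[ network-traffic:extname.foobar = 'something' ]", mappings)
```

The pattern as written stays readable, and the UUID appears only once, in the mapping.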

> [fn0] where some_uuid is allocated by the author, probably aiming to use a UUID4 or a UUID5 based on some vendor namespace UUID, the name of the extension definition and the version

rpiazza commented 3 weeks ago

There are many valid points in the above discussion. Patterns need to be correct for this and to handle SROs (which is currently underdefined in the spec). Therefore, all of this work is for 2.2.