oasis-open / cti-python-stix2

OASIS TC Open Repository: Python APIs for STIX 2
https://stix2.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
364 stars 119 forks source link

Extensions are no longer restricted to use on certain SCOs #519

Closed maybe-sybr closed 3 years ago

maybe-sybr commented 3 years ago

The following snippet breaks on v2.1.0 but works and serialises garbage extension data for a spec-defined type:

import json
import uuid
import stix2.parsing

d = {
    "type": "file",
    "id": f"file--{uuid.uuid4()}",
    "name": "foo",
    "extensions": {
        "unix-account-ext": {"gid": 1001},
    },
}
print(json.dumps(d, indent=2))
o = stix2.parsing.parse(json.dumps(d))
print(o.serialize(pretty=True))

on v2.1.0 ->

Traceback (most recent call last):
  File "dev/oasis/cti-python-stix2/stix2/base.py", line 67, in _check_property
    kwargs[prop_name] = prop.clean(kwargs[prop_name])
  File "dev/oasis/cti-python-stix2/stix2/properties.py", line 677, in clean
    raise CustomContentError("Can't parse unknown extension type: {}".format(key))
stix2.exceptions.CustomContentError: Can't parse unknown extension type: unix-account-ext

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "dev/oasis/cti-python-stix2/foo.py", line 14, in <module>
    o = stix2.parsing.parse(json.dumps(d))
  File "dev/oasis/cti-python-stix2/stix2/parsing.py", line 47, in parse
    obj = dict_to_stix2(obj, allow_custom, version)
  File "dev/oasis/cti-python-stix2/stix2/parsing.py", line 142, in dict_to_stix2
    return obj_class(allow_custom=allow_custom, **stix_dict)
  File "dev/oasis/cti-python-stix2/stix2/base.py", line 278, in __init__
    super(_Observable, self).__init__(**kwargs)
  File "dev/oasis/cti-python-stix2/stix2/base.py", line 161, in __init__
    self._check_property(prop_name, prop_metadata, setting_kwargs)
  File "dev/oasis/cti-python-stix2/stix2/base.py", line 322, in _check_property
    super(_Observable, self)._check_property(prop_name, prop, kwargs)
  File "dev/oasis/cti-python-stix2/stix2/base.py", line 73, in _check_property
    six.raise_from(
  File "<string>", line 3, in raise_from
stix2.exceptions.InvalidValueError: Invalid value for File 'extensions': Can't parse unknown extension type: unix-account-ext

and on master after merge of #468 ->

{
  "type": "file",
  "id": "file--48f88b7c-af82-410a-b1b4-a1a80a7f43ef",
  "name": "foo",
  "extensions": {
    "unix-account-ext": {
      "gid": 1001
    }
  }
}
{
    "type": "file",
    "spec_version": "2.1",
    "id": "file--48f88b7c-af82-410a-b1b4-a1a80a7f43ef",
    "name": "foo",
    "extensions": {
        "unix-account-ext": {
            "gid": 1001
        }
    }
}

I think we probably still need a way to only parse/serialise certain extensions for specific objects. I also have a use case for this with some of my custom objects and extensions where I'd prefer for it to only be possible to use some extension on a specific group of STIX objects. An allow/block list approach would be nice.

I need to run through section 7.3 of the spec again to get a bit more of an understanding wrt what it mandates, suggests and allows in this space. The examples for property-extensions don't make it super clear if there is any mechanism to enforce that only certain objects types may contain an extensions given by some extension-definition. My initial read of the section along with Appendix C.2 suggests that there is no such mechanism which is a bit surprising.

maybe-sybr commented 3 years ago

Passing allow_custom=False to .parse() and doing assert not o.has_custom seems to show that we're not falling afoul of more lenient parsing rules

maybe-sybr commented 3 years ago

In my perfect world, I'd like to see something similar to the valid_types/invalid_types pair used to validate reference properties which is checked against the enclosing_type attribute of ExtensionsProperty (which would need to be restored). This would obviously default to None/None to have the expected behaviour from the examples given in the spec.

Having something in the spec to define the applicable containers for an extension-definition would be really good but I'm sure that'd be a material change so I don't imagine it would happen quickly. If we did manage to justify adding some form of allow/blocklisting for containing types intended for internal use on spec-defined SCOs, I could live with (ab)using it and hoping that all I have to do is improve some extension-definitions in future.

rpiazza commented 3 years ago

@maybe-sybr - the jsonschema that SHOULD be referred to (from the extension-definition object) could help with the enclosing-type issue you bring up. My assumption is that you are good with proceeding with the 3.0.0 release with no additional code changes at this point.

chisholm commented 3 years ago

The spec doesn't seem to disallow this. For the file SCO, it just says:

Dictionary keys MUST use the specification defined name (examples above) 
or be the id of a STIX Extension object, depending on the type of extension
being used.

The "examples" are the extensions given for the file SCO. So it's clearly treating those names as being some of the legal spec defined names, not all of them. unix-account-ext is a specification defined name, so it seems legal, though silly. Spec defined extensions don't have a schema, so it wouldn't help this situation for that type of extension. Seems like the spec would have to change to make this kind of thing illegal for spec defined extensions.

maybe-sybr commented 3 years ago

The spec doesn't seem to disallow this. For the file SCO, it just says:

Dictionary keys MUST use the specification defined name (examples above) 
or be the id of a STIX Extension object, depending on the type of extension
being used.

The "examples" are the extensions given for the file SCO. So it's clearly treating those names as being some of the legal spec defined names, not all of them. unix-account-ext is a specification defined name, so it seems legal, though silly. Spec defined extensions don't have a schema, so it wouldn't help this situation for that type of extension. Seems like the spec would have to change to make this kind of thing illegal for spec defined extensions.

I think I'd read this the other way based on the bit before the spec defined extension names:

The File object defines the following extensions. In addition to these, producers MAY create their own.

ie. Those extensions are well defined only for encapsulating File objects, and I've made the leap that this should be enforced. The use of the "examples above" parenthesis seems poorly worded here, perhaps to avoid creating a situation where the committee felt unable to add more because of overly restrictive language in a previous version of the spec which they need to maintain compatibility with.

@rpiazza - I'll have to look into the mechanism for how to author a jsonschema for an extension - I couldn't find any examples in the spec document :(

My assumption is that you are good with proceeding with the 3.0.0 release with no additional code changes at this point.

Yep, I'd prefer for enclosing type validation to be restored in some form but since I'm currently the only producer/consumer in my system (or I mandraulically import other data sources), I don't forsee any evil which could really be done if the current state of master were released at 3.0.0. I would like to land the changes I've proposed in #513 (ref #511) and one of my virtual subclass or mutable ref types suggestions related to that so that I don't need to continue to run a staging branch of my own.

chisholm commented 3 years ago

Yeah the wording in the "bit before" also seems weird to me. I don't think it should talk as if the file SCO can "define" anything. The spec defines things. I read it as "this spec defines the following extensions for use with the file object", and that the list is not exclusive. At least, it doesn't say that others are not allowed, and it certainly seems to imply that they are with the "in addition" clause, and by referring to the list as "examples". On the one hand, allowing other spec-defined extensions seems silly, but on the other, the wording really doesn't lock it down enough.

maybe-sybr commented 3 years ago

Sounds like something for the TC to think about and clarify. If you're both satisfied with the current behaviour then I'm cool to take this on notice to avoid derailing your plans for 3.0.0 (beyond having #513 up anyway). I'll look at raising the points above for the TC when I get some free time next week.

rpiazza commented 3 years ago

@maybe-sybr,

Yes an issue in https://github.com/oasis-tcs/cti-stix2 would he helpful!

maybe-sybr commented 3 years ago

I've just made oasis-tcs/cti-stix2#276 for this. FYI @chisholm, I quoted you there but also elaborated a bit based on my interpretation of your comments above. Please feel free to clarify your interpretation of the spec if I've translated it poorly there :)

I think we can either close this as raised with the TC and perhaps there'll be future action based on the clarification we get, or leave it open as a reminder that there's an outstanding question there which related to the Python lib implementation. Entirely up to you