oasis-open / cti-python-stix2

OASIS TC Open Repository: Python APIs for STIX 2
https://stix2.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Determine support for STIX 2.0 and later spec releases #79

Closed emmanvg closed 7 years ago

emmanvg commented 7 years ago

We need to determine how to support STIX 2.0, 2.1 and later versions using the same library. Please use this issue to track ideas and progress surrounding this problem.

emmanvg commented 7 years ago

My first thought is to use NamedTuples to filter properties by STIX version. The idea behind this approach is to create STIX objects that are agnostic to versioning (e.g., you can fill in all properties). A version is only needed at the moment you serialize or flush the contents.

Object Example:

class Indicator(STIXDomainObject):

    _type = 'indicator'
    _properties = OrderedDict()
    _properties.update([
        STIX20Property('type', TypeProperty(_type)),
        STIX20Property('id', IDProperty(_type)),
        STIX20Property('created_by_ref', ReferenceProperty(type="identity")),
        STIX20Property('created', TimestampProperty(default=lambda: NOW, precision='millisecond')),
        STIX20Property('modified', TimestampProperty(default=lambda: NOW, precision='millisecond')),
        STIX20Property('name', StringProperty()),
        STIX20Property('description', StringProperty()),
        STIX20Property('pattern', PatternProperty(required=True)),
        STIX20Property('valid_from', TimestampProperty(default=lambda: NOW)),
        STIX20Property('valid_until', TimestampProperty()),
        STIX20Property('kill_chain_phases', ListProperty(KillChainPhase)),
        STIX20Property('revoked', BooleanProperty()),
        STIX20Property('labels', ListProperty(StringProperty, required=True)),
        STIX21Property('confidence', IntegerProperty()),
        STIX21Property('lang', StringProperty()),
        STIX20Property('external_references', ListProperty(ExternalReference)),
        STIX20Property('object_marking_refs', ListProperty(ReferenceProperty(type="marking-definition"))),
        STIX20Property('granular_markings', ListProperty(GranularMarking)),
    ])

Serializing:

>>> i = Indicator(name="My indicator", lang="en", description="STIX 2.1 is approaching!")
>>> print(i.serialize(v="2.1"))
{"name": "My indicator", "description": "STIX 2.1 is approaching!", "lang": "en", ...}
>>> print(i.serialize(v="2.0"))
{"name": "My indicator", "description": "STIX 2.1 is approaching!", ...}
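A minimal, self-contained sketch of the filtering idea above; it is not the library's implementation. The names `VersionedProperty`, `ToyIndicator`, and `_version_key` are hypothetical, and instead of separate `STIX20Property`/`STIX21Property` wrappers the sketch tags each property with the spec version assumed to have introduced it.

```python
import json
from collections import OrderedDict, namedtuple

# Hypothetical tag: a property name plus the spec version that introduced it.
VersionedProperty = namedtuple("VersionedProperty", ["name", "since"])


def _version_key(version):
    """Turn '2.1' into (2, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


class ToyIndicator(object):
    # Toy subset of the property list shown above (assumption, for brevity).
    _properties = [
        VersionedProperty("name", "2.0"),
        VersionedProperty("description", "2.0"),
        VersionedProperty("lang", "2.1"),
    ]

    def __init__(self, **kwargs):
        known = {prop.name for prop in self._properties}
        unknown = set(kwargs) - known
        if unknown:
            raise ValueError("Unknown properties: %s" % sorted(unknown))
        self._values = kwargs

    def serialize(self, v="2.1"):
        """Emit only the properties that exist in spec version `v`."""
        target = _version_key(v)
        out = OrderedDict()
        for prop in self._properties:
            if prop.name in self._values and _version_key(prop.since) <= target:
                out[prop.name] = self._values[prop.name]
        return json.dumps(out)


i = ToyIndicator(name="My indicator", lang="en", description="STIX 2.1 is approaching!")
print(i.serialize(v="2.1"))
print(i.serialize(v="2.0"))
```

Note that in this sketch serializing as 2.0 silently drops `lang`, which is one way such a design can lose data.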

Possible problems:

chisholm commented 7 years ago

Yeah squishing all versions into the same class could be too complicated. The differences will probably amount to more than adding a few new properties. Maybe we really do need separate classes. Some ground rules to think about though, to maybe start to shape a design: do you want the library to be overall switchable into different version "modes", so you choose a version and only work with that? Or do you want people to be able to read a v2.0 version of this obj, v2.1 of that obj, and work with them simultaneously? Do we need to track versions, and produce errors if they are mixed improperly? E.g. what if you tried to put a 2.0 object into a 2.1 bundle or vice versa?

gtback commented 7 years ago

Bundle should be a special case, since it's the only place where the version is actually tracked. One benefit of objects being "like" dictionaries is that you can essentially do stix2.v21.Indicator(**stix_20_indicator) to convert, as long as the Indicator is valid under both versions. We can also have special functions that allow the user to specify how to handle situations where it's not valid. As 2.1 matures, we will be better able to know what circumstances this may occur in, and what the most sensible way to behave is.
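The copy-construction conversion described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the stix2 API: `_ToyObject`, `IndicatorV20`, and `IndicatorV21` are hypothetical stand-ins, and the allowed property sets are reduced to a toy subset.

```python
from collections.abc import Mapping


class _ToyObject(Mapping):
    """Read-only, dictionary-like object that validates its property names."""

    _allowed = ()

    def __init__(self, **kwargs):
        unknown = set(kwargs) - set(self._allowed)
        if unknown:
            raise ValueError(
                "%s does not allow: %s" % (type(self).__name__, sorted(unknown))
            )
        self._inner = dict(kwargs)

    def __getitem__(self, key):
        return self._inner[key]

    def __iter__(self):
        return iter(self._inner)

    def __len__(self):
        return len(self._inner)


class IndicatorV20(_ToyObject):
    _allowed = ("name", "pattern")


class IndicatorV21(_ToyObject):
    # Toy assumption: the newer version only adds 'lang'.
    _allowed = ("name", "pattern", "lang")


old = IndicatorV20(name="My indicator", pattern="[file:name = 'a.exe']")

# Because the object behaves like a mapping, ** unpacking copies it into
# the newer class, which validates it under the newer rules.
upgraded = IndicatorV21(**old)

# Going the other way fails when the object uses a property the older
# version does not know about.
try:
    IndicatorV20(**IndicatorV21(lang="en", **old))
    downgrade_failed = False
except ValueError:
    downgrade_failed = True
```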

In my opinion, stix2 should default to the latest supported version (as of the time the package was published) and should be an "alias" to either stix2.v20 or stix2.v21, which holds the corresponding classes. Concepts like DataStores should be shared between STIX versions (not duplicated).

chisholm commented 7 years ago

Yeah, I had the same idea: using that kind of copy construction as a mini-elevator. And I had the same aliasing idea, which would essentially act as a global switch from one version to the other (for code that was written to the alias package, not version-specific packages). Which is why I floated that idea.

Perhaps you are thinking that we could support some kinds of mixing by attempting the mini-elevator conversion? It isn't clear how that should work though. Always update the lower-version object? Would you upgrade a bundle though, if a higher-version object was added, thus triggering upgrades of every other object it contains? Maybe not. What about cascading upgrades along relationships? Maybe it's too automagical and prone to unintended consequences. Maybe it should just error out.

gtback commented 7 years ago

Yeah, the biggest use case is in consuming, where you may not know (in the case of a bundle, it should be specified) what the version is, and wanting to accept mixed versions. For producing, I was thinking we could define as_version('2.1'), where the default value for __str__() is the latest version. But maybe I'm overthinking things (wouldn't be the first time 😉 )

clenk commented 7 years ago

If you add a 2.1 object to a 2.0 bundle I think it should raise an exception - you shouldn't be surprised about the version of the bundle you get back. If you add a 2.0 object to a 2.1 bundle, though, I could see value in auto-upgrading the object, assuming there are sane defaults for the new required properties. The object factory could be useful here if you don't like the defaults this library uses.
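One way to picture the bundle rule proposed above, as a sketch only: `ToyObject`, `ToyBundle`, and `_version_key` are hypothetical names, and the "upgrade" here simply re-tags the object, assuming no new required properties need defaults.

```python
def _version_key(version):
    """Turn '2.1' into (2, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


class ToyObject(object):
    def __init__(self, spec_version, **props):
        self.spec_version = spec_version
        self.props = props


class ToyBundle(object):
    def __init__(self, spec_version):
        self.spec_version = spec_version
        self.objects = []

    def add(self, obj):
        if _version_key(obj.spec_version) > _version_key(self.spec_version):
            # Newer object in an older bundle: refuse rather than surprise
            # the caller with a bundle of a different version.
            raise ValueError(
                "Cannot add a %s object to a %s bundle"
                % (obj.spec_version, self.spec_version)
            )
        if obj.spec_version != self.spec_version:
            # Older object in a newer bundle: upgrade a copy of it.
            obj = ToyObject(self.spec_version, **obj.props)
        self.objects.append(obj)


bundle = ToyBundle("2.1")
bundle.add(ToyObject("2.0", name="older object"))
```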

parse() will need some logic to make a best-guess at the version of an object based on the presence or values of certain properties. Or each STIX version alias would have its own parse function and just error out if the input isn't compatible with (can't be coerced into) that version.

We'll have to be careful about downgrading objects to a lower STIX version because of possible data loss.

chisholm commented 7 years ago

As far as version "logic", I was thinking just use the latest version possible. E.g. try with the latest, if that fails (e.g. raises an exception), try the previous version, then version before that, etc. If the spec is backward-compatible, the latest version should theoretically always work. And of course make it possible for users to force a particular version.

As far as producing, if all library instances representing objects and observables are version-aware, i.e. aren't "generic" but are associated with a particular STIX version, then every __str__() can only produce one thing: that particular version. To produce a different version, you'd have to do the mini-elevator thing or find some way to create an instance associated with the desired version, and stringify that. Does that make sense?

gtback commented 7 years ago

I like having parse start with the most recent version and work backwards until an object is created with no exceptions, and adding a spec_version (or similar) argument to force a version (and error if that version doesn't work). At this point, it appears STIX 2.1 will be strictly backward compatible with STIX 2.0.
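A rough sketch of the newest-first parsing strategy discussed above. Everything here is hypothetical: the `parse` signature, the `_PARSERS` table, and especially the validation rules. The toy "2.1" rule requires `lang` purely so the fallback path can be exercised; it says nothing about the real specification.

```python
def _validate(data, allowed, required):
    """Return a copy of `data`, or raise ValueError if it breaks the rules."""
    missing = required - set(data)
    unknown = set(data) - allowed
    if missing or unknown:
        raise ValueError(
            "missing=%s unknown=%s" % (sorted(missing), sorted(unknown))
        )
    return dict(data)


def _parse_v20(data):
    return _validate(data, allowed={"name"}, required={"name"})


def _parse_v21(data):
    # Toy rule (assumption): 'lang' is required only to force a fallback.
    return _validate(data, allowed={"name", "lang"}, required={"name", "lang"})


# Newest version first.
_PARSERS = [("2.1", _parse_v21), ("2.0", _parse_v20)]


def parse(data, spec_version=None):
    """Return (version, obj); try newest first unless a version is forced."""
    if spec_version is not None:
        # Forced version: any validation error propagates to the caller.
        return spec_version, dict(_PARSERS)[spec_version](data)
    last_error = None
    for version, parser in _PARSERS:
        try:
            return version, parser(data)
        except ValueError as exc:
            last_error = exc
    raise last_error


newest = parse({"name": "x", "lang": "en"})
fallback = parse({"name": "x"})
```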

Your comments about producing make sense as well. Having a class for each version of each object makes sense; we should be able to consolidate functionality that remains constant between versions in one place.

clenk commented 7 years ago

we should be able to consolidate functionality that remains constant between versions in one place

Most STIX object classes are just a _type and an OrderedDict of properties; subclassing and just adding a property may put it in the wrong order. So I don't think there's much to consolidate. The only exception I can think of is _check_object_constraints() on observables. So maybe we pull those out into standalone functions and have the methods on the observables just call those? (Assuming there are no changes to those constraints between versions, which may be a faulty assumption...)
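The "standalone function" idea might look something like this sketch. The class names, property lists, and the `check_at_least_one` helper are hypothetical; each versioned class keeps its own full, explicitly ordered property list and only the constraint logic is shared.

```python
def check_at_least_one(values, names):
    """Shared constraint: at least one of `names` must be present."""
    if not any(name in values for name in names):
        raise ValueError("At least one of %s is required" % list(names))


class ToyFileV20(object):
    _type = "file"
    # Full, explicitly ordered property list for this version (toy subset).
    _properties = ["type", "hashes", "name"]

    def __init__(self, **kwargs):
        self._inner = kwargs
        self._check_object_constraints()

    def _check_object_constraints(self):
        check_at_least_one(self._inner, ["hashes", "name"])


class ToyFileV21(object):
    _type = "file"
    # Hypothetical newer list: 'extra' is a made-up property placed mid-list,
    # which plain subclassing plus appending could not express.
    _properties = ["type", "extra", "hashes", "name"]

    def __init__(self, **kwargs):
        self._inner = kwargs
        self._check_object_constraints()

    def _check_object_constraints(self):
        # Same rule, same function: nothing is duplicated between versions.
        check_at_least_one(self._inner, ["hashes", "name"])


f = ToyFileV20(name="a.txt")
```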

I guess MarkingDefinition also has an __init__().

On a different topic, how will the package structure change to support different versions? A subpackage (folder) for each version?

gtback commented 7 years ago

So I don't think there's much to consolidate.

True, I think it's better to have explicit full property lists in all versions of all objects, and only consolidate functions where they would otherwise be identical.

how will the package structure change to support different versions?

All of the type-specific classes will be moved into versioned folders/subpackages. So you can from stix2.v20 import Indicator or similar. For STIX 2.0, nothing will change, but when subsequent versions come out we can update stix2.__init__ to load the type from a newer version submodule (this will require a major version bump to python-stix2 since it could introduce breaking changes in client code). ... unless anyone has other ideas...
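To make the layout concrete, here is a runnable sketch that fakes a tiny package in memory. The package name `toystix` and the helper `_make_module` are hypothetical; in a real package the two `_make_module` calls would correspond to a `v20/__init__.py` file and a top-level `__init__.py` that re-exports the default version's names.

```python
import sys
import types


def _make_module(name, **attrs):
    """Create a module object in memory and register it for import."""
    module = types.ModuleType(name)
    module.__dict__.update(attrs)
    sys.modules[name] = module
    return module


class _IndicatorV20(object):
    spec_version = "2.0"


# Stand-in for toystix/v20/__init__.py.
v20 = _make_module("toystix.v20", Indicator=_IndicatorV20)

# Stand-in for toystix/__init__.py: re-export the default version's names.
# Pointing the default at a newer version would only change this part.
package = _make_module("toystix", v20=v20, Indicator=v20.Indicator)
package.__path__ = []  # mark the fake module as a package

import toystix
from toystix.v20 import Indicator

# Both spellings reach the same class.
print(toystix.Indicator is Indicator)
```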

emmanvg commented 7 years ago

It looks like we reached some consensus on how to move forward with this issue. If we ultimately want to move forward with this approach:

For STIX 2.0, nothing will change, but when subsequent versions come out we can update stix2.__init__ to load the type from a newer version submodule

Do you mean that for 0.3.0 the package structure stays the same, but for (say 1.0.0) then we would apply this structure?

Also,

I think it would be nice to have this sort of functionality. (Having multiple versions available since we use the same class names)

gtback commented 7 years ago

Do you mean that for 0.3.0 the package structure stays the same, but for (say 1.0.0) then we would apply this structure?

I'm on the fence for whether we should pro-actively move (essentially) the contents of sdo.py, sro.py, common.py, and Bundle from core.py into a new v20 directory now, or just wait until 2.1 comes out before addressing this.

Prior to 1.0, we should be explicit that we don't advise importing names from anywhere except the top-level stix2 package; before 1.0 we're still "allowed" to change things under the rules of Semantic Versioning.

How imports will work

My intent is that people can import the modules in different ways depending on their intentions.

People who want to (in general) support the latest version of STIX 2 without making changes, implicitly using the latest version:

import stix2
...
stix2.Indicator(...)

or

from stix2 import Indicator
...
Indicator(...)

People who want to use an explicit version:

import stix2.v20
...
stix2.v20.Indicator(...)

or

from stix2.v20 import Indicator
...
Indicator(...)

or even

import stix2.v20 as stix2
...
stix2.Indicator(...)

(The last option makes it easy to update to a new version in one place per file, once you've made the deliberate decision to do so.)

People who want to use multiple versions in a single file:

import stix2
...
stix2.v20.Indicator(...)
...
stix2.v21.Indicator(...)

or

from stix2 import v20, v21
...
v20.Indicator(...)
...
v21.Indicator(...)

or (less preferred):

from stix2.v20 import Indicator as Indicator_v20
from stix2.v21 import Indicator as Indicator_v21
...
Indicator_v20(...)
...
Indicator_v21(...)

All of these approaches should be possible with the currently-planned approach.

clenk commented 7 years ago

Will the documentation only detail the latest spec version? The API Reference section could balloon in size otherwise.

gtback commented 7 years ago

The API reference should contain documentation on all supported versions, but should be 99+% autogenerated. We can keep the examples/Jupyter notebooks only for the most recent version, though at least one of the guides will probably need to explain how the cross-version support works.

packet-rat commented 7 years ago

Hey howz about some Zeppelin and H2O notebooks?

In any case Greg et al: keep up the great work on the repos and libraries. We really appreciate your efforts.

Patrick Maroney

On Oct 24, 2017, at 5:35 PM, Greg Back notifications@github.com wrote:

The API reference should contain documentation on all supported versions, but should be 99+% autogenerated. We can keep the examples/Jupyter notebooks only for the most recent version, though at least one of the guides will probably need to explain how the cross-version support works.

gtback commented 7 years ago

@packet-rat, I'd actually never heard of Zeppelin, but it seems pretty interesting. The big barrier is that we'd like to include the notebooks in the ReadTheDocs output (stix2.readthedocs.io). We're currently using nbsphinx to integrate Jupyter notebooks into Sphinx.

gtback commented 7 years ago

Closed by #93