johnwunder / twigs

STIX 2.0 Strawman
http://twigs-cti.herokuapp.com/
MIT License

There are too many ways to update an Object (Versioning) #10

Open terrymacdonald opened 8 years ago

terrymacdonald commented 8 years ago

PROBLEM

In the current version of STIX, an object is identified by the combination of its Object ID and its Timestamp; together they act as a composite primary key. The current version of STIX offers two ways to issue an updated STIX Object (see http://stixproject.github.io/documentation/concepts/versioning/).

An Incremental Update changes only the Timestamp. Because the object keeps its original Object ID, this signals a slight change to the underlying object, and the later Timestamp tells the receiving consumer that the new version should overwrite the old one. It is an implicit update.
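As a rough illustration of the implicit overwrite, here is a minimal consumer-side sketch. It assumes a hypothetical in-memory `Store` that keeps one version per Object ID, and uses short made-up IDs such as `threat-actor-1` rather than real STIX ID syntax.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class StixObject:
    id: str              # Object ID: unchanged by an incremental update
    timestamp: datetime  # later timestamp = newer version of the same object
    content: dict


class Store:
    """Hypothetical consumer store that keeps the newest version of each Object ID."""

    def __init__(self):
        self.latest = {}

    def receive(self, obj):
        current = self.latest.get(obj.id)
        # Same ID with a later timestamp implicitly overwrites the old version;
        # an older or equal timestamp is ignored.
        if current is None or obj.timestamp > current.timestamp:
            self.latest[obj.id] = obj


store = Store()
v1 = StixObject("threat-actor-1", datetime(2015, 12, 1, tzinfo=timezone.utc), {"aliases": ["APT-X"]})
# Adding an alias: the guidance's example of an incremental update.
v2 = StixObject("threat-actor-1", datetime(2015, 12, 2, tzinfo=timezone.utc), {"aliases": ["APT-X", "Group Y"]})
store.receive(v1)
store.receive(v2)
store.receive(v1)  # a late-arriving older copy does not roll the object back
```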

A Major Update creates a completely new STIX Object with a new Object ID and Timestamp. This signals a major change to the underlying object, one large enough to warrant issuing a completely new object. The new STIX Object includes an explicit relationship of type 'Supersedes' pointing to the original Object ID + Timestamp, which tells the receiving consumer that the new object supersedes the old one. It is an explicit update.
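For contrast, a minimal sketch of a Major Update follows. The relationship here is a hypothetical dataclass whose endpoints are (Object ID, Timestamp) pairs, mirroring the composite key described above. The `ttp-` ID prefix is made up for illustration and is not the STIX 1.x ID format.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class StixObject:
    id: str
    timestamp: datetime
    content: dict


@dataclass(frozen=True)
class Relationship:
    source: tuple  # (Object ID, Timestamp) of the new object
    target: tuple  # (Object ID, Timestamp) of the object being replaced
    type: str


def major_update(old, new_content):
    """Issue a brand-new object plus an explicit 'Supersedes' relationship to the old one."""
    new = StixObject(f"ttp-{uuid.uuid4()}", datetime.now(timezone.utc), new_content)
    rel = Relationship((new.id, new.timestamp), (old.id, old.timestamp), "Supersedes")
    return new, rel


old = StixObject("ttp-1", datetime(2015, 12, 1, tzinfo=timezone.utc), {"name": "phishing"})
# "phishing" -> "spear phishing": the guidance's example of a major update.
new, rel = major_update(old, {"name": "spear phishing"})
```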

A large problem is that there is no definition of (or agreement on) which update mechanism should be used. The STIX guidance states:

"Current suggested practices suggest using an incremental update whenever you're making very minor changes to a construct that don't change its inherent meaning. Adding an alias to a threat actor, for example, would be an incremental update. Additionally, incremental updates can be used within an organization while it is developing a more final version of the construct in order to avoid churn on IDs. Major updates, on the other hand, are suggested for anything that changes the inherent meaning of a construct or changes of content between organizations. Changing a TTP from "phishing" to "spear phishing", for example, would be a major update because even though phishing and spear phishing are similar the inherent meaning of the construct changed."

Another problem comes from the recent discussion suggesting that hashes be used to generate the UUIDs for objects. Hash-derived IDs do not work with Incremental Updates, which depend on the Object ID staying the same while only the Timestamp changes. If a producer made a mistake and had to reissue the Indicator with a modification, a hashing function would produce a completely different Object ID.
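To make the conflict concrete, here is a minimal sketch of hash-derived IDs. It assumes name-based UUIDv5 over a simplified JSON canonicalization (sorted keys, no extra whitespace). The namespace and the `indicator--` prefix are hypothetical choices, not anything STIX defines. Any correction to the content yields a different ID, so the object cannot keep its ID across an incremental update.

```python
import json
import uuid

# Hypothetical namespace for hash-derived IDs; not defined by STIX.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")


def canonical_json(obj):
    # Simplified canonicalization: sorted keys, compact separators.
    # A real scheme would need an agreed standard (number formats, Unicode, etc.).
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def hash_id(obj):
    """Derive the Object ID from the object's content (name-based UUIDv5)."""
    return f"indicator--{uuid.uuid5(NAMESPACE, canonical_json(obj))}"


original = {"type": "indicator", "pattern": "198.51.100.1"}
corrected = {"type": "indicator", "pattern": "198.51.100.2"}  # producer fixes a mistake
```

Key order does not matter after canonicalization, but any change to a value does. In this sketch, the "corrected" indicator therefore gets a new identity.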

POTENTIAL ANSWER

This could be fixed by enforcing all updates to be new objects (with new Object IDs) with explicitly defined relationships to the old object IDs.

This would mean that:  

terrymacdonald commented 8 years ago

This solution has been added into the 'Why TWIGS is TWIGS' document.

johnwunder commented 8 years ago

So here's my thinking on this:

Regarding making STIX objects immutable

I'm not a huge fan of this approach. Sean and I went back and forth a ton on versioning because he was a proponent of just doing this and I was a proponent of just doing a version identifier. My feeling has been that it's a complicated approach intended to support multi-party, asynchronous sharing, but because we've limited people to producing content only in their own namespaces, we don't actually have multi-party, asynchronous sharing. Since one organization controls the lifecycle of an object, we have the luxury of using relatively simple versioning approaches.

Here's why I think it's a big burden:

  1. Since the ID will change every time you update something, the relationships that point to or from that construct will either need to be re-issued, or it will become ambiguous whether they still apply. Example:
  2. I issue an incident record, with pointers to the threat actors, indicators, and TTPs that it's related to.
  3. I update that incident record because I found a couple more affected assets.
  4. I now need to re-issue those relationships, because the incident that they referred to has been deprecated. If I don't, should a consumer assume they still apply? It's ambiguous.
  5. While that's a burden for the original producer, it becomes more difficult for people creating relationships to objects they don't own. It means that if the producer updates that object, they also need to re-issue relationships pointing to/from that object. For example, I maintain a library of TTPs that people point indicators to. At some point I correct a typo or add some more detail to the TTP. That means that the TTP ID changes and any indicators pointing to the old one become stale. So you, referencing my TTP ID in your indicators, need to decide whether to re-issue those relationships. It's a burden to do it, but if you don't you're pointing at old information and again consumers will have ambiguity over whether you think the relationships are still valid.
  6. It's a complicated approach that requires that repositories track all old versions of constructs. Many may do that, yes. But should we require it of all repositories? One of the key things I didn't like about STIX 1.x was that it required people doing very simple things to support complicated capabilities because a few people might need it. That's why I like the 80% rule.
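The staleness problem in points 1 and 5 can be sketched as follows. This assumes the immutable scheme, where every update mints a new ID and deprecates the old one. The relationship type names and IDs are hypothetical. Any relationship still citing a deprecated ID is left in the ambiguous state described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    source_ref: str
    target_ref: str
    type: str


def stale_relationships(relationships, deprecated_ids):
    """Relationships that still point at an Object ID that has since been replaced."""
    return [r for r in relationships
            if r.source_ref in deprecated_ids or r.target_ref in deprecated_ids]


relationships = [
    Relationship("indicator-a", "ttp-1", "indicates"),  # your indicator -> my TTP
    Relationship("incident-1", "threat-actor-1", "attributed-to"),
]
# Under the immutable scheme, fixing a typo in ttp-1 issues ttp-2 and deprecates ttp-1,
# so the indicator's relationship now needs a re-issue decision by its owner.
deprecated = {"ttp-1"}
stale = stale_relationships(relationships, deprecated)
```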

Suggestions: IMO, versioning in STIX should either be:

  1. Removed from the data model and added to the messages. This is how many other models do things (e.g. REST over HTTP).
  2. Collapsed from the current approach of relationships+timestamps to just an incrementing version identifier on all top-level objects. Then, we allow producers to make incremental updates very easily. If they do make a huge update to something that changes the core semantics, they should just deprecate that object and issue a new one (using a relationship of "deprecates", similar to the major versioning approach now but making it explicit that it's a deprecation rather than an update and you should not expect any existing relationships to still apply unless you validate them).
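Option 2 can be sketched roughly like this. It assumes a hypothetical integer `version` field starting at 1 on each top-level object, and a plain-dict "deprecates" relationship. Incremental updates keep the ID, so relationships that cite only the ID keep pointing at the current object. A semantic change instead goes through an explicit deprecation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StixObject:
    id: str
    version: int  # hypothetical incrementing version identifier
    content: dict


def incremental_update(obj, **changes):
    """Same ID, version + 1: relationships citing the ID still apply."""
    return replace(obj, version=obj.version + 1, content={**obj.content, **changes})


def deprecate(old, new_id, new_content):
    """Core semantics changed: issue a new object plus a 'deprecates' relationship."""
    new = StixObject(new_id, 1, new_content)
    rel = {"source_ref": new.id, "target_ref": old.id, "type": "deprecates"}
    return new, rel


ttp = StixObject("ttp-1", 1, {"name": "phising"})
ttp_v2 = incremental_update(ttp, name="phishing")  # typo fix: same ID, version 2
new_ttp, rel = deprecate(ttp_v2, "ttp-2", {"name": "spear phishing"})
```

Consumers holding relationships to `ttp-2` know they were made against the new semantics. Those pointing at the deprecated `ttp-1` should be revalidated rather than assumed to carry over.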

Object ID based on hash of object

I also think this approach seems complicated to support some fairly limited use cases. It means everyone has to implement (the same) JSON canonicalization approach, support hashing, and actually create hashes for every single piece of content, which can be fairly resource intensive.

While I wouldn't be opposed to some trust groups using this approach if they want to (maybe it could be an extension) it seems to me to be a burden imposed on the majority of usages to support use cases required by only a few.

Suggestions: Make this approach an extension, or support the desired capabilities via other fields (i.e., a digital signature block).
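As a rough idea of how change detection could sit outside the ID scheme, here is a minimal sketch that attaches an integrity tag to a message. A shared-key HMAC-SHA256 stands in for a true digital signature (which would use asymmetric keys). The `body`/`mac` envelope and the key are hypothetical.

```python
import hashlib
import hmac
import json

SHARED_KEY = b"example-shared-key"  # hypothetical; real deployments need key management


def wrap(payload):
    """Serialize the payload and attach an HMAC tag computed over the exact bytes sent."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    tag = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()
    return {"body": body.decode("utf-8"), "mac": tag}


def verify(message):
    """Recompute the tag and compare in constant time; False means the body was altered."""
    expected = hmac.new(SHARED_KEY, message["body"].encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["mac"])


msg = wrap({"type": "indicator", "id": "indicator-1", "pattern": "198.51.100.1"})
tampered = dict(msg, body=msg["body"].replace("198.51.100.1", "198.51.100.9"))
```

Because the tag travels with the message rather than being baked into the Object ID, IDs stay stable across updates while in-transit changes remain detectable.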

terrymacdonald commented 8 years ago

Hi John,

Regarding #3, I was expecting that all assets would be separate objects, and that a separate relationship would be created to join each one to the incident. This wouldn't require updating the incident object.
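A minimal sketch of that modelling choice follows. It uses hypothetical in-memory collections and a made-up `affected-asset` relationship type. Finding another affected asset adds an asset object and a relationship, and the incident object itself is never modified, so no new incident version is needed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    source_ref: str
    target_ref: str
    type: str


incident = {"id": "incident-1", "title": "Web server compromise"}
assets = {"asset-1": {"hostname": "web01"}}
relationships = [Relationship("incident-1", "asset-1", "affected-asset")]


def add_affected_asset(asset_id, asset):
    """Record a newly found asset without touching the incident object."""
    assets[asset_id] = asset
    relationships.append(Relationship(incident["id"], asset_id, "affected-asset"))


snapshot = dict(incident)
add_affected_asset("asset-2", {"hostname": "db01"})
```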

Regarding #5 & #6, the previous relationships wouldn't need to be updated, as they would remain applicable to all versions of the object unless the object was revoked. I have a section on that in the TWIGS document.

You are right that it could be easier for consumers to track versions if the object ID had a 'revision' field associated with it. If the field isn't populated, the object is the original version; if the revision field is 1 or higher, it is an update.
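The revision-field idea, combined with the point above that relationships apply to all versions unless the object is revoked, might look like the following sketch. The `ObjectId` type, the `revoked_ids` set, and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectId:
    id: str
    revision: Optional[int] = None  # None = original; 1 or higher = an update

    def is_update(self):
        return self.revision is not None and self.revision >= 1


def latest(versions):
    """Pick the newest revision of one object; the original counts as revision 0."""
    return max(versions, key=lambda v: v.revision or 0)


def relationship_applies(target_id, obj, revoked_ids):
    """Relationships cite the base ID, so they cover every revision unless revoked."""
    return target_id == obj.id and obj.id not in revoked_ids


versions = [ObjectId("indicator-1"), ObjectId("indicator-1", 2), ObjectId("indicator-1", 1)]
```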

I was adding the immutability feature because I wanted to detect changes made on the wire. We can add that as part of the messaging itself.

Cheers,
Terry MacDonald

(Sent by email in reply to John Wunder's comment of 24/12/2015, 2:49 am.)

terrymacdonald commented 8 years ago

John made some good points about the implications of using only Major Updates, one of which was that they make maintaining relationships across subsequent updates very difficult. John pointed out that using the Incremental Update mechanism instead would provide the following additional benefits:

This update has been put into the TWIGS document.