Open rpiazza opened 2 years ago
Relationships have a strong use case for UUIDv5s when recording DNS resolutions. While you can store resolution information for domain names in the resolves_to_refs property, this doesn’t permit time-bounding the information. As such, any time data for it would need to come from an observed-data object containing both.
Unfortunately, this does not effectively convey the information, since the first_observed and last_observed fields of observed-data only tell you that this resolution occurred X times within this time range, not that it held true for the entire duration of the range.
Instead, using an external relationship makes this far easier. A relationship has an explicit start_time and stop_time that make it very clear that this is the exact time period during which this DNS resolution held true.
Using a UUIDv5-based relationship with created set equal to the start_time allows a very fast distributed mapping of this resolution that can stream updates without issue and without a requirement to re-architect systems to store STIX IDs internally. While the resolution is still valid, stop_time is omitted; the current description of stop_time captures our understanding of this perfectly:
“If stop_time is not specified, then the latest time at which the relationship between the objects exists is either not known, not disclosed, or has no defined stop time.”
This technique allows any vendor that can already map things like domain resolution or certificate hosting history over time to quickly provide a STIX output that is easy for existing systems to ingest. Our current proposal suggests that the following properties be used to generate a UUIDv5 for relationships: relationship_type, labels, source_ref, target_ref, created_by_ref, created, and start_time.
If the resolution lapses and DNS has a gap, a new ID would be generated for the new start time, with no relationship showing resolution in that time window.
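To make the proposal above concrete, here is a minimal sketch of how such an ID could be derived. It is not taken from the spec: the namespace (`EXAMPLE_NAMESPACE`), the helper name `relationship_id`, and the placeholder object IDs are all assumptions for illustration, and the `json.dumps` call only approximates a canonical JSON form.

```python
import json
import uuid

# Hypothetical namespace, for illustration only; this thread does not agree on one for SROs.
EXAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")

# Contributing properties as listed in the proposal above.
CONTRIBUTING = (
    "relationship_type", "labels", "source_ref", "target_ref",
    "created_by_ref", "created", "start_time",
)

def relationship_id(props: dict) -> str:
    """Derive a deterministic relationship ID from the contributing properties only."""
    subset = {key: props[key] for key in CONTRIBUTING if key in props}
    # Sorted keys and compact separators approximate a canonical serialization.
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "relationship--" + str(uuid.uuid5(EXAMPLE_NAMESPACE, canonical))

# Placeholder object IDs; real ones would come from the SCOs themselves.
resolution = {
    "relationship_type": "resolves-to",
    "source_ref": "domain-name--11111111-1111-5111-8111-111111111111",
    "target_ref": "ipv4-addr--22222222-2222-5222-8222-222222222222",
    "created": "2023-01-01T00:00:00.000Z",
    "start_time": "2023-01-01T00:00:00.000Z",
}

# While the resolution is still valid, stop_time is omitted.
open_ended = relationship_id(resolution)
# Closing the window later does not change the ID, because stop_time does not contribute.
closed = relationship_id({**resolution, "stop_time": "2023-02-01T00:00:00.000Z"})
# A resolution that resumes after a gap gets a new start time and therefore a new ID.
resumed = relationship_id({
    **resolution,
    "created": "2023-03-01T00:00:00.000Z",
    "start_time": "2023-03-01T00:00:00.000Z",
})
```

Because stop_time is left out of the contributing properties, a producer can stream the open-ended relationship first and the closed one later under the same ID.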
I find it odd that SCOs use UUIDv5 but the Observed Data SDO doesn't. Observed Data SDO is effectively a container for SCOs, logically equivalent to a log event. Having a unique ID here helps a ton in data deduplication.
A use case is stix-shifter: if you run the same STIX pattern (bounded with START and STOP) against a data source twice, you expect to see the same observations.
@pcoccoli - I'm not sure I understand your question. Yes, the observed data object is essentially a log event. Wouldn't each log event have a different first observed/last observed time? If you are seeing the same event over and over, you could create a new version of that observed data object with an updated last observed time, and keep track of the number of times you saw the event in the number_observed property.
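A rough sketch of that versioning idea, using plain dictionaries rather than any particular library. The helper name `record_repeat` and the sample values are made up for illustration, and the string comparison assumes all timestamps share the same fixed UTC format.

```python
def record_repeat(observed: dict, seen_at: str, now: str) -> dict:
    """Return a new version of an observed-data dict after seeing the same event again."""
    updated = dict(observed)  # id and created stay the same across versions
    # Fixed-format UTC timestamps compare correctly as strings.
    updated["last_observed"] = max(observed["last_observed"], seen_at)
    updated["number_observed"] = observed["number_observed"] + 1
    updated["modified"] = now  # a new modified value marks the new version
    return updated

# Placeholder object; the ID is an arbitrary example value.
v1 = {
    "type": "observed-data",
    "id": "observed-data--33333333-3333-4333-8333-333333333333",
    "created": "2023-01-01T00:00:00.000Z",
    "modified": "2023-01-01T00:00:00.000Z",
    "first_observed": "2023-01-01T00:00:00.000Z",
    "last_observed": "2023-01-01T00:00:00.000Z",
    "number_observed": 1,
}

v2 = record_repeat(v1, seen_at="2023-01-02T12:00:00.000Z", now="2023-01-02T12:00:05.000Z")
```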
The proposal suggested in this issue is to explicitly allow the use of UUIDv5 for certain SDO/SROs.
IMHO, this sentence alone raises problems regarding the STIX principles for versioning, but if your point was strictly on the thing you said, I would agree:
"A simpler solution would be to use a UUIDv5, based on the CVE id. All producers could determine what the appropriate vulnerability id is without having to store the object or obtain it from the common object repository, and just use the id for references to the CVE."
However, this also means that people can only use OASIS STIX Namespace to determine the STIX ID of the Vulnerability object but NEVER to generate new vulnerability objects with that ID.
I believe this raises a need that I've been noticing: a library on top of stix2 that deals with these use cases.
@rpiazza what about talking with the cve.org folks about also providing a STIX version in their repo?
Last year they launched the JSON 5 Format; this year, with MITRE's help, they could launch a version in STIX format.
This approach ensures that the source of vulnerability management is also the producer of the STIX objects and keeps them updated following STIX principles.
@pcoccoli Yes, I totally understand your point. The only reason an Observed Data, as it is today, cannot be an SCO with a UUIDv5 is its number_observed field. An SCO does not have versions and is therefore not expected to change over time. The number_observed field makes a case for continuing to update the Observed Data object, in order to avoid creating a new Observed Data object every time you see the same "observation".
One possible approach to convert Observed Data to an object with no versions (SCO UUIDv5) would be something like:
first_observed
last_observed
number_observed
timestamp.
Again, this will force the generation of a lot of observed-data objects.
So, I believe the TC went with the current approach in order to avoid a lot of objects, even though it does not allow deduplication like you would understandably expect.
@SYNchroACK Observed Data is a deprecated object. It is an artifact from when we first started building STIX 2. It represents a Graph inside of a Graph. The reason we went that way is we did not want every IP address to have a unique ID. It was not until we better understood how to use UUIDv5 that we looked at making that change.
To address your other comment, SCOs are "facts" or empirical data that does not change and is not open to debate or confidence or other bits of data. You connect SCOs to intelligence and that intelligence can change and what not, or be added to. This is why there are UUIDv4 addresses for SDOs and UUIDv5 for SCOs.
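As an illustration of why an SCO can carry a deterministic ID, here is a minimal sketch for an IPv4 address. The namespace value is the one the STIX 2.1 spec defines for SCOs; the helper name `ipv4_addr_id` is made up, and `json.dumps` with sorted keys is only an approximation of the canonical JSON serialization the spec calls for.

```python
import json
import uuid

# Namespace defined by the STIX 2.1 specification for deterministic SCO IDs.
STIX_SCO_NAMESPACE = uuid.UUID("00abedb4-aa42-466c-9c01-fed23315a9b7")

def ipv4_addr_id(value: str) -> str:
    """Derive an ipv4-addr ID from its single ID-contributing property, value."""
    # Compact, key-sorted JSON stands in for a full canonicalization step.
    canonical = json.dumps({"value": value}, sort_keys=True, separators=(",", ":"))
    return "ipv4-addr--" + str(uuid.uuid5(STIX_SCO_NAMESPACE, canonical))

# Two producers that see the same address independently arrive at the same ID.
from_producer_a = ipv4_addr_id("198.51.100.3")
from_producer_b = ipv4_addr_id("198.51.100.3")
```

The same "fact" always maps to the same ID, which is what lets separate producers reference an address without exchanging the object first.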
@jordan2175 You mean this Observed Data is deprecated? https://docs.oasis-open.org/cti/stix/v2.1/os/stix-v2.1-os.html#_p49j1fwoxldc
I think there might be some confusion here. The usage of the objects property within Observed Data is deprecated in favor of object_refs. Observed Data itself is still very much supported and required for a number of use cases including for Sightings.
In this context the usage of deterministic IDs for both Observed Data and Sightings (as a type of relationship) would likely be extremely useful to prevent data duplication.
Yup, exactly!
Well, in fact, even the Relationship object should have a deterministic ID; however, with the current core structure of the objects, that cannot be achieved. In order to meet that goal (with which I totally agree), there is a need for a core restructure, splitting objects into the following types:
An object with or without deterministic IDs which represents a set of properties like the following, and that must always have an embedded reference to an Atom object:
(atime, ctime, mtime, operating_system)
(md5, sha1, sha256)
(start_time, stop_time, created_by_ref)
(first_seen, last_seen, count, created_by_ref, created, modified)
(first_observed, last_observed, count, created_by_ref, created, modified)
(description, description_type, created_by_ref, created, modified)
(marking_ref, object_ref, selectors, created_by_ref, created, modified)
(description, resource_level, goals, ..., created_by_ref, created, modified)
A particle ID may be UUIDv4 or UUIDv5 depending on the scenario:
count property may need to be revisited.
In practice, a particle can have a deterministic ID if the producer will never have to update it; otherwise, a versioning mechanism needs to be in place (as in STIX 2.1), which then makes the case for UUIDv4.
An object with deterministic IDs which represents a base STIX element like:
(value property),
(path property) with OS Timestamps particle
(name) with Intrusion Set Context particle
On objects that represent threats, like Threat Actor, Intrusion-set, Malware:
first_seen and last_seen will be replaced by the use of the Sighting object
aliases will be replaced by a Relationship object with a new relationship_type alias-of, in order to keep track of who made that link, when, and what the confidence level of that assertion is.
An object with deterministic IDs which represents a set of Atom objects like:
(sighting_of_ref, observed_data_refs, where_sighted_refs) and the rest of the properties should be particles (Sight Timestamps, Descriptive Context), pointing to the Sighting.
(object_refs) and the rest of the properties should be particles (Observed Timestamps), pointing to the Observed Data.
(source_ref, target_ref, relationship_type) and the rest of the properties should be particles (Assertion Timestamps, Descriptive Context), pointing to the Relationship.
An object without deterministic IDs which represents a special set of Atom objects like:
Report ...
Incident ...
I have a draft of a proposal for a possible stix 3.0, in case you find it interesting, ping me. ;)
The specification was written to encourage use of UUIDv5 for SCOs to avoid duplication of objects that represent the same thing - e.g., an IP address. There is an algorithm in the spec that should be used to generate the UUIDv5 ids, based on specified properties for each SCO and an explicitly defined namespace. Other algorithms may be used, as described in this text from section 3.4:
Using UUIDv5 ids for SDO/SROs is not explicitly discussed in the spec, but is not explicitly prohibited either. The following text from section 2.9 can imply that UUIDv5 ids can be used for them:
There is at least one use case for using UUIDv5 ids for SDOs - representing CVEs using the Vulnerability SDO.
It was recognized that having many duplicate Vulnerability objects to represent a particular CVE is not ideal. For this reason, the common STIX object repository includes a "canonical" Vulnerability object for each CVE, and the repository is updated nightly to include the CVEs created that day.
However, because of the large number of CVEs (over 100000) this seems not to be an ideal solution. A simpler solution would be to use a UUIDv5, based on the CVE id. All producers could determine what the appropriate vulnerability id is without having to store the object or obtain it from the common object repository, and just use the id for references to the CVE.
Based on the text from section 2.9, this is already possible to do, but the explicit namespace CAN NOT BE USED. This implies that producers would pick a namespace, which would most likely differ from other producers, defeating the whole purpose of the use of UUIDv5s. Of course, this namespace could be published so it is known to the community - but that seems problematic.
The proposal suggested in this issue is to explicitly allow the use of UUIDv5 for certain SDO/SROs.
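The namespace concern above can be shown with a short sketch. Everything here is illustrative: the helper `vulnerability_id`, the three example namespaces, and the choice to hash the bare CVE id string (rather than a canonical JSON object) are assumptions, not something the spec or this issue defines.

```python
import uuid

def vulnerability_id(namespace: uuid.UUID, cve_id: str) -> str:
    """Derive a Vulnerability ID from a namespace and a CVE id."""
    return "vulnerability--" + str(uuid.uuid5(namespace, cve_id))

# Hypothetical namespaces that two producers might pick on their own.
producer_a_ns = uuid.uuid5(uuid.NAMESPACE_DNS, "producer-a.example")
producer_b_ns = uuid.uuid5(uuid.NAMESPACE_DNS, "producer-b.example")
# Hypothetical namespace that the community has agreed on and published.
shared_ns = uuid.uuid5(uuid.NAMESPACE_DNS, "shared.example")

cve = "CVE-2021-44228"

# Private namespaces give different IDs for the same CVE, so references do not line up.
private_a = vulnerability_id(producer_a_ns, cve)
private_b = vulnerability_id(producer_b_ns, cve)

# With one agreed namespace, every producer computes the same ID without a repository lookup.
agreed_a = vulnerability_id(shared_ns, cve)
agreed_b = vulnerability_id(shared_ns, cve)
```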