oasis-open / cti-python-stix2

OASIS TC Open Repository: Python APIs for STIX 2
https://stix2.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
364 stars 119 forks source link

Parsing or copying STIX objects with optional properties with default values creates new data #508

Closed maybe-sybr closed 3 years ago

maybe-sybr commented 3 years ago

While doing some experimentation related to my situation described in #507, I found that parsing or making a copy of a STIX object (specifically one of my "event" observables), which has no value for an optional property with some non-constant default value like NOW, actually sets that property by calling the default callable.

An example:
```python import copy import json import time import uuid import stix2 import stix2.parsing import stix2.utils from stix2.properties import TimestampProperty @stix2.CustomObservable( "mystix-example-type", properties=( ("foo", TimestampProperty( required=False, default=lambda: stix2.utils.NOW, )), ), ) class MySTIXExampleType(): pass print("INPUT DATA") data = { "type": "mystix-example-type", "id": f"mystix-example-type--{uuid.uuid4()}", } print(json.dumps(data, indent=2)) print("\n\n-- Parsing JSON creates new data in the STIX object") stix_obj = stix2.parsing.parse(data) print(f"{stix_obj=!r}") time.sleep(1) print("-- Which is diligently serialized, but transforms data!") print(f"{stix_obj.serialize(pretty=True)}") print("\n\n-- Deleting the property value hackishly") del stix_obj._inner["foo"] print(f"{stix_obj=!r}") print("\n\n-- Serialization creates new data as one might expect") print("Although calling NOW after I deleted it is perhaps a bit odd") time.sleep(1) print(f"{stix_obj.serialize(pretty=True)}") print("\n\n-- Making a deep copy creates new data") stix_obj_copy = copy.deepcopy(stix_obj) print(f"{stix_obj_copy=!r}") time.sleep(1) print("-- Which is retained in later serializations") print(f"{stix_obj_copy.serialize(pretty=True)}") ``` -> ``` $ python mystix.py INPUT DATA { "type": "mystix-example-type", "id": "mystix-example-type--7ecf9b48-ee9a-4729-b54e-62e87df16d77" } -- Parsing JSON creates new data in the STIX object stix_obj=MySTIXExampleType(type='mystix-example-type', spec_version='2.1', id='mystix-example-type--7ecf9b48-ee9a-4729-b54e-62e87df16d77', foo='2021-06-17T03:46:43.924498Z') -- Which is diligently serialized, but transforms data! { "type": "mystix-example-type", "spec_version": "2.1", "id": "mystix-example-type--7ecf9b48-ee9a-4729-b54e-62e87df16d77", "foo": "2021-06-17T03:46:43.924498Z" } -- Deleting the property value hackishly stix_obj=MySTIXExampleType(type='mystix-example-type', spec_version='2.1', id='mystix-example-type--7ecf9b48-ee9a-4729-b54e-62e87df16d77') -- Serialization creates new data as one might expect Although calling NOW after I deleted it is perhaps a bit odd { "foo": "2021-06-17T03:46:45.92698Z", "type": "mystix-example-type", "spec_version": "2.1", "id": "mystix-example-type--7ecf9b48-ee9a-4729-b54e-62e87df16d77" } -- Making a deep copy creates new data stix_obj_copy=MySTIXExampleType(type='mystix-example-type', spec_version='2.1', id='mystix-example-type--7ecf9b48-ee9a-4729-b54e-62e87df16d77', foo='2021-06-17T03:46:45.928454Z') -- Which is retained in later serializations { "type": "mystix-example-type", "spec_version": "2.1", "id": "mystix-example-type--7ecf9b48-ee9a-4729-b54e-62e87df16d77", "foo": "2021-06-17T03:46:45.928454Z" } ```

I think this might be a more convincing use case for the UNSET constant I proposed in #507. Using this similar diff seems to make parsing and deepcopying non-transformative, although it would break the existing assumptions that all properties with default values are available for dotted attribute access (e.g. stix_obj.foo raises an AttributeError now).

diff --git a/stix2/base.py b/stix2/base.py
index c0e52e6..73e9b67 100644
--- a/stix2/base.py
+++ b/stix2/base.py
@@ -37,6 +37,8 @@ __all__ = ['STIXJSONEncoder', 'STIXJSONIncludeOptionalDefaultsEncoder', '_STIXBa
 DEFAULT_ERROR = "{type} must have {property}='{expected}'."
 SCO_DET_ID_NAMESPACE = uuid.UUID("00abedb4-aa42-466c-9c01-fed23315a9b7")

+UNSET = object()
+

 def get_required_properties(properties):
     return (k for k, v in properties.items() if v.required)
@@ -64,10 +66,11 @@ class _STIXBase(Mapping):
                 kwargs[prop_name] = value

         has_custom = False
-        if prop_name in kwargs:
+        unclean_prop_val = kwargs.pop(prop_name, UNSET)
+        if unclean_prop_val is not UNSET:
             try:
                 kwargs[prop_name], has_custom = prop.clean(
-                    kwargs[prop_name], allow_custom,
+                    unclean_prop_val, allow_custom,
                 )
             except InvalidValueError:
                 # No point in wrapping InvalidValueError in another
@@ -243,6 +246,8 @@ class _STIXBase(Mapping):
     def __deepcopy__(self, memo):
         # Assume: we can ignore the memo argument, because no object will ever contain the same sub-object multiple times.
         new_inner = copy.deepcopy(self._inner, memo)
+        for prop_name in self._properties:
+            new_inner.setdefault(prop_name, UNSET)
         cls = type(self)
         if isinstance(self, _Observable):
             # Assume: valid references in the original object are still valid in the new version
diff --git a/stix2/parsing.py b/stix2/parsing.py
index e13ff67..d97ad79 100644
--- a/stix2/parsing.py
+++ b/stix2/parsing.py
@@ -2,6 +2,7 @@

 import copy

+from . import base
 from . import registry
 from .exceptions import ParseError
 from .utils import _get_dict, detect_spec_version
@@ -88,6 +89,9 @@ def dict_to_stix2(stix_dict, allow_custom=False, version=None):
             return stix_dict
         raise ParseError("Can't parse unknown object type '%s'! For custom types, use the CustomObject decorator." % obj_type)

+    for prop_name in obj_class._properties:
+        stix_dict.setdefault(prop_name, base.UNSET)
+
     return obj_class(allow_custom=allow_custom, **stix_dict)
chisholm commented 3 years ago

In accordance with my discussion from issue 507, a defaulted optional property is effectively logically required. Using the NOW default is a convenience which essentially allows the system to compute a value for you. As mentioned in the other issue, the assumption is that omitting the property would result in a noncompliant object. If you don't set the property, the property must be included with its default value. It's not so much a transformation, as ensuring compliance by taking advantage of the provided default.

maybe-sybr commented 3 years ago

Closing per the discussion on #507. Thanks again @chisholm and @clenk !