oasis-open / cti-python-stix2

OASIS TC Open Repository: Python APIs for STIX 2
https://stix2.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
356 stars 113 forks source link

Revamp how STIX content customization is detected and enforced #503

Closed chisholm closed 3 years ago

chisholm commented 3 years ago

This PR shows work done to improve the way customization is handled. One of the main changes is that STIX objects no longer maintain an internal _allow_custom flag. In order to enforce that there is no customization (when allow_custom=False), one needs to know if customization is present. Just because customizations were deemed okay at construction time (allow_custom=True) does not mean they actually exist. So instead, objects are given a has_custom property, which directly addresses the question and makes customization detection in sub-objects possible.

The responsibility for both enforcing customization preferences and detecting the presence of customizations lies with the property clean() methods. Before, those methods could not easily do their enforcement job because they were not told what the allow_custom preference was. There was some very ugly hackage in places where that became a problem, which is now removed. In this PR, the clean() methods are now passed the allow_custom preference. They also now return not only the cleaned value, but also a boolean flag indicating whether customization was detected (which of course needs to be False if customization was not allowed!).

Enforcement has been tightened up in certain places. This will catch customization in places where it didn't before, and in fact revealed some unit test bugs which have been fixed. For example, open-vocabs are now properly enforced. There is a new OpenVocabProperty class designed specifically for this, and STIX object classes have been changed to use it. All of the enum and vocab values from the STIX spec have been added to special vocab modules for both 2.0 and 2.1; they are the enforcement criteria now. The difference between an enum and open vocab is that no customization is ever allowed with enums, as that would result in non-compliant content; open vocabs admit values not listed in the spec, which are considered a customization and will be enforced accordingly. Another example is that a customization in an extension will in all cases be caught and cause an error if allow_custom=False in the parent object. The state of "customization" ought to always propagate upward. An object has been customized if custom content occurs anywhere in the object.

Code dealing with hashes has been moved to a new module to remove clutter from the properties module. Hash types are now properly enforced for each spec version, which are different in 2.0 and 2.1. It supports some flexibility in how hash algorithms are named in user content, while ensuring that serialized output uses the spec-defined spelling/casing (which also differs between STIX 2.0 and 2.1).

Public API has mostly not changed, but user code which uses non-public APIs might break (e.g. if it called a clean() method). Some property constructor args changed (e.g. allow_custom was removed since it doesn't make sense there). Customization enforcement behavior has changed, so code/content which worked before might now trigger errors about custom content.

This PR includes changes which would address issue #501 . The way in which ReferenceProperty does customization enforcement is a bit more sophisticated. When a whitelist constraint on generic STIX type categories is given for this property and allow_custom=True during cleaning, the whitelist is automatically converted to a blacklist on the complementary categories. This effectively weakens the criteria with respect to unregistered custom types: it ensures that types known to be incorrect are not given, as opposed to ensuring that types known to be correct are given. An unregistered custom type is not known to be incorrect (the library knows nothing about it if it has not been registered), so it will satisfy the weakened criteria and be allowed.

Fixes #333. Fixes #433. Fixes #498. Fixes #501.

clenk commented 3 years ago

As we discussed elsewhere, let's change this PR to allow custom open vocab values again for now. In a future version we'll work out more granular customization settings to support checking them.

emmanvg commented 3 years ago

The other general comment was to inspect the documentation/ipynb guides to check for any content that needs updating based on these changes.

adulau commented 3 years ago

Customization enforcement behavior has changed, so code/content which worked before might now trigger errors about custom content. What's the background for such change in the reference implementation? Can you point me to the standard where this is a required change?

clenk commented 3 years ago

@adulau the reason for these changes is not anything in the standard. There have been inconsistencies with this library's handling of custom content that has led to several issues, such as #501, #498, #433, #333. That's what we're trying to address here. But as part of that, some things have changed, such as the parameters and return values of the clean() method. Some content that gave custom content errors before in a nested object or reference properties are now allowed.

chisholm commented 3 years ago

Having an allow_custom config setting was a software design decision, not a requirement from the specification. It was thought to be desirable to allow users to cause the library to produce errors when custom content is encountered, when they want to prevent it. It can also be useful for catching typos and mistakes. That design decision is much older than this PR, and has been in the library since the beginning as far as I know. This PR is about improving how that system works.

I mentioned in the PR summary that it could cause new errors, which sounds alarming; maybe I should also have been clearer that it can resolve errors too :) The last paragraph describes one such case.

codecov-commenter commented 3 years ago

Codecov Report

Merging #503 (5cce864) into master (2743b90) will decrease coverage by 2.34%. The diff coverage is 58.04%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #503      +/-   ##
==========================================
- Coverage   89.47%   87.12%   -2.35%     
==========================================
  Files         147      151       +4     
  Lines       16599    17702    +1103     
==========================================
+ Hits        14852    15423     +571     
- Misses       1747     2279     +532     
Impacted Files Coverage Δ
stix2/test/test_workbench.py 100.00% <ø> (ø)
stix2/test/v20/conftest.py 100.00% <ø> (ø)
stix2/test/v20/test_datastore_filters.py 100.00% <ø> (ø)
stix2/test/v20/test_datastore_memory.py 100.00% <ø> (ø)
stix2/test/v20/test_pickle.py 100.00% <ø> (ø)
stix2/test/v20/test_versioning.py 100.00% <ø> (ø)
stix2/test/v21/conftest.py 100.00% <ø> (ø)
stix2/test/v21/test_datastore_filters.py 100.00% <ø> (ø)
stix2/test/v21/test_datastore_memory.py 100.00% <ø> (ø)
stix2/test/v21/test_indicator.py 100.00% <ø> (ø)
... and 40 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 2743b90...5cce864. Read the comment docs.