mitre / stixmarx

Data Markings API for STIX 1.x

Properly mark duplicate nodes #9

Closed stevefranchak closed 6 years ago

stevefranchak commented 6 years ago

Issue

We use stixmarx to determine the TLP for entities within STIX packages. I noticed that when a node such as <FileObj:Size_In_Bytes>3282</FileObj:Size_In_Bytes> appears more than once in an XML file, and that file has a STIXHeader that globally applies a TLP marking to all nodes and attributes in the document, stixmarx marks only one of the repeated nodes. In the example above, only one UnsignedLong object with the value 3282 ends up with a __datamarkings__ attribute. Calling get_markings on any other object with the same value returns an empty list.

This issue can be reproduced with the following XML:

<stix:STIX_Package
    xmlns:FileObj="http://cybox.mitre.org/objects#FileObject-2"
    xmlns:MutexObj="http://cybox.mitre.org/objects#MutexObject-2"
    xmlns:cybox="http://cybox.mitre.org/cybox-2"
    xmlns:cyboxCommon="http://cybox.mitre.org/common-2"
    xmlns:cyboxVocabs="http://cybox.mitre.org/default_vocabularies-2"
    xmlns:exampledata="http://stix.exampledata.org"
    xmlns:indicator="http://stix.mitre.org/Indicator-2"
    xmlns:report="http://stix.mitre.org/Report-1"
    xmlns:stix="http://stix.mitre.org/stix-1"
    xmlns:stixCommon="http://stix.mitre.org/common-1"
    xmlns:stixVocabs="http://stix.mitre.org/default_vocabularies-1"
    xmlns:ttp="http://stix.mitre.org/TTP-1"
    xmlns:marking="http://data-marking.mitre.org/Marking-1"
    xmlns:tlpMarking="http://data-marking.mitre.org/extensions/MarkingStructure#TLP-1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="exampledata:Package-51a10880-a9f1-4303-9b0c-5ba9b63af0cc" version="1.2">
    <stix:STIX_Header>
        <stix:Handling>
            <marking:Marking>
                <marking:Controlled_Structure>../../../../descendant-or-self::node() | ../../../../descendant-or-self::node()/@*</marking:Controlled_Structure>
                <marking:Marking_Structure xsi:type='tlpMarking:TLPMarkingStructureType' color="GREEN"/>
                <marking:Marking_Structure xsi:type='tlpMarking:TLPMarkingStructureType' color="WHITE"/>
            </marking:Marking>
        </stix:Handling>
    </stix:STIX_Header>
    <stix:Observables cybox_major_version="2" cybox_minor_version="1" cybox_update_version="0">
        <cybox:Observable id="example:observable-c0296696-31d1-44f4-b6c8-039d9437f8fc">
            <cybox:Object id="example:file-b2c985a8-646e-4d37-91e0-0c3d5b7cef5b">
                <cybox:Properties xsi:type="FileObj:FileObjectType">
                    <FileObj:Size_In_Bytes>3282</FileObj:Size_In_Bytes>
                    <FileObj:File_Format>data</FileObj:File_Format>
                    <FileObj:Hashes>
                        <cyboxCommon:Hash>
                            <cyboxCommon:Type xsi:type="cyboxVocabs:HashNameVocab-1.0">SHA384</cyboxCommon:Type>
                            <cyboxCommon:Simple_Hash_Value>2cf166b15711818a42b0e1a45895020cd8884a360c66a7e07814394270277d08ca68da687d47a071318b641e23d57b7b</cyboxCommon:Simple_Hash_Value>
                        </cyboxCommon:Hash>
                    </FileObj:Hashes>
                </cybox:Properties>
            </cybox:Object>
        </cybox:Observable>
        <cybox:Observable id="example:observable-d0296696-31d1-44f4-b6c8-039d9437f8fc">
            <cybox:Object id="example:file-c2c985a8-646e-4d37-91e0-0c3d5b7cef5b">
                <cybox:Properties xsi:type="FileObj:FileObjectType">
                    <FileObj:Size_In_Bytes>3282</FileObj:Size_In_Bytes>
                    <FileObj:File_Format>data</FileObj:File_Format>
                    <FileObj:Hashes>
                        <cyboxCommon:Hash>
                            <cyboxCommon:Type xsi:type="cyboxVocabs:HashNameVocab-1.0">SHA384</cyboxCommon:Type>
                            <cyboxCommon:Simple_Hash_Value>3cf166b15711818a42b0e1a45895020cd8884a360c66a7e07814394270277d08ca68da687d47a071318b641e23d57b7b</cyboxCommon:Simple_Hash_Value>
                        </cyboxCommon:Hash>
                    </FileObj:Hashes>
                </cybox:Properties>
            </cybox:Object>
        </cybox:Observable>
    </stix:Observables>
</stix:STIX_Package>

Solution

Parsed entities in stixmarx/parser.py are collected in a set, so any entity that compares equal to one already in the set is discarded. I'm not sure whether this deduplication is intended, but it produces unintended results: the discarded duplicates never receive a marking. I propose changing the set to a list.
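
To illustrate the difference, here is a minimal standalone sketch. It does not use stixmarx itself: the `UnsignedLong` class below is a hypothetical stand-in that assumes value-based equality and hashing, and `apply_global_marking` and `is_marked` are made-up helpers that only model how a marking might be attached to each collected entity.

```python
class UnsignedLong:
    """Hypothetical stand-in for a parsed field; NOT the real cybox class.

    Assumption: two instances with the same value compare equal and hash equal.
    """

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, UnsignedLong) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


def apply_global_marking(entities, marking):
    """Attach the marking to every entity present in the collection."""
    for entity in entities:
        entity.__datamarkings__ = {marking}


def is_marked(entity):
    """Report whether the entity received a marking."""
    return hasattr(entity, "__datamarkings__")


# Collected in a set: the two equal objects collapse into a single member,
# so only one of them is ever visited and marked.
a, b = UnsignedLong(3282), UnsignedLong(3282)
apply_global_marking({a, b}, "TLP:GREEN")
set_result = (is_marked(a), is_marked(b))

# Collected in a list: both objects are kept, so both are marked.
c, d = UnsignedLong(3282), UnsignedLong(3282)
apply_global_marking([c, d], "TLP:GREEN")
list_result = (is_marked(c), is_marked(d))

print("set: ", set_result)
print("list:", list_result)
```

With the set, exactly one of the two objects carries a marking afterwards; with the list, both do. If some deduplication is still wanted, keying on object identity (for example via `id()`) rather than equality would be another option, but that goes beyond what I'm proposing here.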

clenk commented 6 years ago

Thank you for the pull request, @stevefranchak! We'll take a look at this when we get a chance.

stevefranchak commented 6 years ago

Great, thank you for pushing out this fix! We're already using 1.0.4. 👍