os-threat / Stix-ORM

GNU Affero General Public License v3.0

Correct the add() function so it behaves the same whether files are added one at a time or all of their objects are collected into one list #14

Open brettforbes opened 1 year ago

brettforbes commented 1 year ago

Currently, the add function is reliable under some conditions but not others. Pull the latest version of the brett-attack branch to see this.

Consider a scenario where we want to load all of the files from the standards directory, where some files contain repeated objects and some files depend on objects loaded by earlier files. Assume that we load the files in alphabetical order, so that any dependencies not contained in a file have always been loaded beforehand.
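The loading scenario above can be sketched roughly as follows: read the files in alphabetical order, then either keep one batch per file or flatten everything into a single list (the two approaches compared next). This is a minimal sketch under stated assumptions: each file is a top-level `.json` file in the directory holding a STIX bundle, a plain list of objects, or a single object. The helper names (`load_stix_objects`, `iter_files_alphabetically`, `per_file_batches`, `single_batch`) are hypothetical and are not the functions in test_refactor.py.

```python
import json
from pathlib import Path
from typing import Iterator, List


def load_stix_objects(file_path: Path) -> List[dict]:
    """Return the STIX objects stored in one JSON file.

    Assumption: a file holds a STIX bundle ({"type": "bundle", "objects": [...]}),
    a plain list of objects, or a single object.
    """
    with open(file_path, encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, dict) and data.get("type") == "bundle":
        return list(data.get("objects", []))
    if isinstance(data, list):
        return data
    return [data]


def iter_files_alphabetically(directory: Path) -> Iterator[Path]:
    """Yield the top-level .json files in a directory, sorted by file name."""
    yield from sorted(directory.glob("*.json"), key=lambda p: p.name)


def per_file_batches(directory: Path) -> List[List[dict]]:
    """Approach 1: one batch per file, each batch passed to add() separately."""
    return [load_stix_objects(p) for p in iter_files_alphabetically(directory)]


def single_batch(directory: Path) -> List[dict]:
    """Approach 2: every object from every file in one list, passed to add() once."""
    return [obj for batch in per_file_batches(directory) for obj in batch]
```

Both functions read the same objects in the same order, so any difference in what ends up in the database should come from how add() handles the batches, not from the loading step.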

Two test methods in test_refactor.py, on lines 752 and 753, demonstrate the two outcomes:

  1. Load all of the files in alphabetical sequence. Because any necessary dependencies are already loaded, each file is sent to the add method on its own, one after another. Running line 753, check_dir_ids(path1), returns:

=========================== input len -> 103, typedn len ->82 difference -> {'opinion--b01efc25-77b4-4003-b18b-f6e24b5cd9f7', 'grouping--84e4d88f-44ea-4bcd-bbf3-b2c1c320bcb3', 'file--edb1ebee-4387-41cc-943b-f94fd491118c', 'tool--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f', 'incident--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f', 'vulnerability--0c7b5b88-8ff7-4a4d-aa9d-feb398cd0061', 'file--4b9a516b-4974-4ff8-a50d-a8b8d552ce1f', 'malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b', 'course-of-action--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f', 'process--70b17c6c-93e5-4c80-8683-5a4d4e51f2c1', 'threat-actor--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f', 'domain-name--ecb120bf-2694-4902-a737-62b74539a41b', 'intrusion-set--4e78f46f-a023-4e5f-bc24-71b3ca22ec29', 'location--a6e9345f-5a15-4c29-8bb3-7dcc5d168d64', 'indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f', 'process--d2ec5aab-808d-4492-890a-3c1a1e3cb06e', 'ipv4-addr--efcd5e80-570d-4131-b213-62cb18eaa6a8', 'relationship--44298a74-ba52-4f0c-87a3-1824e67d7fad', 'campaign--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f', 'note--0c7b5b88-8ff7-4a4d-aa9d-feb398cd0061', 'observed-data--b67d30ff-02ac-498a-92f9-32f845f448cf'}

  2. Load all of the objects from all of the files into one large list, then send that list to the add function in a single call. Running line 752, check_dir_ids2(path1), returns:

=========================== input len -> 103, typedn len ->103 difference -> set()
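Both report lines compare the ids that were fed in with the ids found in the database afterwards. A minimal sketch of that comparison follows, assuming the stored ids have already been fetched by some query (the query is not shown, and `compare_ids` is a hypothetical name, not the helper used in test_refactor.py). It counts distinct ids, so a repeated object in the input is counted once; the original check may count differently.

```python
from typing import Iterable, Set, Tuple


def compare_ids(input_objects: Iterable[dict], stored_ids: Iterable[str]) -> Tuple[int, int, Set[str]]:
    """Return (distinct input ids, distinct stored ids, ids missing from the store)."""
    input_ids = {obj["id"] for obj in input_objects}
    stored = set(stored_ids)
    # Set difference: ids that were submitted but never reached the database.
    return len(input_ids), len(stored), input_ids - stored


if __name__ == "__main__":
    objects = [{"id": "indicator--1"}, {"id": "malware--2"}, {"id": "indicator--1"}]
    n_in, n_db, missing = compare_ids(objects, ["indicator--1"])
    print(f"input len -> {n_in}, typedb len -> {n_db}, difference -> {missing}")
```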

So the result differs significantly depending on how the add method is used, even though our error handling should manage both approaches smoothly. In short, there is a 21-object difference between the two methods, yet I can show in separate tests that only 3 objects have repeated IDs. Why are the other 18 objects not loaded? Is this a dependency issue?
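One way to narrow down the 18 unexplained objects is to count repeated ids across the per-file batches and group each missing id by the first file that contains it. If the missing ids cluster in the same files as the repeated ones, that would suggest a whole file's batch is rejected when one object in it is a repeat; this is only a hypothesis, not something the tests above confirm. The helpers below are hypothetical and take the batches as lists of object dicts, one list per file in load order.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def repeated_ids(batches: List[List[dict]]) -> Dict[str, int]:
    """Map each id that occurs more than once across all batches to its count."""
    counts = Counter(obj["id"] for batch in batches for obj in batch)
    return {obj_id: n for obj_id, n in counts.items() if n > 1}


def missing_by_batch(batches: List[List[dict]], missing: Iterable[str]) -> Dict[int, List[str]]:
    """Group missing ids by the index of the first batch (file) containing them.

    Ids that appear in no batch are grouped under -1.
    """
    first_seen: Dict[str, int] = {}
    for index, batch in enumerate(batches):
        for obj in batch:
            first_seen.setdefault(obj["id"], index)
    grouped: Dict[int, List[str]] = defaultdict(list)
    for obj_id in sorted(missing):
        grouped[first_seen.get(obj_id, -1)].append(obj_id)
    return dict(grouped)
```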

Surely we should be checking whether the records already exist in the database using a query (see me if you need a specific function to do this).
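The existence check suggested above could look roughly like this sketch: before inserting, drop any object whose id is already stored and collapse repeats within the same batch, so that adding files one by one and adding one combined list end in the same state. The `exists` callable is a hypothetical stand-in for the database query, and `InMemoryStore` is a toy replacement for TypeDB used only to show the behaviour; neither is an existing Stix-ORM API.

```python
from typing import Callable, Iterable, List, Set


def filter_new_objects(objects: Iterable[dict], exists: Callable[[str], bool]) -> List[dict]:
    """Keep only objects whose id is neither already stored nor earlier in this batch."""
    seen: Set[str] = set()
    new_objects: List[dict] = []
    for obj in objects:
        obj_id = obj["id"]
        if obj_id in seen or exists(obj_id):
            continue  # skip the repeat instead of failing the whole batch
        seen.add(obj_id)
        new_objects.append(obj)
    return new_objects


class InMemoryStore:
    """Toy store standing in for the database (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.ids: Set[str] = set()

    def exists(self, obj_id: str) -> bool:
        return obj_id in self.ids

    def add(self, objects: Iterable[dict]) -> List[dict]:
        new_objects = filter_new_objects(objects, self.exists)
        for obj in new_objects:
            self.ids.add(obj["id"])
        return new_objects
```

Keeping the first copy of an id is a simple choice, but STIX versions objects by sharing an id with a different `modified` timestamp, so a real implementation may also need to compare `modified` before deciding an object is a repeat.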