muchdogesec / stix2arango

stix2arango is a command line tool that takes a group of STIX 2.1 objects in a bundle and inserts them into ArangoDB. It can also handle bundles that contain updates to objects already stored in ArangoDB.
GNU Affero General Public License v3.0

Conflicting `_key` values when inserting update object with same `id` #4

Closed: himynamesdave closed this issue 4 weeks ago

himynamesdave commented 4 weeks ago

I've added a quickstart guide that will give you a load of scripts to recreate the issue. Follow the instructions here:

https://github.com/muchdogesec/stix2arango/tree/optimizations?tab=readme-ov-file#quickstart

If I run `sh insert_archive_attack_enterprise.sh`, the end of the error shows:

    return response_handler(resp)
  File "/Users/dgreenwood/.pyenv/versions/3.10.9/lib/python3.10/site-packages/arango/aql.py", line 439, in response_handler
    raise AQLQueryExecuteError(resp, request)
arango.exceptions.AQLQueryExecuteError: [HTTP 409][ERR 1210] AQL: unique constraint violated - in index primary of type primary over '_key'; conflicting key: intrusion-set--6a2e693f-24e5-451a-9f88-b36a108e56622018-01-17T12:56:55.080Z (while executing)
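The conflicting key in the error is the STIX `id` with a timestamp appended directly, with no separator. The sketch below (the function name is hypothetical) splits such a key back into its two parts. It assumes the object type contains no `--` and that the id uses a canonical 36-character UUID.

```python
# A STIX 2.1 id is "<object-type>--<UUID>"; a canonical UUID string is always
# 36 characters (8-4-4-4-12 hex digits plus four hyphens).
UUID_LEN = 36


def split_concatenated_key(key: str) -> tuple[str, str]:
    """Split a _key of the form <id><timestamp> (no separator) into both parts."""
    # Assumption: the first "--" ends the object type and the UUID follows it.
    uuid_start = key.index("--") + 2
    id_end = uuid_start + UUID_LEN
    return key[:id_end], key[id_end:]


conflicting = "intrusion-set--6a2e693f-24e5-451a-9f88-b36a108e56622018-01-17T12:56:55.080Z"
stix_id, timestamp = split_concatenated_key(conflicting)
print(stix_id)    # intrusion-set--6a2e693f-24e5-451a-9f88-b36a108e5662
print(timestamp)  # 2018-01-17T12:56:55.080Z
```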

It seems the update key logic has been changed in the optimizations branch (which we did discuss).

Here is how the old code got around key conflicts for each STIX object type:

https://github.com/muchdogesec/stix2arango/tree/main/docs#update-existing-objects-in-vertex-collections

Note how `+timestamp` is appended to `_key`.
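A minimal sketch of the `id+timestamp` key scheme, assuming the format is literally `<id>+<timestamp>` with whichever timestamp the linked docs specify. The helper names are hypothetical, not stix2arango's API. Because STIX ids and `Z`-suffixed STIX timestamps contain no `+`, the separator makes the key unambiguous to split again.

```python
def build_key(stix_id: str, timestamp: str) -> str:
    """Join the STIX id and a timestamp with "+" (format assumed from the docs)."""
    return f"{stix_id}+{timestamp}"


def parse_key(key: str) -> tuple[str, str]:
    """Recover (id, timestamp); STIX ids and "Z"-suffixed timestamps contain no "+"."""
    stix_id, sep, timestamp = key.rpartition("+")
    if not sep:
        raise ValueError(f"no '+' separator in key: {key!r}")
    return stix_id, timestamp


# Illustrative timestamps: two versions of one object get distinct keys.
obj_id = "intrusion-set--6a2e693f-24e5-451a-9f88-b36a108e5662"
v1 = build_key(obj_id, "2017-05-31T21:31:43.540Z")
v2 = build_key(obj_id, "2018-01-17T12:56:55.080Z")
print(v1 != v2)        # True
print(parse_key(v2))   # ('intrusion-set--6a2e693f-24e5-451a-9f88-b36a108e5662', '2018-01-17T12:56:55.080Z')
```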

fqrious commented 4 weeks ago

Okay, this shouldn't be happening, as I made the key always be id+timestamp... Can I have a link to the `insert_archive_attack_enterprise.sh` file?

himynamesdave commented 4 weeks ago

Check this: https://github.com/muchdogesec/stix2arango/tree/optimizations?tab=readme-ov-file#quickstart

I explain how to get the script there.

fqrious commented 4 weeks ago

Alright, I'll fix it tomorrow morning...

fqrious commented 4 weeks ago

Okay, the key in the error shows that, before this import, the DB already contained another object with the same `modified` value but a different hash (probably a different `_stix2arango_note`).

I'm thinking of just appending `_record_modified_time` instead of `doc.modified`; that way, this kind of collision will never happen again.

Also, I just concatenated the id and the timestamp directly, without a "+" separator.
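A sketch of why the collision happens and why keying on the record time avoids it. The helper names, the `_stix2arango_note` values, and the timestamp format used for `_record_modified_time` are all assumptions for illustration. Two documents with the same `id` and `modified` produce the same `id + modified` key (exactly the key from the error), while keys built from distinct record-write times differ.

```python
from datetime import datetime, timezone


def key_from_modified(doc: dict) -> str:
    # Behaviour described above: id + modified, no separator.
    return doc["id"] + doc["modified"]


def key_from_record_time(doc: dict, record_modified_time: datetime) -> str:
    # Proposed fix: id + the time the record is written (assumed format).
    # Two writes of the same id in the same microsecond would still collide.
    return doc["id"] + record_modified_time.strftime("%Y-%m-%dT%H:%M:%S.%fZ")


stix_id = "intrusion-set--6a2e693f-24e5-451a-9f88-b36a108e5662"
# Same id and modified; hypothetical notes differ, so the hashes differ.
existing = {"id": stix_id, "modified": "2018-01-17T12:56:55.080Z", "_stix2arango_note": "v1"}
incoming = {"id": stix_id, "modified": "2018-01-17T12:56:55.080Z", "_stix2arango_note": "v2"}

# Old scheme: identical keys, i.e. the HTTP 409 unique constraint violation.
print(key_from_modified(existing) == key_from_modified(incoming))  # True

# In practice these would come from datetime.now(timezone.utc) at write time.
t1 = datetime(2024, 1, 1, 9, 0, 0, tzinfo=timezone.utc)
t2 = datetime(2024, 1, 2, 9, 0, 0, tzinfo=timezone.utc)
print(key_from_record_time(existing, t1) == key_from_record_time(incoming, t2))  # False
```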

himynamesdave commented 4 weeks ago

Great. Just remember to follow the `_is_latest` behaviour described in the docs (not all newly imported objects have `_is_latest == true`, e.g. when the newly imported object's `modified` time is earlier than the existing object's `modified` time).
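A minimal sketch of the `_is_latest` rule as stated here: a newly imported version is only the latest if its `modified` time is not earlier than every existing version's. The function names are hypothetical, and the tie-breaking rule (the new import wins an exact tie) is an assumption that the docs may define differently.

```python
from datetime import datetime


def parse_stix_ts(ts: str) -> datetime:
    # STIX timestamps end in "Z"; fromisoformat() only accepts "Z" from Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def incoming_is_latest(existing_modified: list[str], incoming_modified: str) -> bool:
    """Decide _is_latest for a newly imported version of an existing STIX id."""
    if not existing_modified:
        return True  # first version of this id
    newest_existing = max(parse_stix_ts(m) for m in existing_modified)
    # Assumption: on an exact tie, the newly imported object wins.
    return parse_stix_ts(incoming_modified) >= newest_existing


# An older version imported after a newer one must not become _is_latest.
print(incoming_is_latest(["2018-01-17T12:56:55.080Z"], "2017-05-31T21:31:43.540Z"))  # False
print(incoming_is_latest(["2017-05-31T21:31:43.540Z"], "2018-01-17T12:56:55.080Z"))  # True
```

When this returns `True`, the existing versions of that `id` would presumably need `_is_latest` set to `false`; when it returns `False`, they stay as they are.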