mitre-attack / attack-stix-data

STIX data representing MITRE ATT&CK
https://attack.mitre.org/

load_from_file slow #8

Closed syntax90 closed 3 years ago

syntax90 commented 3 years ago

Following the usage example given, load_from_file takes 3-4 seconds.

Is there a way to optimize this?

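import sys
from stix2 import Filter, MemoryStore

def main(argv):
    try:
        mitre_id = argv[1].upper()
        src = MemoryStore()
        # Parses the whole bundle into memory; this is the slow step.
        src.load_from_file("enterprise-attack-9.0.json")
        obj = src.query([Filter("external_references.external_id", "=", mitre_id)])[0]
    except IndexError:
        sys.exit("no ATT&CK ID given, or no object matches it")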

I'm expecting to run this Python script quite a number of times.

isaisabel commented 3 years ago

Hi @syntax90,

It's not possible to optimize the load time of a MemoryStore, because that time is driven by the large size of our dataset. However, there are a number of other ways to load the data, such as a FileSystemSource or a TAXIICollectionSource.

These different options have different performance characteristics. Here's the result of some prior testing I did on the above listed options:

average time in seconds to initialize Enterprise FileSystemSource: 0.00011070399999999481
average time in seconds to initialize Enterprise MemoryStore via Requests: 6.5625417941999995
average time in seconds to initialize Enterprise TAXIICollectionSource: 0.13343816430000005

average time in seconds to perform example queries on FileSystemSource source: 14.884672774699998
average time in seconds to perform example queries on MemoryStore source: 0.2635593319999998
average time in seconds to perform example queries on TAXII source: 4.0353454838000005

I should note that in the above tests we were running multiple queries after initializing, so the 14 seconds for a FileSystemSource is not reflective of the execution time of a single query.
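The harness behind these numbers isn't shown in the thread. As a rough sketch of how initialization and query time could be averaged separately, something like the following might work. The names `average_seconds`, `benchmark`, `make_source` and `run_queries` are hypothetical and not part of stix2.

```python
import time
from statistics import mean


def average_seconds(action, repeats=10):
    """Return the mean wall-clock time of calling action() `repeats` times."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        samples.append(time.perf_counter() - start)
    return mean(samples)


def benchmark(make_source, run_queries, repeats=10):
    """Time initialization and querying separately for one data source.

    make_source: zero-argument callable that builds the source (assumed).
    run_queries: callable that takes the source and runs a batch of queries (assumed).
    """
    init_avg = average_seconds(make_source, repeats)
    source = make_source()  # one instance reused for every query run
    query_avg = average_seconds(lambda: run_queries(source), repeats)
    return init_avg, query_avg
```

With stix2, `make_source` would be whatever builds and loads the store (for example, a function that creates a MemoryStore and calls load_from_file), and `run_queries` would issue the same set of `query` calls against each source so the results are comparable.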

Overall, it takes a long time to initialize a MemoryStore, but it's easily the fastest when it comes to the actual queries. TAXII is slower for queries but quick to initialize. FileSystemSources are very fast to initialize but slow to query, to the point that we decided not to include them in this repo (though they're easy to convert from a JSON bundle).
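To make the trade-off concrete, here is a small worked calculation using the averages quoted above (rounded). It assumes each run pays the initialization cost once and then repeats the same batch of example queries. That is a simplification: the MemoryStore figure was measured when loading via Requests, so a local load_from_file will differ.

```python
# (init seconds, seconds per batch of example queries), rounded from the figures above.
TIMINGS = {
    "FileSystemSource": (0.00011, 14.88467),
    "MemoryStore": (6.56254, 0.26356),
    "TAXIICollectionSource": (0.13344, 4.03535),
}


def total_seconds(source, batches):
    """Estimated cost of one initialization plus `batches` query batches."""
    init, query = TIMINGS[source]
    return init + batches * query


def break_even_batches(a, b):
    """Number of query batches at which sources a and b cost the same."""
    init_a, query_a = TIMINGS[a]
    init_b, query_b = TIMINGS[b]
    return (init_a - init_b) / (query_b - query_a)


if __name__ == "__main__":
    for name in TIMINGS:
        print(name, round(total_seconds(name, 1), 2), round(total_seconds(name, 2), 2))
    print(break_even_batches("MemoryStore", "TAXIICollectionSource"))
```

Under these assumptions the break-even point between MemoryStore and TAXII is roughly 1.7 batches. A script that runs the example queries only once per process comes out ahead with TAXII, while a process that runs them two or more times comes out ahead with a MemoryStore.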

My recommendation is:

If you decide to go the route of using our TAXII server, I should note that we only support TAXII 2.0/STIX 2.0 through that interface. We do hope to eventually stand up a TAXII 2.1/STIX 2.1 server, but it will likely be a while before that's ready. The current TAXII server serves the same content as our current STIX 2.1 dataset, but without some of the quality-of-life fields and features such as collections and x_mitre_domains.

Hope that helped!

isaisabel commented 3 years ago