mitre-attack / mitreattack-python

A python module for working with ATT&CK
https://mitreattack-python.readthedocs.io/
Apache License 2.0

[Bug] attackToExcel creating ics-attack.xlsx with different column order #136

Closed davidhwilliams closed 1 year ago

davidhwilliams commented 1 year ago

Expected Behavior

The Relationships tab should be ordered as follows, to match the 'enterprise' and 'mobile' sheets: source ID, source name, source type, mapping type, target ID, target name, target type, mapping description

Actual Behavior

The ICS Relationships tab is ordered with 'source ID' at the end: source name, source type, mapping type, target ID, target name, target type, mapping description, source ID

Steps to Reproduce the Problem

  1. Run
    from mitreattack.attackToExcel import attackToExcel
    for domain in ("enterprise-attack", "ics-attack", "mobile-attack"):
        attackToExcel.export(domain=domain, version='v'+most_recent_version, output_dir=output_dir)

    I tested with and without a version and got the same result.

Possible Solution

I can't narrow down what could be causing this. Is anyone else able to reproduce this?

jondricek commented 1 year ago

Ok, so this was pretty tricky to track down. To be honest I don't have a solution, but I think I understand the problem a bit better now. Here goes.

The issue

mitreattack/attackToExcel/attackToExcel.export() calls build_dataframes() which calls stixToDf.relationshipsToDf().

The Pandas dataframes that are returned are created here from the relationship_rows array of dictionaries.

The Pandas DataFrame documentation states:

If data is a list of dicts, column order follows insertion-order.

So this means that whatever the first item in our list of Relationships is will set the column order. So what is our first item?

We read the STIX bundle and get all the Relationships here. I'm not sure if the stix2 library's query is deterministic when we filter for the relationships or not. But shortly after that we start looping through them here.

An important note at this moment is that the Source ID is an ATT&CK ID of an object, and while most ATT&CK objects have ATT&CK IDs, Data components currently do not.

So in our list of Relationships from the ICS STIX bundle, let's assume that the first object is a "Data component detects Technique" type of Relationship. When the Source ID WOULD be added here, the code determines "oh! there's no ATT&CK ID, so I'll just skip that" right here, which all things considered is fine. But the thing is, that row is now the first dictionary in the array that gets turned into a Pandas dataframe, so its keys set the column order for the Excel file, and source ID only gets appended at the end once a later row supplies it.
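The effect can be reproduced outside of mitreattack-python. This is a minimal sketch, not the library's code: the row dictionaries and their values are made up for illustration, and only the column names come from the spreadsheets discussed above.

```python
import pandas as pd

# Hypothetical rows. The first one mimics a data component relationship
# whose source has no ATT&CK ID, so the "source ID" key is never added.
relationship_rows = [
    {"source name": "Example Data Component", "source type": "datacomponent",
     "mapping type": "detects", "target ID": "T0001"},
    {"source ID": "S0001", "source name": "Example Software", "source type": "software",
     "mapping type": "uses", "target ID": "T0002"},
]

df = pd.DataFrame(relationship_rows)

# Keys are collected in the order they are first seen, so "source ID",
# which first appears in the second row, ends up as the last column.
print(list(df.columns))
```

The first row simply gets a missing value in the source ID column.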

One last thing - we sort the rows here before sending them back up the chain to be written to Excel files, but sorting them at this point is too late to change column order which has already been set.
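To illustrate why the late sort does not help, here is a small sketch that assumes the sort is something like `DataFrame.sort_values` on the finished dataframe (the actual call in the library may differ, and the rows are placeholders):

```python
import pandas as pd

# Placeholder rows; the first one has no "source ID" key.
rows = [
    {"source name": "B", "mapping type": "detects"},
    {"source ID": "S0001", "source name": "A", "mapping type": "uses"},
]
df = pd.DataFrame(rows)

# Sorting reorders the rows only; the column order is left as it was.
sorted_df = df.sort_values("source name").reset_index(drop=True)

print(list(df.columns))
print(list(sorted_df.columns))
```

Both prints show the same column order, with source ID still last.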

What to do next

Possibly the answer here is nothing. In our development version of ATT&CK for ICS the ordering is different and the column order is as expected. This will be released later this month on October 31, so we might not see the issue again after that. But we still might - it seems unclear.

So a real solution would be to sort the relationship_rows array just before this line which sets the column order based on that first row dictionary's fields. That's probably the better solution here.
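A rough sketch of what that could look like, using made-up rows rather than the real `relationship_rows` contents. Option 1 is one possible reading of "sort before building the dataframe". Option 2 is an alternative that is not part of the suggestion above: it passes the desired order through the `columns` argument of `pd.DataFrame`.

```python
import pandas as pd

# Placeholder rows; the first one has no "source ID" key.
relationship_rows = [
    {"source name": "Example Data Component", "mapping type": "detects"},
    {"source ID": "S0001", "source name": "Example Software", "mapping type": "uses"},
]

# Option 1: stable sort that moves rows carrying a "source ID" to the front,
# so the first dictionary pandas sees has the full set of keys.
ordered_rows = sorted(relationship_rows, key=lambda row: "source ID" not in row)
df_sorted = pd.DataFrame(ordered_rows)

# Option 2 (an assumption, not the approach suggested in this thread):
# name the columns explicitly so row order no longer matters.
expected_columns = ["source ID", "source name", "mapping type"]
df_explicit = pd.DataFrame(relationship_rows, columns=expected_columns)

print(list(df_sorted.columns))
print(list(df_explicit.columns))
```

Option 1 only works if at least one row has every key. Option 2 does not depend on the data, but the column list has to be maintained by hand.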

An even longer term solution is to give Data components ATT&CK IDs like everything else, so we wouldn't have this problem in the first place. But that is a larger structural change that would need to be coordinated across multiple git repositories.

Final thoughts

Anyway, hope that helps clear up why there is a weird issue with the column order! If someone wants to tackle this and try out a potential solution, feel free to let us know here and we'll see about moving the ball forward.