mitre / cti

Cyber Threat Intelligence Repository expressed in STIX 2.0

Is enterprise-attack.json valid STIX 2.0? #111

Closed thebleucheese closed 3 years ago

thebleucheese commented 3 years ago

I'm running into an issue validating MITRE Enterprise ATT&CK data against the OASIS STIX 2.0 JSON schemas available at https://github.com/oasis-open/cti-stix2-json-schemas/tree/stix2.0.

The latest enterprise-attack.json file is throwing validation errors. A good example of this is the revoked attack-pattern attack-pattern--519630c5-f03f-4882-825c-3af924935817. I saw a note in a prior issue that revoked data may have some fields removed.

I realize this is revoked and that's why data has been removed, but the STIX 2.0 json validators for attack-pattern objects require properties 'kill_chain_phases' and 'description' which are missing from this data. This presents a kind of chicken or the egg issue where we have to evaluate heuristics against JSON that we're processing to guess if it's enterprise-attack vs normal STIX 2.0 vs some other JSON format. If it looks like enterprise-attack then we have to pre-process the json and remove any entries that are revoked prior to validation. That workflow is quite different from how we handle JSON validation anywhere else.

Is this intentional, are we doing something wrong with validation, or do you have any suggestions on how to handle this?

Thanks for your continued work on adding more threat intelligence to the CTI. It's a great project.

isaisabel commented 3 years ago

Hi @thebleucheese,

To answer your general question, yes all the data in this project should be valid STIX 2.0 JSON. If you find anything which doesn't match the spec, please let us know so that we can rectify the issue.

Anyway, according to the STIX 2.0 spec, attack-pattern objects don't require the kill_chain_phases and description fields. Only name and type are required properties:

[image]

The removal of those additional fields is intended behavior on ATT&CK's side. I don't believe the STIX documentation for revocations requires that fields are removed: it states "This specification does not address how implementations should handle revoked data." So this is a choice on the part of the ATT&CK team.

This may be an issue with your validator, though in my brief survey of their source code it seems that even there only name is marked as required for attack-patterns. So I'm not really sure what's going on here. You might want to post an issue on their issue tracker about why the schema fails since the given attack-pattern should validate according to the STIX spec.

thebleucheese commented 3 years ago

Thanks for the quick reply. I did take a deeper look into their schemas and noticed the error we're receiving on the attack-pattern doesn't make much sense based on the actual schema content and what's in enterprise-attack.json. It may be a combination of our validation library and their schema. I'll follow up with resolution.

thebleucheese commented 3 years ago

The schema validation library we're using is verbose and I misread the initial validation errors. Here are the actual validation issues:

For reference, this is the URL regex used by the STIX 2.0 JSON schema:

^([a-zA-Z][a-zA-Z0-9+.-]*):(?:\\/\\/((?:(?=((?:[a-zA-Z0-9-._~!$&'()*+,;=:]|%[0-9a-fA-F]{2})*))(\\3)@)?(?=((?:\\[?(?:::[a-fA-F0-9]+(?::[a-fA-F0-9]+)?|(?:[a-fA-F0-9]+:)+(?::[a-fA-F0-9]+)+|(?:[a-fA-F0-9]+:)+(?::|(?:[a-fA-F0-9]+:?)*))\\]?)|(?:[a-zA-Z0-9-._~!$&'()*+,;=]|%[0-9a-fA-F]{2})*))\\5(?::(?=(\\d*))\\6)?)(\\/(?=((?:[a-zA-Z0-9-._~!$&'()*+,;=:@\\/]|%[0-9a-fA-F]{2})*))\\8)?|(\\/?(?!\\/)(?=((?:[a-zA-Z0-9-._~!$&'()*+,;=:@\\/]|%[0-9a-fA-F]{2})*))\\10)?)(?:\\?(?=((?:[a-zA-Z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9a-fA-F]{2})*))\\11)?(?:#(?=((?:[a-zA-Z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9a-fA-F]{2})*))\\12)?$

URL regex pattern mismatches in enterprise-attack.json:

"https://www.virustotal.com/en/faq/ " - trailing space
"http://www.harmj0y.net/blog/redteaming/a-guide-to-attacking-domain-trusts/ " - trailing space
" https://labs.ft.com/2013/05/a-sobering-day/?mhq5j=e6 " - leading and trailing space

External reference ID issues: string [CAPEC-capec] does not match pattern ^CVE-\d{4}-(0\d{3}|[1-9]\d{3,})$

By removing the CAPEC external reference entries and trimming the URLs with leading and trailing spaces, I was able to validate the json successfully against the spec.
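The clean-up described above could be sketched roughly as follows. This is a minimal illustration, not the code actually used in this thread: the function name `clean_external_references` is hypothetical, and the `^CAPEC-\d+$` pattern is an assumption about what a well-formed CAPEC ID looks like (e.g. `CAPEC-163`), so check it against the schema you validate with.

```python
import re

# Assumed shape of a well-formed CAPEC ID; verify against your schema.
CAPEC_ID = re.compile(r"CAPEC-\d+")


def clean_external_references(bundle):
    """Trim URL whitespace and drop malformed CAPEC references, in place.

    Returns a tuple (urls_trimmed, refs_dropped).
    """
    urls_trimmed = 0
    refs_dropped = 0
    for obj in bundle.get("objects", []):
        kept = []
        for ref in obj.get("external_references", []):
            is_capec = ref.get("source_name", "").lower() == "capec"
            if is_capec and not CAPEC_ID.fullmatch(ref.get("external_id", "")):
                refs_dropped += 1
                continue
            url = ref.get("url")
            if url is not None and url != url.strip():
                ref["url"] = url.strip()
                urls_trimmed += 1
            kept.append(ref)
        if "external_references" in obj:
            if kept:
                obj["external_references"] = kept
            else:
                # Avoid leaving an empty list behind on the object.
                del obj["external_references"]
    return urls_trimmed, refs_dropped
```

To use it, load enterprise-attack.json with `json.load`, pass the resulting dict to the function, and then run the validator on the cleaned bundle.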

isaisabel commented 3 years ago

Trailing space issues are covered in #90. I'll need to think more about the CAPEC ID issue though.

isaisabel commented 3 years ago

Alright so there are some broken CAPEC IDs apparently. I wrote this script to detect the broken IDs:

from stix2 import TAXIICollectionSource, Filter
from taxii2client.v20 import Collection

# Enterprise ATT&CK collection on MITRE's TAXII 2.0 server
collection = Collection("https://cti-taxii.mitre.org/stix/collections/95ecc380-afe9-11e4-9b6c-751b66dd541e/")
tc_src = TAXIICollectionSource(collection)

# Find objects that carry a placeholder CAPEC ID in any external reference
result = tc_src.query([
    Filter("external_references.external_id", "in", ["CAPEC-capec", "CAPEC-CAPEC", "capec-CAPEC", "capec-capec"])
])
for obj in result:
    # The first external reference holds the ATT&CK ID of the object
    print(obj["id"], " | ", obj["external_references"][0]["external_id"], " | ", obj["name"])

Which outputs:

attack-pattern--fc742192-19e3-466c-9eb5-964a97b29490  |  T1574.004  |  Dylib Hijacking
attack-pattern--e64c62cf-9cd7-4a14-94ec-cdaac43ab44b  |  T1574.002  |  DLL Side-Loading
attack-pattern--58af3705-8740-4c68-9329-ec015a7013c2  |  T1574.008  |  Path Interception by Search Order Hijacking
attack-pattern--0c2d00da-7742-49e7-9928-4514e5075d32  |  T1574.007  |  Path Interception by PATH Environment Variable
attack-pattern--bf96a5a3-3bce-43b7-8597-88545984c07b  |  T1574.009  |  Path Interception by Unquoted Path
attack-pattern--17cc750b-e95b-4d7d-9dde-49e0de24148c  |  T1574.011  |  Services Registry Permissions Weakness
attack-pattern--9e8b28c9-35fe-48ac-a14d-e6cc032dcbcd  |  T1574.010  |  Services File Permissions Weakness

isaisabel commented 3 years ago

I'll ask the ATT&CK content team to fix this for the next release.

isaisabel commented 3 years ago

I've confirmed with the content team that the url whitespace and CAPEC ID issues will be resolved for the next release.

grimlock81 commented 3 years ago

> Anyway, according to the STIX 2.0 spec, attack-pattern objects don't require the kill_chain_phases and description fields. Only name and type are required properties:

> The removal of those additional fields is intended behavior on ATT&CK's side. I don't believe the STIX documentation for revocations requires that fields are removed: it states "This specification does not address how implementations should handle revoked data." So this is a choice on the part of the ATT&CK team.

A few questions

  1. What is the rationale for this choice? I'm finding it would have been very useful to have these fields for revoked attack-patterns in my case.
  2. Why do attack-patterns marked with "x_mitre_deprecated": true retain these fields? As neither revoked nor deprecated techniques appear in the Matrix any more this seems an inconsistent decision.


isaisabel commented 3 years ago

Hi @grimlock81,

> What is the rationale for this choice?

Objects that are revoked have that content removed to encourage users to instead use their revoking (replacing) objects. A lot of users even now don't realize objects are revoked (or for that matter deprecated) and try to use them -- removing those fields makes it more likely they'll realize something's up and find the replacing object.
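As a rough sketch of how a consumer might find the replacing object: ATT&CK bundles link a revoked object to its replacement with a relationship whose `relationship_type` is `revoked-by`, where `source_ref` is the revoked object and `target_ref` is the replacement. The helper name `resolve_replacement` is hypothetical, and the code assumes the bundle has already been loaded into a dict.

```python
def resolve_replacement(bundle, stix_id):
    """Follow 'revoked-by' relationships from stix_id to the latest replacement.

    Returns stix_id itself when no 'revoked-by' relationship starts from it.
    """
    revoked_by = {
        obj["source_ref"]: obj["target_ref"]
        for obj in bundle.get("objects", [])
        if obj.get("type") == "relationship"
        and obj.get("relationship_type") == "revoked-by"
    }
    seen = set()
    current = stix_id
    # A replacement can itself be revoked later, so walk the chain.
    # The `seen` set guards against a malformed cyclic chain.
    while current in revoked_by and current not in seen:
        seen.add(current)
        current = revoked_by[current]
    return current
```

Tooling that still holds the ID of a revoked technique could call this once per stored ID to remap it to the current object.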

Revoked objects keep the following fields (although I'm not sure all of them occur in our dataset these days):

['name', 'labels', 'x_mitre_old_attack_id', 'type', 'modified', 'created', 'id', 'revoked']
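A quick way to see why a stripped, revoked attack-pattern should still validate is to compare the retained fields against the required ones. The sketch below assumes the required set is the STIX 2.0 common properties for domain objects (`type`, `id`, `created`, `modified`) plus `name` for attack-pattern, which is slightly broader than the "name and type" mentioned earlier in the thread. The constant and function names are hypothetical.

```python
# Fields that revoked ATT&CK objects keep, per the list above.
RETAINED_ON_REVOKE = {
    "name", "labels", "x_mitre_old_attack_id", "type",
    "modified", "created", "id", "revoked",
}

# Assumed required properties for a STIX 2.0 attack-pattern:
# the common SDO properties plus `name`.
REQUIRED_ATTACK_PATTERN = {"type", "id", "created", "modified", "name"}


def missing_required(obj):
    """Return the required attack-pattern properties absent from obj, sorted."""
    return sorted(REQUIRED_ATTACK_PATTERN - set(obj))


# Every required property survives revocation.
assert REQUIRED_ATTACK_PATTERN <= RETAINED_ON_REVOKE
```

Because `description` and `kill_chain_phases` are not in the required set, their absence on a revoked object is not by itself a validation error.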

> I'm finding it would have been very useful to have these fields for revoked attack-patterns in my case.

Could you expand on this? We may be revisiting this behavior for a future version of ATT&CK and it'd be good to hear some user stories in favor of keeping those fields.

> Why do attack-patterns marked with "x_mitre_deprecated": true retain these fields? As neither revoked nor deprecated techniques appear in the Matrix any more this seems an inconsistent decision.

Deprecated objects aren't replaced, just removed, so there's no "upgrade" or "replacement" available. Therefore we assume users will have a more complex workflow around stopping the usage of a deprecated object since they can't just remap their tooling to use the replacing object. So we keep the additional fields on deprecated objects for that reason.
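For tooling that needs to treat the two cases differently, one possible approach is to classify each object by its `revoked` and `x_mitre_deprecated` flags before processing it. This is only a sketch; the function names `lifecycle_status` and `partition_by_status` are hypothetical, and it treats a missing flag as false.

```python
def lifecycle_status(obj):
    """Classify a STIX object as 'revoked', 'deprecated', or 'active'."""
    if obj.get("revoked", False):
        return "revoked"        # replaced by another object
    if obj.get("x_mitre_deprecated", False):
        return "deprecated"     # removed without a replacement
    return "active"


def partition_by_status(bundle):
    """Group the IDs of a bundle's objects by lifecycle status."""
    groups = {"active": [], "revoked": [], "deprecated": []}
    for obj in bundle.get("objects", []):
        groups[lifecycle_status(obj)].append(obj.get("id"))
    return groups
```

Revoked IDs can then be remapped to their replacements, while deprecated IDs can be flagged for manual review since no replacement exists.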

grimlock81 commented 3 years ago

> Could you expand on this? We may be revisiting this behavior for a future version of ATT&CK and it'd be good to hear some user stories in favor of keeping those fields.

In my application there may be data tagged with a MITRE technique that is now revoked. I would like to provide contextual information such as the description and the tactic(s) it was associated with. While future instances of the same data will be tagged with the replacement technique, users examining old data with the revoked technique would like to see this contextual information to understand why that particular technique was used at the time.