ocsf / ocsf-schema

OCSF Schema
Apache License 2.0
582 stars 118 forks source link

Improve and fix enum declarations #1111

Closed mlmitch closed 6 days ago

mlmitch commented 3 weeks ago

Changes were tested on a local OCSF server and are error free.

Description of changes:

classification_ids now declared in dictionary with default "0" and "99" values and descriptions "0" and "99" already existed without description through override in its only use (objects/malware.json)

data_lifecycle_state_id mistakenly had the "Other" description applied to "0"/"Unknown" A "99"/"Other" value was missing

flag_ids now declared in dictionary with default "0" and "99" values and descriptions "0" and "99" already existed without description through override in its only use (objects/dns_answer.json)

integrity mistakenly referenced the direction_id enum in its description

integrity_id now declared in dictionary with default "0" and "99" values and descriptions "0" and "99" already existed without description through override in its only use (objects/process.json)

load_type_id now declared in dictionary with default "0" and "99" values and decsriptions "0" and "99" already existed without description through override in its only use (objects/module.json)

opcode_id now points to RFC5395 to explain why "0" does not correspond to "Unknown" A "99"/"Unmapped" value is added to match other RFC based enums

protocol_ver_id now declared in dictionary with default "0" and "99" values and descriptions "0" and "99" already existed without description through override in its only use (objects/network_connection_info.json)

run_state_id now declared in dictionary with default "0" and "99" values and descriptions "0" and "99" already existed without description through override in its only use (objects/job.json)

mlmitch commented 3 weeks ago

I wasn't sure what to do with risk_level_id, risk_level_id does not adhere to the enum convention with "0" being "Info" and no "Other" being defined in the dictionary.json, though "Other" values may exist in specific contexts. A non-disruptive change could be to add "99"/"Other" to the dictionary and live with the inappropriate use of "0"

mlmitch commented 3 weeks ago

I talked over risk_level_id with @davemcatcisco a little.

risk_level description is fixed to reflect the absence of "99"/"Other" in risk_level_id. risk_level_id does not have a "99" in any of its usages and "0" does not correspond to "Unknown". Therefore the situation did not justify adding "99"/"Other"

pagbabian-splunk commented 1 week ago

I have a minor suggestion: for the enums with only the nominal values (0/99 or 99) we should include the 'See specific usage` phrase in the descriptions, which the compiler will enforce to make sure a class or object defines the specific values. This usually is done for more generic attributes that can be overridden per class, but for specific attribute names, even though they are targeted for a single usage, the actual values must be defined in the class or object if they aren't defined in the dictionary.

mlmitch commented 1 week ago

Adding in the "See specific usage" phrase cracked open a new can of worms. To start, the server produces a bunch of warnings:

2024-06-26 00:46:58.789 [warning] mfa=Schema.Cache.missing_desc_warning/4 line=1150  Please update the description for module.load_type_id: The normalized identifier of the load type. It identifies how the module was loaded in memory. See specific usage.
2024-06-26 00:46:58.791 [warning] mfa=Schema.Cache.missing_desc_warning/4 line=1150  Please update the description for malware.classification_ids: The list of normalized identifiers of the malware classifications. See specific usage. Reference: <a target='_blank' href='https://docs.oasis-open.org/cti/stix/v2.1/os/stix-v2.1-os.html#_oxlc4df65spl'>STIX Malware Types</a>
2024-06-26 00:46:58.792 [warning] mfa=Schema.Cache.missing_desc_warning/4 line=1150  Please update the description for process.integrity_id: The normalized identifier of the process integrity level (Windows only). See specific usage.
2024-06-26 00:46:58.793 [warning] mfa=Schema.Cache.missing_desc_warning/4 line=1150  Please update the description for affected_package.type_id: The normalized type identifier of an object. See specific usage.
2024-06-26 00:46:58.793 [warning] mfa=Schema.Cache.missing_desc_warning/4 line=1150  Please update the description for affected_package.vendor_name: The name of the vendor. See specific usage.
2024-06-26 00:46:58.793 [warning] mfa=Schema.Cache.missing_desc_warning/4 line=1150  Please update the description for package.type_id: The normalized type identifier of an object. See specific usage.
2024-06-26 00:46:58.793 [warning] mfa=Schema.Cache.missing_desc_warning/4 line=1150  Please update the description for package.vendor_name: The name of the vendor. See specific usage.

Some of these warnings existed previously and have nothing to do with this PR (type_id, vendor_name). @pagbabian-splunk this is probably what you were thinking, and these things should be fixed too. However, I think properly fixing these involves evaluating if a given enum is:

  1. a very specific enum name, and the definitions should be pulled into dictionary.json
  2. a general enum name, and the definitions should be pushed into their specific use case.

I started down this path and I suspect some of them will need conversation / debate. The actual diff is probably about as big as this PR was originally.

I suggest that we hold off on the "See specific usage" change, and I'll start an additional PR to address those issues.