matanolabs / matano

Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS
https://matano.dev
Apache License 2.0

Adds CISA Known Exploited Vulnerabilities as a managed enrichment table #162

Closed rileydakota closed 1 year ago

rileydakota commented 1 year ago

This PR adds a managed enrichment table for CISA's Known Exploited Vulnerabilities (KEV) list. The list's primary use case is helping organizations prioritize vulnerabilities by identifying CVEs that are being actively exploited in the wild. In Matano, it can be used in two ways. The first is realtime enrichment on a vulnerability scanner source, flagging that a finding is present on the KEV list (e.g. AWS Inspector doesn't currently include this; the same goes for grype, nessus, and other vulnerability scanners or software composition analysis tools). The second is querying by joining on the enrich_cisa_kev table to get a list of active CVEs present on the KEV list (or something more specific, like all EC2 instances that have a known exploited vulnerability).
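To make the two use cases concrete, here is a minimal Python sketch of the lookup logic. It is not Matano code: the `kev_table` dict stands in for the enrich_cisa_kev table, and the finding shape, `instance_id`, and `known_exploited` are hypothetical names chosen for illustration.

```python
# Stand-in for the enrich_cisa_kev table, keyed by CVE ID (hypothetical shape).
kev_table = {
    "CVE-2021-44228": {"vendorProject": "Apache", "product": "Log4j2"},
}

# Hypothetical scanner findings; the second CVE ID is a made-up placeholder.
findings = [
    {"instance_id": "i-0abc", "cve": "CVE-2021-44228"},
    {"instance_id": "i-0def", "cve": "CVE-0000-00000"},
]


def enrich(finding, kev):
    """Return a copy of the finding, flagged with KEV membership."""
    entry = kev.get(finding["cve"])
    enriched = dict(finding)
    enriched["known_exploited"] = entry is not None
    if entry is not None:
        enriched["cisa_kev"] = entry
    return enriched


# Use case 1: enrich every finding as it arrives.
enriched_findings = [enrich(f, kev_table) for f in findings]

# Use case 2: the "join" view, e.g. all instances with a known exploited CVE.
exploited_instances = sorted(
    {f["instance_id"] for f in enriched_findings if f["known_exploited"]}
)
```

In Matano itself the second use case would presumably be a SQL join against the enrichment table rather than an in-memory lookup.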

Very open to suggestions on transforming the actual schema if needed. I also ran into some minor issues with the Rust CSV library not properly recognizing the headers when using set_headers to directly specify the header names. I didn't see any obvious typos with the headers from the file present at https://www.cisa.gov/sites/default/files/csv/known_exploited_vulnerabilities.csv

TLDR:

- Managed Enrichment Source for CISA KEV
- Runs daily, unauthenticated fetch of a CSV file
- Using the same schema as provided in the CSV file (unless proposed otherwise)
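As a rough illustration of the fetch-and-parse step, the Python sketch below reads the CSV header row directly instead of setting header names by hand. The PR itself does this in Rust, so this is only an analogy. The column names in the sample are assumptions and should be checked against the published file. Decoding with `utf-8-sig` strips a leading byte-order mark, which is a common cause of header mismatches in general but is not confirmed as the cause of the issue described above.

```python
import csv
import io
import urllib.request

KEV_URL = "https://www.cisa.gov/sites/default/files/csv/known_exploited_vulnerabilities.csv"


def fetch_kev(url=KEV_URL):
    """Unauthenticated GET of the CSV; returns the raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def parse_kev(raw):
    """Parse CSV bytes into a list of dicts keyed by the file's own header row."""
    text = raw.decode("utf-8-sig")  # drops a leading BOM if one is present
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


# Illustrative sample only; column names are assumed, not taken from the PR.
SAMPLE = b"cveID,vendorProject,product\nCVE-2021-44228,Apache,Log4j2\n"
rows = parse_kev(SAMPLE)
```

A daily job would call `parse_kev(fetch_kev())` and hand the rows to whatever writes the table.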

# cisa_kev/enrichment.yml

name: "cisa_kev"

managed:
  type: "cisa_kev"
Samrose-Ahmed commented 1 year ago

This is excellent.

I'm thinking we should normalize these into ECS fields. We don't always with enrichment tables, but it can be useful with something like vulnerabilities because it gives us the opportunity to do auto joins.

Here are the ECS vulnerability fields: https://www.elastic.co/guide/en/ecs/current/ecs-vulnerability.html. They should map pretty simply.

We can put the rest of the fields into a group like cisa.kev.

Also, for the write mode (overwrite vs merge), do you know how the data is published? Is the full dataset always included? Otherwise we could think of modeling it as a merge on an ID, so old rows are not overwritten.

Let me know what you think.

rileydakota commented 1 year ago

Awesome @Samrose-Ahmed!

Re: Normalization, totally! I'll take a stab at this tonight. Will give me an excuse to get more comfortable with VRL anyway. I will assume that we will be okay with some fields not mapping exactly.

Re: Write Mode, correct - it's the full dataset. I figured overwrite would be best since there isn't really any reason to keep old versions of the list (a CVE is either on the list or it isn't), but open to changing if it makes sense.
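For reference, the difference between the two write modes can be sketched in Python as below. This is a conceptual illustration, not Matano's implementation. The function names and the `cveID` key are assumptions.

```python
def overwrite(existing, snapshot, key="cveID"):
    """Replace the table with the new snapshot; rows absent from it disappear."""
    return {row[key]: row for row in snapshot}


def merge(existing, snapshot, key="cveID"):
    """Upsert snapshot rows by key; rows absent from the snapshot are kept."""
    merged = dict(existing)
    merged.update({row[key]: row for row in snapshot})
    return merged


# Placeholder IDs "A", "B", "C" stand in for CVE IDs.
existing = {
    "A": {"cveID": "A", "notes": "old"},
    "B": {"cveID": "B", "notes": "old"},
}
snapshot = [
    {"cveID": "B", "notes": "new"},
    {"cveID": "C", "notes": "new"},
]

after_overwrite = overwrite(existing, snapshot)
after_merge = merge(existing, snapshot)
```

Because each published file is the full dataset, overwrite keeps the table identical to the current list, whereas merge would retain any entry that was later dropped from it.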

rileydakota commented 1 year ago

@Samrose-Ahmed take a peek at the enrichment.yml - I mapped to the ECS vulnerability fields where it made sense, and nested the other fields under a cisa_kev key to match the other enrichment tables.
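The shape of that normalization can be sketched in Python as follows. The actual transform lives in enrichment.yml and is written in VRL, so this is only an illustration. The mapping in `ECS_FIELD_MAP` is a guess at a sensible subset and not the PR's actual mapping, and the CSV column names are assumptions.

```python
# Hypothetical mapping from KEV CSV columns to ECS vulnerability.* fields.
ECS_FIELD_MAP = {
    "cveID": "id",
    "shortDescription": "description",
}


def to_ecs(row):
    """Split a KEV row into ECS vulnerability fields and a cisa_kev group."""
    vulnerability = {"enumeration": "CVE"}
    cisa_kev = {}
    for column, value in row.items():
        if column in ECS_FIELD_MAP:
            vulnerability[ECS_FIELD_MAP[column]] = value
        else:
            # Anything without an ECS equivalent stays under cisa_kev.
            cisa_kev[column] = value
    return {"vulnerability": vulnerability, "cisa_kev": cisa_kev}


# Sample row with placeholder values.
record = to_ecs(
    {
        "cveID": "CVE-2021-44228",
        "shortDescription": "example description",
        "vendorProject": "Apache",
        "dueDate": "2021-12-24",
    }
)
```

Keeping the CVE ID in `vulnerability.id` is what would allow joins against other ECS-normalized vulnerability sources.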

Let me know your thoughts!

Samrose-Ahmed commented 1 year ago

LGTM!