This project is an effort to create a comprehensive dataset of open source software compromises. Its intention is to help parties that want to prevent and mitigate such compromises.
All contributions are welcome. The initial effort will focus only on collecting data about open source software compromises that happened after November 1, 2022. This is an experimental effort housed within the OpenSSF integrity working group.
A compromise should be included in this dataset if it meets both Condition 1 and Condition 2 below. “Compromise” implies that an attack has actually occurred.
Condition 1: The compromise arises from a vulnerability, introduced unintentionally or maliciously, in the open source software supply chain.
Condition 2: The compromise has a high impact. “High impact” means that “many” parties are affected (especially parties associated with “critical infrastructure”), that the compromise results in “severe damage,” or both.
Alternatively, a vulnerability without an associated compromise should be included if its potential impact is vast and there is a high likelihood of undetected compromises (e.g., Heartbleed).
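The inclusion rules above can be read as plain boolean logic. The following is an illustrative sketch only: the `Candidate` fields and the `should_include` function are hypothetical (they are not part of the dataset schema), each flag stands for a judgment call a curator makes by hand, and treating the Heartbleed-style path as independent of Condition 1 is one interpretation of the text.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical flags; each one is a curator's judgment, not a schema field.
    attack_occurred: bool                # a "compromise": an attack actually happened
    supply_chain_vulnerability: bool     # Condition 1
    many_parties_affected: bool          # one way to satisfy Condition 2
    severe_damage: bool                  # the other way to satisfy Condition 2
    vast_potential_impact: bool = False  # used only when no compromise is known
    likely_undetected: bool = False      # high likelihood of undetected compromises


def should_include(c: Candidate) -> bool:
    """Apply the inclusion criteria as stated, under the assumptions above."""
    # "High impact" is satisfied by many affected parties, severe damage, or both.
    high_impact = c.many_parties_affected or c.severe_damage
    if c.attack_occurred:
        # An actual compromise must meet both Condition 1 and Condition 2.
        return c.supply_chain_vulnerability and high_impact
    # Alternative path: a vulnerability with no associated compromise
    # (e.g. Heartbleed) needs vast potential impact and likely undetected use.
    return c.vast_potential_impact and c.likely_undetected
```

For example, a Heartbleed-like case would be `Candidate(attack_occurred=False, supply_chain_vulnerability=True, many_parties_affected=False, severe_damage=False, vast_potential_impact=True, likely_undetected=True)`, which `should_include` accepts through the alternative path.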
This is a volunteer effort. A set of maintainers have personally volunteered in the past to maintain separate, related datasets, so this same group will likely continue to devote time to maintaining this dataset. Others are welcome to contribute too, but it is strictly a volunteer effort.
This dataset is meant to capture only publicly reported compromises, so there must be publicly available information about each compromise. It is not intended to be a source of zero-days or otherwise undisclosed vulnerabilities.
To add a compromise, open a pull request (PR) that places a new YAML file in the `compromises` folder. Each compromise is associated with one YAML file. For data collection purposes, this project uses the specific structure described below; fields marked as not required are optional. For an example YAML file, see `1-dydx.yaml` in the `compromises` folder.
Note on naming the file: use `id-name.yaml`, where `id` is an integer one greater than the highest existing `id` number and `name` is a short string related to the attack.
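The naming rule can be sketched in a few lines of Python. This is a minimal sketch, not project tooling: the `next_filename` helper is hypothetical, and it assumes existing entries all use the `.yaml` extension (a `.yml` file would be ignored) and that ids start at 1, as in `1-dydx.yaml`.

```python
import re
from typing import Iterable

# Matches names like "1-dydx.yaml": a leading integer id, a hyphen, then the name.
FILENAME_RE = re.compile(r"(\d+)-.+\.yaml")


def next_filename(existing: Iterable[str], name: str) -> str:
    """Return "<id>-<name>.yaml", where id is one greater than the highest existing id."""
    ids = []
    for filename in existing:
        match = FILENAME_RE.fullmatch(filename)
        if match:
            # Compare ids as integers so that 10 sorts after 9.
            ids.append(int(match.group(1)))
    # With no existing entries, start numbering at 1.
    next_id = max(ids, default=0) + 1
    return f"{next_id}-{name}.yaml"
```

From the repository root, `next_filename([p.name for p in Path("compromises").iterdir()], "my-attack")` (with `from pathlib import Path`) would return the next free file name; `"my-attack"` is a placeholder. Parsing the id as an integer, rather than sorting file names as strings, avoids picking `9` over `10` as the highest id.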
Field | Required | Type | Description |
---|---|---|---|
compromise-name | Yes | string | A short, descriptive name for the attack. Err on the side of a widely recognizable name. |
description | Yes | string | A description of the attack. Several sentences are often adequate. |
compromise-classification | Yes | string (can use a sequence) | Attack class labels from the attack tree in https://arxiv.org/abs/2204.04008, with a separate label for each node of the tree that applies to the attack, to the best of available knowledge. See attack-tree.md for a copy of the tree. |
cwe | No | string | If applicable, add the appropriate CWE. https://cwe.mitre.org/ |
mitre-attack | No | string | If applicable, add the appropriate ATT&CK label. https://attack.mitre.org |
ecosystem | Yes | string | The open source ecosystem associated with the attack, e.g. PyPI. |
date-earliest-evidence-of-compromise | Yes | string | Date of the earliest evidence of the compromise, formatted as YYYY-mm or YYYY-mm-dd. |
date-entry-was-created | Yes | string | Date this entry was created, formatted as YYYY-mm or YYYY-mm-dd. |
references | Yes | string (can use a sequence) | Any references, especially URLs, with information on the attack. |
malicious-intent | Yes | string | "yes" or "no" |
packages-affected | Yes | string list | List of packages affected |
IOCs | No | TBD | Indicators of compromise. Should list the rule name, rule type, and rule specification. |
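The table can also be read as a schema. The Python sketch below checks an entry after it has been parsed into a dictionary. The field names and required flags come from the table; the function names, the reading of "string (can use a sequence)" as "a string or a list of strings", and the date check (format only, not calendar validity) are assumptions, not part of the project's tooling. `IOCs` is accepted with any value because its type is still TBD.

```python
import re
from typing import List

# Required fields and the kind of value each accepts, transcribed from the table.
REQUIRED = {
    "compromise-name": "str",
    "description": "str",
    "compromise-classification": "str_or_list",
    "ecosystem": "str",
    "date-earliest-evidence-of-compromise": "date",
    "date-entry-was-created": "date",
    "references": "str_or_list",
    "malicious-intent": "yes_no",
    "packages-affected": "list",
}
OPTIONAL = {"cwe", "mitre-attack", "IOCs"}
# YYYY-mm or YYYY-mm-dd; checks the shape only, not whether the date exists.
DATE_RE = re.compile(r"\d{4}-\d{2}(?:-\d{2})?")


def is_str_list(value) -> bool:
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def validate_entry(entry: dict) -> List[str]:
    """Return a list of problems; an empty list means the entry looks valid."""
    problems = []
    for field, kind in REQUIRED.items():
        if field not in entry:
            problems.append(f"missing required field: {field}")
            continue
        value = entry[field]
        if kind == "str":
            ok = isinstance(value, str) and value.strip() != ""
        elif kind == "str_or_list":
            ok = isinstance(value, str) or is_str_list(value)
        elif kind == "list":
            ok = is_str_list(value)
        elif kind == "date":
            ok = isinstance(value, str) and DATE_RE.fullmatch(value) is not None
        else:  # "yes_no": must be the literal string "yes" or "no"
            ok = value in ("yes", "no")
        if not ok:
            problems.append(f"bad value for {field}: {value!r}")
    # Flag fields that appear in neither the required nor the optional set.
    for field in entry:
        if field not in REQUIRED and field not in OPTIONAL:
            problems.append(f"unknown field: {field}")
    return problems
```

One practical consequence: if a file is loaded with a YAML 1.1 parser such as PyYAML's `yaml.safe_load`, an unquoted `yes` or `no` becomes a boolean and an unquoted `2022-11-15` becomes a `datetime.date`, so this sketch would flag them. Quoting those values keeps them strings, matching the types in the table.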
Note: This is an experimental effort. If you find conceptual or practical problems with the data fields, please raise them in an issue. The data fields will likely be revised as a result of this initial effort.