ossf / oss-compromises

Archive of various open source security compromises

Open Source Software Compromises Dataset

This is an effort to create a comprehensive dataset of open source software compromises. The intention is to help parties that want to prevent and mitigate open source software compromises.

All contributions are welcome. The initial effort will focus only on collecting data about open source software compromises that occur after November 1, 2022. This is an experimental effort housed within the OpenSSF integrity working group.

Inclusion Criteria, or What is an Open Source Software Compromise?

Compromises ought to be included in this dataset if both conditions (1) and (2) are met. “Compromise” implies that an attack has actually occurred.

Condition 1: The compromise arises from a vulnerability, introduced unintentionally or maliciously, in the open source software supply chain.

Condition 2: The compromise has a high impact. “High impact” means that “many” parties are affected, especially parties associated with “critical infrastructure,” and/or that the compromise results in “severe damage.”

Alternatively, vulnerabilities without an associated compromise ought to be included if the potential impact is vast and there is a high likelihood of undetected compromises, e.g. Heartbleed.
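
The inclusion rule above can be read as a small decision function. The following is only a sketch of that reading, not an official tool of this project: the `Candidate` class, its field names, and `should_include` are hypothetical, and each boolean stands for a judgment a human reviewer still has to make (what counts as “many” parties or “severe damage” is not defined here).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record of a reviewer's judgments about one report."""
    attack_occurred: bool              # "Compromise" implies an actual attack
    supply_chain_vulnerability: bool   # Condition 1
    high_impact: bool                  # Condition 2
    vast_potential_impact: bool = False
    undetected_compromise_likely: bool = False


def should_include(c: Candidate) -> bool:
    # Main rule: an actual compromise that meets both conditions.
    meets_both = (
        c.attack_occurred and c.supply_chain_vulnerability and c.high_impact
    )
    # Alternative rule: no known compromise, but vast potential impact and a
    # high likelihood of undetected compromises (the Heartbleed case).
    alternative = c.vast_potential_impact and c.undetected_compromise_likely
    return meets_both or alternative
```

Under this reading, a Heartbleed-style report would be included through the alternative rule even when `attack_occurred` is false.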

Who Is Responsible for Maintaining This Dataset?

This is a strictly volunteer effort. A set of maintainers has personally volunteered in the past to maintain separate, related datasets, and it is therefore likely that this same group will continue to devote time to maintaining this dataset. Others are welcome to contribute too.

What is the Recommended Timeline for Announcing Compromises?

This dataset is meant to capture only publicly reported compromises, so there must be publicly available information about each compromise. It is not intended to be a source of zero-days or otherwise undisclosed vulnerabilities.

How Do I Submit a Compromise?

Make a PR and place a new YAML file into the compromises folder. Each compromise is associated with one YAML file. This project uses a specific structure, described below, for data collection purposes. Fields marked as not required are optional. For an example YAML file, see the '1-dydx.yaml' file in the compromises folder.

Note on naming the file: Use id-name.yaml, where id is an integer one greater than the highest existing id number and name is a short string related to the attack.
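
As an illustration of the naming rule, the sketch below computes the next file name from the names already present. It is not part of this project: the helper names `next_filename` and `next_filename_in_folder` are hypothetical, and the sketch assumes every existing entry follows the id-name.yaml pattern (other files are ignored).

```python
import re
from pathlib import Path

# Matches names such as "1-dydx.yaml": an integer id, a hyphen, a short name.
FILENAME_PATTERN = re.compile(r"^(\d+)-.+\.yaml$")


def next_filename(existing_names, short_name):
    """Return "<id>-<short_name>.yaml", with id one above the highest existing id."""
    ids = [
        int(m.group(1))
        for name in existing_names
        if (m := FILENAME_PATTERN.match(name))
    ]
    next_id = max(ids, default=0) + 1
    return f"{next_id}-{short_name}.yaml"


def next_filename_in_folder(folder, short_name):
    """Convenience wrapper that reads the existing names from a folder on disk."""
    names = [p.name for p in Path(folder).iterdir() if p.is_file()]
    return next_filename(names, short_name)
```

For example, `next_filename_in_folder("compromises", "my-attack")` would be called from the repository root. Ids are compared as integers, so 10 sorts above 9.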

| Field | Required | Type | Description |
| --- | --- | --- | --- |
| compromise-name | Yes | string | A short, descriptive name for the attack. Err on the side of a widely recognizable name. |
| description | Yes | string | Provide a description of the attack. Several sentences will often be adequate. |
| compromise-classification | Yes | string (can use a sequence) | Use attack class labels from the attack tree in this paper (https://arxiv.org/abs/2204.04008), creating a separate label for each relevant node that applies to the attack, to the best of available knowledge. See attack-tree.md for a copy of the tree. |
| cwe | No | string | If applicable, add the appropriate CWE. https://cwe.mitre.org/ |
| mitre-attack | No | string | If applicable, add the appropriate ATT&CK label. https://attack.mitre.org |
| ecosystem | Yes | string | The open source ecosystem associated with the attack, e.g. PyPI. |
| date-earliest-evidence-of-compromise | Yes | string | Appropriate formats include: YYYY-mm or YYYY-mm-dd |
| date-entry-was-created | Yes | string | YYYY-mm or YYYY-mm-dd |
| references | Yes | string (can use a sequence) | Any references, especially URLs, with information on the attack. |
| malicious-intent | Yes | string | "yes" or "no" |
| packages-affected | Yes | string list | List of packages affected |
| IOCs | No | TBD | Should list rule name, rule type, and rule specification. |
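
Before opening a PR, a contributor could check an entry against the fields listed above. The following is a minimal sketch under stated assumptions, not a validator shipped with this project: `validate_entry` is a hypothetical helper, it expects the entry as an already-parsed Python dictionary, and the `example` entry is invented for illustration and does not describe a real incident.

```python
import re

# Required fields, taken from the field list above.
REQUIRED_FIELDS = [
    "compromise-name", "description", "compromise-classification", "ecosystem",
    "date-earliest-evidence-of-compromise", "date-entry-was-created",
    "references", "malicious-intent", "packages-affected",
]
DATE_FIELDS = ["date-earliest-evidence-of-compromise", "date-entry-was-created"]
# Accepts YYYY-mm or YYYY-mm-dd; checks the shape only, not calendar validity.
DATE_PATTERN = re.compile(r"^\d{4}-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?$")


def validate_entry(entry):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in entry or entry[field] in ("", None, []):
            problems.append(f"missing required field: {field}")
    for field in DATE_FIELDS:
        value = entry.get(field)
        if value and not DATE_PATTERN.match(str(value)):
            problems.append(f"{field} must be YYYY-mm or YYYY-mm-dd")
    intent = entry.get("malicious-intent")
    if intent is not None and intent not in ("yes", "no"):
        problems.append('malicious-intent must be "yes" or "no"')
    return problems


# Hypothetical entry for illustration only; every value is a placeholder.
example = {
    "compromise-name": "example-compromise",
    "description": "Placeholder description of a hypothetical attack.",
    "compromise-classification": ["placeholder-attack-tree-label"],
    "ecosystem": "PyPI",
    "date-earliest-evidence-of-compromise": "2023-01",
    "date-entry-was-created": "2023-01-15",
    "references": ["https://example.com/advisory"],
    "malicious-intent": "yes",
    "packages-affected": ["example-package"],
}
```

Calling `validate_entry(example)` returns an empty list. One caveat when loading real files: YAML 1.1 parsers such as PyYAML read an unquoted yes or no as a boolean, so quoting the malicious-intent value in the YAML file keeps it a string.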

Note: This is an experimental effort. If you detect conceptual or pragmatic problems with the data fields, please raise them in an issue. Revising the data fields is a likely outcome of this initial effort.