gsalzer / cgt

Consolidated Ground Truth (CGT) for Weaknesses of Ethereum Smart Contracts
MIT License

This repository contains a unified and consolidated ground truth that we constructed from previously published benchmark sets that were manually classified by the respective authors. We completed the data, eliminated inconsistencies and duplicates, and checked discrepancies between the datasets. The ground truth consists of a collection of contracts (in general given by chain address, source code, bytecode, and runtime code), each with a manually verified assessment of whether the contract exemplifies a specific weakness.

See also our accompanying paper, "Consolidation of Ground Truth Sets for Weakness Detection in Smart Contracts".

Responsible disclosure

This repo contains a copy of weakness collections published at the indicated locations well before the end of 2022. We don't expect any of these to contain vulnerabilities that can (still) be exploited. However, if you find anything worth reporting, please do so as soon as possible.

Further readings:

Datasets integrated in CGT

CodeSmells
ContractFuzzer
Doublade
eThor
EthRacer
Ever Evolving Game
JiuZhou
Not So Smart Contracts
NPChecker
SmartBugs curated
SolidiFI
SWC registry
Zeus

The consolidated data

The result of our efforts is the file consolidated.csv, with the artefacts collected in the folders source, bytecode, and runtime. For details of its construction, see the README and the scripts in the folder construction.

The file consolidated.csv contains one line per consolidated assessment, with the values separated by semicolons. Each line consists of the following fields. The fields dataset, id, property, and property_holds are never empty, while the other fields may be.
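
As a minimal sketch of how such a semicolon-separated file could be read in Python, the snippet below parses rows with the standard csv module and checks that the four fields described above as never empty are indeed filled. It assumes that the first line of the file is a header naming the fields, which is not confirmed here. The function name read_assessments and the sample rows (including the values in property_holds) are hypothetical and serve only as illustration.

```python
import csv
import io

# Fields that the description above states are never empty.
REQUIRED_FIELDS = ("dataset", "id", "property", "property_holds")


def read_assessments(stream):
    """Yield one dict per assessment from a semicolon-separated stream.

    Assumption: the first line is a header naming the fields. If the file
    has no header, pass explicit fieldnames to csv.DictReader instead.
    """
    reader = csv.DictReader(stream, delimiter=";")
    for row in reader:
        # A field counts as missing if it is absent or an empty string.
        missing = [f for f in REQUIRED_FIELDS if not row.get(f)]
        if missing:
            raise ValueError(
                f"line {reader.line_num}: empty required field(s) {missing}"
            )
        yield row


# Hypothetical sample data, made up for illustration only.
SAMPLE = (
    "dataset;id;property;property_holds\n"
    "ExampleSet;1;example-weakness;t\n"
    "ExampleSet;2;example-weakness;f\n"
)

rows = list(read_assessments(io.StringIO(SAMPLE)))
print(len(rows), rows[0]["dataset"])
```

To read the real file, the in-memory stream can be replaced by a file object opened with newline="" as the csv module recommends. The appropriate text encoding is not stated here and would need to be checked.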

License

The folder construction/originalSets contains copies of publicly available datasets (see construction/README.md for links to the sources). The data therein retains the licenses of the original datasets.

The folders source, construction/scripts/cache, and construction/unified contain smart contracts in the form of Solidity code that to a large extent have been obtained either directly or indirectly (via the original sets) from etherscan.io. They retain the license specified there.

The bytecode in the folders bytecode, runtime, construction/scripts/cache, and construction/unified has mostly been obtained from a public blockchain, or has been generated by the Solidity compiler. We do not know whether any license applies to it; none is imposed by this repository.

For the Python and SQL code, the MIT License applies.