trailofbits / cb-multios

DARPA Challenge Sets for Linux, Windows, and macOS
https://blog.trailofbits.com/2016/08/01/your-tool-works-better-than-mine-prove-it/
MIT License

Vulnerability locations index #94

Open · bstee615 opened this issue 2 years ago

bstee615 commented 2 years ago

Hello, my name is Ben Steenhoek and I am a PhD student at Iowa State University studying deep learning-based vulnerability detection. Thank you for making this dataset available and easy to use.

I want to use your corpus of DARPA Cyber Grand Challenge programs to train a neural network to detect buggy code, such as null-pointer dereferences or buffer overflows. To do this, I give the model the program's source code along with the location of the vulnerability. For example, if the vulnerability causes a crash, such as a segmentation fault from a null-pointer dereference, I mark the statement that triggers it. To build a large dataset of vulnerable programs this way, I need the vulnerability locations in a machine-readable format such as XML or CSV.

Since the Cyber Grand Challenge evaluated several systems, I would expect some level of automated checking. However, I cannot find a machine-readable index of vulnerability locations: the repo only includes a natural-language description of each vulnerability in its README.md. How can I access a machine-readable index of the vulnerability locations? I would be grateful for your help in making use of this wonderful dataset.