Hello, my name is Ben Steenhoek and I am a PhD student at Iowa State University studying deep learning-based vulnerability detection. Thank you for making this dataset available and easy to use.
I want to use your corpus of programs from the DARPA Cyber Grand Challenge to train a neural network model to detect buggy code, such as null-pointer dereferences or buffer overflows. To do this, I provide the model with the source code of the program and the location of the vulnerability. For example, if the vulnerability causes a crash, such as a segmentation fault from a null-pointer dereference, I mark the statement that triggers it. Because I need to label a large number of vulnerable programs automatically, I can only use vulnerability locations that are available in a machine-readable format such as XML or CSV.
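To make the request concrete, here is a rough sketch of the kind of index I have in mind and how I would consume it. Everything in it is an assumption for illustration only: the CSV column names (`challenge`, `file`, `line`, `kind`), the example row, the file path, and the helper names `load_index` and `label_lines` are all hypothetical and do not come from this repository.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical index format: the column names and the single row below are
# made up for illustration and are not taken from the repository.
EXAMPLE_INDEX = """challenge,file,line,kind
EXAMPLE_CB,src/service.c,3,null-pointer-dereference
"""

# Hypothetical source file matching the made-up row above.
EXAMPLE_SOURCE = "char *p = NULL;\nint n = 0;\nn = *p;\n"


@dataclass(frozen=True)
class VulnLocation:
    challenge: str
    file: str
    line: int  # 1-based line number of the vulnerable statement
    kind: str


def load_index(text):
    """Parse the CSV index text into a list of VulnLocation records."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        VulnLocation(row["challenge"], row["file"], int(row["line"]), row["kind"])
        for row in reader
    ]


def label_lines(source, path, locations):
    """Pair each source line with True if the index flags it as vulnerable."""
    flagged = {loc.line for loc in locations if loc.file == path}
    return [
        (text, number in flagged)
        for number, text in enumerate(source.splitlines(), start=1)
    ]


locations = load_index(EXAMPLE_INDEX)
labels = label_lines(EXAMPLE_SOURCE, "src/service.c", locations)
```

With an index like this, each statement gets a per-line label that can be fed directly to the model. Any equivalent format (XML, JSON, or annotations in the source itself) would work just as well for my purposes.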
Since the Cyber Grand Challenge evaluated several systems, I would expect that some form of automated checking exists. However, I do not see a machine-readable index of vulnerable locations. This repo only includes a natural-language description of each vulnerability in README.md. How can I access a machine-readable index of the vulnerability locations? I would be grateful for your help in making use of this wonderful dataset.