Open DavidKorczynski opened 11 months ago
Do you think this functionality could be extended to an optional part of the API? i.e. it would be great to programmatically fetch from github. This could be done by just exposing the following in the all-functions endpoint;
Then you can create a link like the following;
- https://github.com/ossf/fuzz-introspector/blob/660d8bf5a14f010fc471f49591b09772d5fb344d/src/main.py#L29
Having it as part of the API would also mean you can do other things like download the file using the github API. The only downside is that not all projects use git (so the commit field might need to be something more generic or optional).
If I understand your thought correctly then I think it would be neat -- namely to have an API available that'll provide you a link to the source code, or, perhaps the source code of each harness.
However, I'm unsure what you meant by commit? Which commits are you referring to for each harness?
My thoughts are:
Take the llhttp
project with a single fuzzer: https://introspector.oss-fuzz.com/project-profile?project=llhttp -- in this case, I'd like for the profile page to have a "fuzzers table" with 1 row (because there's a single fuzzer) with a link to https://github.com/nodejs/llhttp/blob/main/test/fuzzers/fuzz_parser.c#L8-L45
for the fuzz_parser
harness.
To make it an API, I would make either the above URL accessible. We should be able to provide references to other repo websites e.g. gitlab and more, and, in the worst case we can provide a URL to the code coverage report for where the fuzzer is as we'll always (or when coverage is working at least) have a link to the code coverage reports.
Are you perhaps thinking instead of https://github.com/nodejs/llhttp/blob/main/test/fuzzers/fuzz_parser.c#L8-L45
the right link to provide is https://github.com/nodejs/llhttp/blob/8498ef9d8b0e9539c8c331cf59213529287789e1/test/fuzzers/fuzz_parser.c#L8-L45
?
We may run into some issue with having to predict branch names. Hmm, I'm not sure if there are many edge cases we'll have to handle.
One option is to reduce this to links to the location in the code coverage reports. I think that's also useful in and of itself, but, I also think having the source repo URLs provide high value, and even the source code itself.
Are you perhaps thinking instead of https://github.com/nodejs/llhttp/blob/main/test/fuzzers/fuzz_parser.c#L8-L45 the right link to provide is https://github.com/nodejs/llhttp/blob/8498ef9d8b0e9539c8c331cf59213529287789e1/test/fuzzers/fuzz_parser.c#L8-L45?
Yeah that's pretty close to what I was saying, I guess what I was getting at is that you need 3 bits of information to reproducibly find a function;
$ cd llhttp # Insert project here
$ git rev-parse HEAD
8498ef9d8b0e9539c8c331cf59213529287789e1
test/fuzzers/fuzz_parser.c
L8-L45
If you stitch all of these peices together you can reproducibly find the specific function again, and it will always remain the same e.g.
https://github.com/nodejs/llhttp/blob/8498ef9d8b0e9539c8c331cf59213529287789e1/test/fuzzers/fuzz_parser.c#L8-L45
https://github.com/nodejs/llhttp/blob/{---------------commit-sha-------------}/{-----------path----------}#{range}
My suggestion is to just include those peices of information in the API, and leave the URL building up to the user. For example the same thing would be reproducible from the command line. e.g.
$ git clone https://github.com/nodejs/llhttp.git
$ cd llhttp
$ git checkout 8498ef9d8b0e9539c8c331cf59213529287789e1
$ # Snip out lines 8-45
$ sed -n '8,45 p' test/fuzzers/fuzz_parser.c
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
// Truncated ...
}
The latter being closer to what I would likely be doing.
We may run into some issue with having to predict branch names. Hmm, I'm not sure if there are many edge cases we'll have to handle.
I don't think branch prediction would be an issue with the above approach. A branch in itself is just a stream of sequential commits. Whereas a commit itself is an atomic representation of a git repository. So as long as you collect the git sha, when you run introspector you should be able to reproducibly restore that commit (or use the github api, to view the file at that commit).
That's assuming you meant git branch and not some other definition of a branch :)
Also worth noting that gitlab, bitbucket and others have a similiar api structure available as well e.g.
https://gitlab.com/gnuwget/wget2/-/blob/8271687e29568e9a271afa1b3112325611f48183/fuzz/libwget_atom_url_fuzzer.c#L31-53
https://bitbucket.org/snakeyaml/snakeyaml/src/a4df9e7d7ffdc0c21fe268f872a1e30d03aa8f02/src/main/java9/module-info.java#lines-14:44
It would be nice to have direct links to the fuzzer source files on the profile pages -- I think some heuristics will be able to do this and it will make it very convenient to browse a given project's structure.