nathaniel-brough opened this issue 11 months ago
The summary.json is used in various places and is intentional in that sense -- but I assume few will be using it directly. You're more than welcome to do so, but I would not expect it to be stable. At the moment there are no plans to make that logic super stable or to track changes to it.
> If I were to build an API client around that, would it be stable? Or would it be better to formally include this in the public API?
The preferred, and much appreciated, way would be to submit patches to the webapp and provide APIs that expose this data. We'd also be happy to keep those stable. Basically, web_db_creator_from_summary converts the summary.json into something that we use in the webapp, by way of the models which are loaded here.
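To give a feel for that flow, here is a minimal sketch of the conversion idea -- the field names are made up for illustration and are not the real summary.json schema or the webapp's actual models:

```python
import json
from dataclasses import dataclass

# Hypothetical stand-in for the webapp's models -- the real ones live in
# the webapp code; these field names are illustrative guesses only.
@dataclass
class HarnessModel:
    name: str
    reached_complexity: int

def models_from_summary(path: str) -> list[HarnessModel]:
    """Convert a raw summary.json into model objects (illustrative only)."""
    with open(path) as f:
        raw = json.load(f)
    return [
        HarnessModel(
            name=fz.get("name", ""),
            reached_complexity=fz.get("reached-complexity", 0),
        )
        for fz in raw.get("fuzzers", [])
    ]
```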
That said, if this is specifically related to "It would be great if we could analyse each fuzzer relative to the complete project statistics." then this might be a bit too specific wrt exposing it in an API/the UI. Could you elaborate a bit more on what you have in mind here? Is it something we could add to the fuzzers overview table as well?
Do you have in mind stuff such as checking intersections/specific differences of reachability/coverage between fuzzers? In the "All functions" table of the main introspector reports there are columns such as "Reached by fuzzers", and we could add a "Runtime covered by fuzzers" column -- that should make it possible to do deeper analysis in terms of which fuzzers target what parts of the code. We could expose this in the API too, perhaps?
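For example, a quick sketch of what that kind of cross-fuzzer comparison could look like, assuming a made-up "reached-functions" field rather than the real schema:

```python
# Compare which functions two fuzzers reach, using set intersections
# and differences. "reached-functions" is an assumed field name.
from typing import Dict, Set

def reached_sets(summary: dict) -> Dict[str, Set[str]]:
    """Map each fuzzer name to the set of functions it reaches."""
    return {
        fz["name"]: set(fz.get("reached-functions", []))
        for fz in summary.get("fuzzers", [])
    }

def compare_fuzzers(summary: dict, a: str, b: str) -> None:
    sets = reached_sets(summary)
    print(f"reached by both: {len(sets[a] & sets[b])}")
    print(f"only by {a}: {len(sets[a] - sets[b])}")
    print(f"only by {b}: {len(sets[b] - sets[a])}")
```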
For reference, the oss-fuzz-scanner goes through the cloud storage and interprets the summary.json files. It basically allows you to handle everything from the raw summary.json objects.
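As a rough idea of what that looks like without the scanner, something along these lines could fetch a raw summary.json directly -- the bucket name and object layout here are assumptions, so double-check the actual layout before relying on them:

```python
# Fetch a raw summary.json straight from the public storage bucket,
# similar in spirit to what oss-fuzz-scanner automates. The bucket name
# and object layout below are assumptions -- verify before relying on them.
import requests

BUCKET = "oss-fuzz-introspector"  # assumed bucket name

def fetch_summary(project: str, date: str) -> dict:
    # Assumed layout: <project>/inspector-report/<date>/summary.json
    url = (f"https://storage.googleapis.com/{BUCKET}/"
           f"{project}/inspector-report/{date}/summary.json")
    resp = requests.get(url, timeout=60)
    resp.raise_for_status()
    return resp.json()

summary = fetch_summary("htslib", "20240101")
print(list(summary)[:10])  # peek at the top-level keys
```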
I'd probably prefer to focus on the API, and the API kind of arose from that project. However, for users that want to come up with interesting/arbitrary analyses we may simply end up with one specific API that offers "everything available", and then this just does the job of assembling all introspector data from all projects in one accessible point. I guess the downside is that we're not really against adding arbitrarily many APIs (within reason), so adding APIs that may be of value to others is nice -- but having all the data available makes it quite nice to experiment and "explore" possibilities. Hmm.

Would this fit your vision?
> The preferred, and much appreciated, way would be to submit patches to the webapp and provide APIs that expose this data. We'd also be happy to keep those stable. Basically, web_db_creator_from_summary converts the summary.json into something that we use in the webapp, by way of the models which are loaded here.
Easy, I'll try and find the time over the next week or two for that.
> That said, if this is specifically related to "It would be great if we could analyse each fuzzer relative to the complete project statistics." then this might be a bit too specific wrt exposing it in an API/the UI. Could you elaborate a bit more on what you have in mind here? Is it something we could add to the fuzzers overview table as well?
So for some context, I often participate in the google/bughunters oss-fuzz rewards program for a bit of extra cash on the side. I think you've reviewed a number of my PRs over at google/oss-fuzz.
I'm trying to optimize my workflow a little bit and create somewhat of a hybrid between:

- The target-oracle
- Optimal target-analysis
- Criticality score

So being able to collect those metrics programmatically would allow me to ask broader questions like:

- Out of all the C++ projects currently integrated into OSS-Fuzz, which function would be the most optimal target function to fuzz right now?
So yeah, some of these changes would likely fit nicely into the UI.
The only other thing I'd add is that I could see myself calling the API asynchronously for each project in OSS-Fuzz, so having some guidelines on rate-limiting would be great.
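Something like this is what I have in mind -- a client-side throttle; the endpoint URL and the limits below are placeholders, not anything published:

```python
# Client-side throttle for hitting a per-project endpoint across all of
# OSS-Fuzz. The URL is a placeholder and the limits are arbitrary
# examples, not published guidelines.
import asyncio
import aiohttp

MAX_CONCURRENT = 5    # assumed polite concurrency cap
DELAY_SECONDS = 0.2   # pause per request to stay well under any limit

async def fetch_one(session, sem, project):
    async with sem:
        url = f"https://example.org/api/project-summary?project={project}"
        async with session.get(url) as resp:
            data = await resp.json()
        await asyncio.sleep(DELAY_SECONDS)
        return project, data

async def fetch_all(projects):
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(
            *(fetch_one(session, sem, p) for p in projects))

results = asyncio.run(fetch_all(["htslib", "zlib", "libpng"]))
```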
> Do you have in mind stuff such as checking intersections/specific differences of reachability/coverage between fuzzers? In the "All functions" table of the main introspector reports there are columns such as "Reached by fuzzers", and we could add a "Runtime covered by fuzzers" column -- that should make it possible to do deeper analysis in terms of which fuzzers target what parts of the code. We could expose this in the API too, perhaps?
Honestly, I haven't thought too much about this, but it does spark some other thoughts. I've been interested in working out a way to automatically identify the "purpose" of a fuzz-harness, e.g. which top-level function in the fuzz-harness has the most reachable complexity. But I haven't fully fleshed this thought out yet.
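As a very rough sketch of that idea (the field names are guesses at the data shape, not the real schema):

```python
# Guess a harness's "purpose" as the top-level callee with the most
# reachable cyclomatic complexity. Both field names below are
# assumptions about the data shape, not the real schema.
def likely_purpose(harness: dict) -> str:
    candidates = harness.get("top-level-calls", [])  # assumed field
    best = max(
        candidates,
        key=lambda fn: fn.get("reached-cyclomatic-complexity", 0),
        default=None,
    )
    return best["name"] if best else "unknown"
```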
> For reference, the oss-fuzz-scanner goes through the cloud storage and interprets the summary.json files. It basically allows you to handle everything from the raw summary.json objects.
Cool, I'll have a look into that when I get to extending the project-summary endpoint.
> I'd probably prefer to focus on the API, and the API kind of arose from that project. However, for users that want to come up with interesting/arbitrary analyses we may simply end up with one specific API that offers "everything available", and then this just does the job of assembling all introspector data from all projects in one accessible point. I guess the downside is that we're not really against adding arbitrarily many APIs (within reason), so adding APIs that may be of value to others is nice -- but having all the data available makes it quite nice to experiment and "explore" possibilities. Hmm.
>
> Would this fit your vision?
Yeah, for sure, having all the raw data available would be great for experimenting with things, and is probably my ideal situation. I also think there is a decent amount of derivative data that would be super useful, e.g. optimal target analysis.
Perhaps it would make sense to do something like a daily data dump rather than a complete API? Or even storing all the data as a public BigQuery dataset; the latter would make it easier to experiment/play with the data.
Not sure which would be the better approach, but I could see myself hammering the entire API surface once a day for each project (within rate-limits).
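E.g. if there were a public BigQuery dataset, exploring it could be as simple as this -- the dataset/table/column names here are entirely hypothetical, since no such dataset exists today:

```python
# Hypothetical exploration of a public BigQuery dataset. No such
# dataset exists as of this discussion; all names below are made up.
from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT project, fuzzer, reached_complexity
    FROM `some-public-project.introspector.daily_summaries`  -- hypothetical
    ORDER BY reached_complexity DESC
    LIMIT 20
"""
for row in client.query(query).result():
    print(row.project, row.fuzzer, row.reached_complexity)
```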
> I'm trying to optimize my workflow a little bit and create somewhat of a hybrid between:
>
> - The target-oracle
> - Optimal target-analysis
> - Criticality score
>
> So being able to collect those metrics programmatically would allow me to ask broader questions like:
>
> - Out of all the C++ projects currently integrated into OSS-Fuzz, which function would be the most optimal target function to fuzz right now?
Got it -- the target oracle was meant for this kind of use case (i.e. to help find the optimal spot between value gained and effort needed amongst all OSS-Fuzz projects), so it's great it sparked your interest!
I think I have a good understanding of what you're looking for and will push for this in the coming days, and assess what options we have for easily making all data available.
Also yeah definitely saw your contributions on OSS-Fuzz, great work!
At the moment the project-summary endpoint provides a per-fuzz-harness summary rather than a project-wide summary, i.e. it provides reachable cyclomatic complexity/line coverage per fuzz-harness. It would be great if we could analyse each fuzzer relative to the complete project statistics. Essentially, being able to reconstruct something like this using the API would be great.
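For example, something like this per-fuzzer-relative-to-project view is what I'd want to compute -- the field names are guesses at what the API could return:

```python
# Each harness's reached cyclomatic complexity as a fraction of the
# project-wide total. Field names are guesses at a possible API shape.
def relative_stats(project_summary: dict) -> dict:
    total = project_summary.get("total-cyclomatic-complexity", 0)  # assumed
    if not total:
        return {}
    return {
        fz["name"]: fz.get("reached-complexity", 0) / total
        for fz in project_summary.get("fuzzers", [])
    }
```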
I've been perusing the s3 bucket that stores all the introspector data, and I think the data is already available there, forming a sort of undocumented and (probably) unintentional API. Note that the fields for cyclomatic complexity match the screenshot.
If I were to build an API client around that, would it be stable? Or would it be better to formally include this in the public API?