google / oss-fuzz

OSS-Fuzz - continuous fuzzing for open source software.
https://google.github.io/oss-fuzz
Apache License 2.0

Broken introspector report (`pandas` project) #11143

Open nmlsg opened 8 months ago

nmlsg commented 8 months ago

Hi there!

I'm currently working on expanding fuzzing coverage for the pandas project, and I've come across a few observations in the pandas introspector report:

I was wondering if you could provide some insights or suggestions on how to address this issue.

Thank you in advance!

@DavidKorczynski

nmlsg commented 8 months ago

Hi @DavidKorczynski,

I observed that the pandas project was built with errors and has been removed from introspector.oss-fuzz.com.

Please let me know if you have any updates regarding this issue.

Thank you in advance!

DavidKorczynski commented 8 months ago

I'm looking into this @nmlsg -- I have a build fix but the introspector was a bit more tricky. I should have something soon.

nmlsg commented 8 months ago

Thank you, @DavidKorczynski. I hope to hear positive news from you soon!

DavidKorczynski commented 8 months ago

Am re-opening this while we wait and see the effect of the PR in the cloud.

nmlsg commented 8 months ago

Hi, @DavidKorczynski!

Thank you! I'm waiting for the next reports.

nmlsg commented 8 months ago

Morning @DavidKorczynski, the last introspector build took too long and failed. log_link

Do you have any suggestions on how I can get the 'optimal strategy' for the project locally? I need it for further investigation and fuzz target development.

Thank you!

DavidKorczynski commented 8 months ago

This will build the fuzzers using ASAN; collect a corpus; generate code coverage reports; run an introspector run:

python3 ./infra/helper.py introspector pandas --seconds=10
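
Since later comments in this thread report the run taking many hours before being killed, a small wrapper with a wall-clock limit can make local attempts easier to manage. The following is a minimal sketch, not part of OSS-Fuzz: the function names `build_introspector_cmd` and `run_with_deadline` are hypothetical, and only the command line itself comes from the comment above. It assumes it is launched from the root of an oss-fuzz checkout.

```python
import subprocess
from typing import List, Optional


def build_introspector_cmd(project: str, seconds: int) -> List[str]:
    """Return the helper.py invocation quoted in this thread as an argv list."""
    return [
        "python3",
        "./infra/helper.py",
        "introspector",
        project,
        f"--seconds={seconds}",
    ]


def run_with_deadline(
    cmd: List[str], deadline_s: float, cwd: Optional[str] = None
) -> Optional[int]:
    """Run cmd and return its exit code, or None if it exceeded deadline_s."""
    try:
        return subprocess.run(cmd, cwd=cwd, timeout=deadline_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return None


# Example (assumes the current directory is an oss-fuzz checkout):
#   code = run_with_deadline(build_introspector_cmd("pandas", 10), 4 * 3600)
#   print("timed out" if code is None else f"exit code {code}")
```

Note that the timeout only stops the helper process that was started directly; any containers it launched may need to be cleaned up separately.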

nmlsg commented 8 months ago

Thank you for your suggestion. It took too long (> 8 hrs) and failed. Alternatively, is there a way to generate only the optimal strategy for a Python library? I cannot find any "deep" functions whose fuzzing would result in higher coverage. I have developed more than 50 fuzz targets (using radon to find 'deep' functions, i.e. those with an F score), but I only got a small delta (~2% over the current coverage).

I would really appreciate your suggestions on which functions I should cover or how to get the optimal strategy. Thank you for your incredible work!

@DavidKorczynski

DavidKorczynski commented 8 months ago

> Thank you for your suggestion. It took too long (> 8 hrs) and failed.

I started a run around ~14 hours ago and it's still running for me @nmlsg -- How did yours fail?

I'll take a look at Pandas tomorrow and come back with some more information on what are some good potential targets.

nmlsg commented 8 months ago

@DavidKorczynski

/usr/local/bin/compile_python_fuzzer: line 55: 211 Killed python3 /fuzz-introspector/frontends/python/main.py $ARGS
ERROR:main:Building fuzzers failed.
ERROR:main:Failed to build project with introspector

Waiting for your suggestion about potential good fuzz targets, thank you!

nmlsg commented 8 months ago

Hi @DavidKorczynski, any updates regarding your introspector run? Did you have a chance to look into 'deep' functions to fuzz?

Thank you!!

DavidKorczynski commented 8 months ago

> Any updates regarding your introspector run?

It got killed after ~18 hours or so. Perhaps one thing we can do is limit the code to analyse to a subset of pandas -- it seems to be a huge project.
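
Before choosing a subset, it may help to see how the code is distributed across top-level subpackages. The sketch below is only an illustration and was not proposed in this thread: `subpackage_sizes` is a hypothetical helper that counts `.py` files and lines per top-level directory of a source tree, and the path passed to it is an assumption about where the pandas sources live locally.

```python
from pathlib import Path
from typing import List, Tuple


def subpackage_sizes(root: str) -> List[Tuple[str, int, int]]:
    """Return (subpackage, .py file count, total lines), largest first."""
    rows = []
    for sub in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        files = list(sub.rglob("*.py"))
        if not files:
            continue  # skip directories without Python sources
        total_lines = sum(
            len(f.read_text(encoding="utf-8", errors="replace").splitlines())
            for f in files
        )
        rows.append((sub.name, len(files), total_lines))
    # Largest subpackages first, measured by line count.
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Example (hypothetical path to a local pandas source tree):
#   for name, n_files, n_lines in subpackage_sizes("pandas/pandas"):
#       print(f"{name:20s} {n_files:5d} files {n_lines:8d} lines")
```

Line count is only a rough proxy for analysis cost; it says nothing about how hard a subpackage is for the introspector's call-graph analysis.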

> Did you have a chance to look into 'deep' functions to fuzz?

Didn't have time yet @nmlsg -- it's on my list, but it'll probably take some time. That said, did you give this a try yourself (without radon, but from manually auditing the code)?

nmlsg commented 8 months ago

> It got killed after ~18 hours or so. Perhaps one thing we can do is limit the code to analyse to a subset of pandas -- it seems to be a huge project.

Hi @DavidKorczynski thank you for the update.

> Didn't have time yet @nmlsg -- it's on my list, but it'll probably take some time. That said, did you give this a try yourself (without radon, but from manually auditing the code)?

Definitely. I have tried several strategies (including manual source code review) and identified promising fuzz targets; however, the overall change in the coverage report is too small (it increased from 24% to approximately 26-27%). To determine overall coverage, I used 'infra/helper.py coverage'. I also wrote my own script, which measures the length of AST nodes and intersects the results with the covered functions. This approach has helped me significantly in generating ideas for fuzz targets.
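
The script itself is not shown in this thread, so the following is only a guess at the general idea using the standard `ast` module. It assumes that "length" means the number of source lines a function spans and that the covered functions are available as a collection of dotted names; `function_lengths` and `uncovered_by_length` are hypothetical names. Instead of a plain intersection, it lists the functions that are not yet covered, longest first.

```python
import ast
from typing import Dict, Iterable, List, Tuple


def function_lengths(source: str) -> Dict[str, int]:
    """Map each function or method (dotted name) to its length in lines."""
    lengths: Dict[str, int] = {}

    def visit(node: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                name = prefix + child.name
                # end_lineno is available on Python 3.8 and later.
                lengths[name] = child.end_lineno - child.lineno + 1
                visit(child, name + ".")
            elif isinstance(child, ast.ClassDef):
                visit(child, prefix + child.name + ".")
            else:
                visit(child, prefix)

    visit(ast.parse(source), "")
    return lengths


def uncovered_by_length(
    source: str, covered: Iterable[str]
) -> List[Tuple[str, int]]:
    """Return (name, length) for functions not in covered, longest first."""
    covered_set = set(covered)
    candidates = [
        (name, length)
        for name, length in function_lengths(source).items()
        if name not in covered_set
    ]
    return sorted(candidates, key=lambda item: item[1], reverse=True)
```

Line span is a crude stand-in for "depth"; a long function is not necessarily one that reaches much new code when fuzzed.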

Moreover, when running fuzz targets locally (e.g., python3 fuzz_target.py), I reached reasonable coverage (e.g., cov: 1671 ft: 3995) after approximately 300,000 runs. However, in the logs of the OSS-Fuzz coverage run, I noticed that the target was executed only twice and ended with cov: 299 ft: 299.

Is there a way to increase the number of runs during the coverage step ('infra/helper.py coverage') without relying heavily on local corpora (since the introspector is unavailable)?
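
To compare a local run with the coverage-run log more systematically, the `cov:` and `ft:` counters can be extracted from the fuzzer output. This is a minimal sketch that assumes libFuzzer-style status lines (a `#<runs>` prefix, an event name, then `cov:` and `ft:`); `parse_status_lines` and `last_status` are hypothetical helper names, and the sample line in the comment is illustrative apart from the cov/ft numbers taken from this comment.

```python
import re
from typing import Dict, List, Optional, Union

# Matches libFuzzer-style status lines, for example (illustrative):
#   #300000 pulse  cov: 1671 ft: 3995 corp: 120/4Kb exec/s: 1500 rss: 90Mb
STATUS_RE = re.compile(
    r"^#(?P<runs>\d+)\s+(?P<event>[A-Za-z]+)\s+"
    r"cov:\s*(?P<cov>\d+)\s+ft:\s*(?P<ft>\d+)"
)

Row = Dict[str, Union[int, str]]


def parse_status_lines(log_text: str) -> List[Row]:
    """Extract run count, event, cov and ft from every status line."""
    rows: List[Row] = []
    for line in log_text.splitlines():
        match = STATUS_RE.match(line.strip())
        if match:
            rows.append(
                {
                    "runs": int(match["runs"]),
                    "event": match["event"],
                    "cov": int(match["cov"]),
                    "ft": int(match["ft"]),
                }
            )
    return rows


def last_status(log_text: str) -> Optional[Row]:
    """Return the final status line of a log, or None if there is none."""
    rows = parse_status_lines(log_text)
    return rows[-1] if rows else None
```

Running `last_status` on both logs gives the final run count and counters side by side, which makes a gap like the one described here (two executions versus roughly 300,000) easy to spot.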

nmlsg commented 8 months ago

Hi again @DavidKorczynski, Could you please provide a timeline for fixing the 'introspector' issue in pandas?