google / oss-fuzz

OSS-Fuzz - continuous fuzzing for open source software.
https://google.github.io/oss-fuzz
Apache License 2.0

Question: Reproducing a Crash in Earlier Version #2950

Closed (buszk closed this 4 years ago)

buszk commented 5 years ago

I want to reproduce some bugs that were found by OSS-Fuzz and analyze them. However, I found it hard to reproduce old bugs in earlier versions. infra/helper.py build_image tries to pull the latest source of the targets. Thus, I could not reproduce any fixed bugs.

Although letting people analyze old bugs found by OSS-Fuzz is not the main purpose of this project, it's still a good idea to be able to reproduce and analyze already-fixed bugs.

Trying to reproduce everything by hand is time-consuming. I wonder if there are any good ideas for this problem. Maybe manually modify the Docker script to revert to the earlier version which still has the bug?

evverx commented 4 years ago

I think it's hard to automate, especially if one of the goals is to figure out where bugs were initially introduced (which was discussed in the context of issuing CVEs in https://github.com/google/oss-fuzz/issues/1096#issuecomment-504726235).

Maybe manually modify the Docker script to revert to the earlier version which still has the bug?

This works most of the time, especially if the projects you're interested in keep their fuzzers in their repositories. Another option would be to put the code in a directory and point build_fuzzers to it as described in https://google.github.io/oss-fuzz/advanced-topics/reproducing/#reproduce-using-local-source-checkout. If fuzzers are kept in the OSS-Fuzz repository, or projects moved their fuzzing infrastructure somewhere else along the way, it's sometimes necessary to also revert ./projects/* to the point where the fuzzers can be built and run.
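A minimal sketch of the local-checkout route, written in Python since infra/helper.py is Python. It only assembles the commands, in order, to run from the root of an oss-fuzz checkout. The function local_checkout_commands and its parameter names are hypothetical; the build_image, build_fuzzers --sanitizer <sanitizer> <project> <source_path> and reproduce <project> <fuzz_target> <testcase> subcommands follow the reproducing guide linked above. The optional OSS-Fuzz commit reverts only projects/<project>, which covers the case where the fuzzers live in the OSS-Fuzz repository.

```python
import shlex
from typing import List, Optional


def local_checkout_commands(
    project: str,
    repo_url: str,
    commit: str,
    src_dir: str,
    fuzz_target: str,
    testcase: str,
    sanitizer: str = "address",
    oss_fuzz_commit: Optional[str] = None,
) -> List[List[str]]:
    """Return argv lists to run, in order, from the root of an oss-fuzz checkout."""
    cmds: List[List[str]] = []
    if oss_fuzz_commit is not None:
        # Revert only this project's build files (Dockerfile, build.sh, ...)
        # to how they looked at the given OSS-Fuzz commit.
        cmds.append(["git", "checkout", oss_fuzz_commit, "--", f"projects/{project}"])
    # Get the project source and move it to the commit that still has the bug.
    cmds.append(["git", "clone", repo_url, src_dir])
    cmds.append(["git", "-C", src_dir, "checkout", commit])
    # Build the image, then build the fuzzers against the local checkout
    # instead of the source that the Dockerfile cloned into the image.
    cmds.append(["python", "infra/helper.py", "build_image", project])
    cmds.append(["python", "infra/helper.py", "build_fuzzers",
                 "--sanitizer", sanitizer, project, src_dir])
    # Run the fuzz target on the saved crashing input.
    cmds.append(["python", "infra/helper.py", "reproduce", project, fuzz_target, testcase])
    return cmds


if __name__ == "__main__":
    # Placeholder values; substitute the real commit, fuzz target and testcase.
    for argv in local_checkout_commands(
        "libxml2", "https://gitlab.gnome.org/GNOME/libxml2.git", "0123abcd",
        "/tmp/libxml2", "FUZZ_TARGET", "/tmp/testcase",
    ):
        print(shlex.join(argv))
```

Note that git checkout <commit> -- projects/<project> leaves those files modified in the oss-fuzz working tree; git checkout HEAD -- projects/<project> restores them afterwards.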

evverx commented 4 years ago

In theory, it would be much easier if in bug reports apart from links to the commits where bugs were first triggered, there were links to commits in the OSS-Fuzz repository and images used to trigger those bugs. This way it would be possible to automatically reproduce the bugs that have been fixed by building and running the fuzzers in almost the same environment. Another option would be to keep the images along with the fuzzers so that anyone could pull and run them without having to figure out how to build anything. Though, having said that, I'm not sure how often people need this or whether it's worth it.

inferno-chromium commented 4 years ago

This seems hard to do (depends on a particular project and how many dependencies they pull in), but @jonathanmetzman might have ideas.

jonathanmetzman commented 4 years ago

infra/helper.py build_image tries to pull the latest source of the targets. Thus, I could not reproduce any fixed bugs.

Right, I'm considering introducing an opt-in feature that would allow OSS-Fuzz to build a project at a specific commit. This would allow us to find the exact commits that introduced and fixed a crash, which would be useful for fixing bugs and for CVE-ish advisories.
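Once a project can be built at an arbitrary commit, finding the commit that introduced a crash is a binary search over the history, the same idea as git bisect (git's own git bisect run automates it against a test script). A minimal sketch, assuming a linear, oldest-to-newest list of commits and that the crash, once introduced, stays until fixed; crashes is a hypothetical callback that would build at the commit and run the testcase.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], crashes: Callable[[str], bool]) -> str:
    """Return the earliest commit for which crashes(commit) is True.

    commits must be ordered oldest -> newest, with commits[0] not crashing and
    commits[-1] crashing; the search also assumes that once the crash appears
    it stays, so each probe halves the remaining range.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    if crashes(commits[0]) or not crashes(commits[-1]):
        raise ValueError("endpoints must be good (oldest) and bad (newest)")
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

The fixing commit is the same search with the predicate negated, over a range whose oldest commit still crashes: first_bad_commit(commits, lambda c: not crashes(c)).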

Another option would be to keep the images along with the fuzzers so that anyone could pull and run them without having to figure out how to build anything. Though, having said that, I'm not sure how often people need this or whether it's worth it.

I think this would work but as you point out would probably not be worth the cost.

The best solution I can think of for the time being will break on many projects, but I suspect it will work on enough of them to be useful. Most projects specify the directory containing their repo using WORKDIR in docker, which makes the repo the current working directory when infra/helper.py shell <project> is used. So the solution is extending the helper script to run git checkout <commit> before running build.sh where <commit> is provided as an argument. Of course, in many cases this breaks, particularly in complex projects where dependencies are cloned in the docker file. Did you try something like this?
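A minimal sketch of that helper extension: it produces the one-line command to type into the shell opened by python infra/helper.py shell <project>. It assumes, as described above, that the Dockerfile's WORKDIR is the project repo, and it assumes the base-builder image's compile script is what runs build.sh (worth double-checking against the base image in use). The function name checkout_then_build is hypothetical, and dependencies cloned in the Dockerfile stay at whatever revision the image already has.

```python
import shlex
from typing import Optional


def checkout_then_build(commit: str, repo_dir: Optional[str] = None) -> str:
    """Return a shell command that checks out commit and then rebuilds the fuzzers.

    Intended for the shell opened by python infra/helper.py shell <project>,
    where the Dockerfile's WORKDIR is usually the project repo.
    Assumes the image's compile script runs the project's build.sh.
    """
    steps = []
    if repo_dir is not None:
        # Only needed when WORKDIR is not the repo being checked out.
        steps.append(f"cd {shlex.quote(repo_dir)}")
    steps.append(f"git checkout {shlex.quote(commit)}")
    steps.append("compile")
    # Chain with && so nothing is built if the checkout fails.
    return " && ".join(steps)


if __name__ == "__main__":
    print(checkout_then_build("0123abcd"))
```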

I do something like this in bisect_clang.py, but the goal there is finding bad clang commits not bad project commits.

evverx commented 4 years ago

So the solution is extending the helper script to run git checkout <commit> before running build.sh where <commit> is provided as an argument. Of course, in many cases this breaks, particularly in complex projects where dependencies are cloned in the docker file. Did you try something like this?

I think it should make it easier to reproduce bugs in upstream projects, but more often than not I also apply patches after git checkout (to somewhat emulate what usually ships downstream), so I usually put the code I'm trying to analyze in a directory and point build_fuzzers to it. Then again, my use case seems a bit different, in the sense that I'm mostly interested in figuring out whether bugs are present in downstream packages.

buszk commented 4 years ago

Thanks @jonathanmetzman and @evverx.

Currently, I'm doing quite a few things in order to reproduce a bug found by oss-fuzz.

In general, targets that use git clone are much simpler to reproduce automatically. Having infra/helper.py build_fuzzers and build.sh accept a commit hash would make automation easy. The commit hashes of the target and its dependencies could also be stored when OSS-Fuzz reports a bug. This way, reproducing earlier bugs could very likely be automated.
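For targets whose Dockerfile uses git clone, pinning can be done by rewriting those lines. A minimal sketch, assuming the commit hashes were recorded somewhere; the pins mapping (repo URL to SHA) and both function names are hypothetical. It drops --depth, since a shallow clone usually cannot check out an old commit, and appends git -C <dir> checkout <sha>. It only handles single-line RUN instructions, splits on && naively, and knows just a few git clone options that take a value, so multi-line RUN blocks, quoted && and other options are out of scope.

```python
import posixpath
import re
import shlex
from typing import Dict, List, Optional

# Options of git clone that take the next token as their value (not exhaustive).
_OPTS_WITH_VALUE = {"-b", "--branch", "--depth", "-o", "--origin", "-c", "--config", "--reference"}
# A shallow clone usually lacks old commits, so "--depth N" / "--depth=N" is removed.
_DEPTH_RE = re.compile(r"\s+--depth(?:=|\s+)\d+")


def _clone_args(cmd: str) -> Optional[List[str]]:
    """Return the positional arguments (url[, dir]) of a git clone command, else None."""
    try:
        tokens = shlex.split(cmd)
    except ValueError:  # unbalanced quotes, e.g. a segment split inside a string
        return None
    if tokens[:2] != ["git", "clone"]:
        return None
    positional: List[str] = []
    i = 2
    while i < len(tokens):
        tok = tokens[i]
        if tok in _OPTS_WITH_VALUE:
            i += 2  # skip the option and its value
            continue
        if not tok.startswith("-"):
            positional.append(tok)
        i += 1
    return positional or None


def pin_clone(cmd: str, pins: Dict[str, str]) -> str:
    """Make one git clone command end at the commit pinned for its URL."""
    args = _clone_args(cmd)
    if args is None or args[0] not in pins:
        return cmd  # not a clone, or no pin recorded: leave it alone
    url = args[0]
    if len(args) > 1:
        dest = args[1]
    else:
        # Without an explicit directory, git uses the last path component minus ".git".
        dest = posixpath.basename(url.rstrip("/"))
        if dest.endswith(".git"):
            dest = dest[: -len(".git")]
    return f"{_DEPTH_RE.sub('', cmd)} && git -C {dest} checkout {pins[url]}"


def pin_run_line(line: str, pins: Dict[str, str]) -> str:
    """Apply pin_clone to each &&-separated command of a single-line RUN instruction."""
    if not line.startswith("RUN "):
        return line
    parts = [p.strip() for p in line[len("RUN "):].split("&&")]
    return "RUN " + " && ".join(pin_clone(p, pins) for p in parts)
```

Destination paths such as $SRC/zlib are written back unquoted, so the shell in the container still expands them.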

evverx commented 4 years ago

I'm wondering if it would make sense to keep the issue open. It's not that it's gotten any easier to reproduce old bugs :-)

buszk commented 4 years ago

IMO, making OSS-Fuzz bugs reproducible would be very useful. It could be used for vulnerability analysis and for evaluating protection mechanisms. OSS-Fuzz could provide a real-world dataset of reproducible bugs in various categories. I would love to contribute if you are interested.

jonathanmetzman commented 4 years ago

IMO, making OSS-Fuzz bugs reproducible would be very useful. It could be used for vulnerability analysis and for evaluating protection mechanisms. OSS-Fuzz could provide a real-world dataset of reproducible bugs in various categories. I would love to contribute if you are interested.

@Leo-Neat is actually working on this now, though we probably won't go with his first implementation (#3058), since it doesn't seem like a great fit for bisecting, which is a major use case for reproducing old crashes (though we may use it in the end if we find it more useful). I'll update this post when the feature is implemented. After that, I think you can contribute by using it and helping iron out any edge cases.

buszk commented 4 years ago

@jonathanmetzman I think there is a difference between bisecting to find the buggy commit and reproducing the bug. To find the buggy commit, we are usually dealing with the latest code, so the upstream API presumably stays the same. To reproduce an old bug, we only care whether the bug is in the build and whether we have the input that triggers it. This also means attempting to reproduce very old bugs, for which library APIs may have changed.

Reproducing old bugs is useful for me because I want to test whether certain compiler sanitizers catch particular bugs. I can use the OSS-Fuzz bug database for the false-positive evaluation. I am sure there are more use cases.

I am putting up a repository of scripts that can help me reproduce some of the OSS-Fuzz bugs. (https://github.com/buszk/oss-fuzz-reproduce)

I think the most important piece of information is the commit SHA of the project source code and of every library, and sometimes the commit SHA of the OSS-Fuzz repository.

With all this information, I can easily reproduce the bugs. It would be much better if OSS-Fuzz provided all this information when a bug is found, so that bug reproduction could be readily automated. Would you be interested in adding a feature like that?
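A minimal sketch of what such a per-crash record could hold, covering the SHAs listed above plus an optional image reference along the lines of the earlier suggestion to keep images. ReproManifest and all its field names are hypothetical; nothing here reflects an existing OSS-Fuzz format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class ReproManifest:
    """Hypothetical per-crash record of the SHAs needed to rebuild it later."""

    project: str
    oss_fuzz_commit: str   # OSS-Fuzz repo state (projects/<project>/Dockerfile, build.sh)
    project_commit: str    # commit of the project source that was fuzzed
    # Repo URL -> SHA for every library the Dockerfile or build.sh cloned.
    dependency_commits: Dict[str, str] = field(default_factory=dict)
    fuzz_target: str = ""
    sanitizer: str = "address"
    builder_image: str = ""  # e.g. an image digest, if one was recorded

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ReproManifest":
        return cls(**json.loads(text))

    def oss_fuzz_revert_argv(self) -> List[str]:
        """git argv, run in an oss-fuzz checkout, restoring this project's build files."""
        return ["git", "checkout", self.oss_fuzz_commit, "--", f"projects/{self.project}"]
```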

MNayer commented 3 years ago

@buszk Did you try to reproduce every crash OSS-Fuzz found? Do you remember how that turned out and how many crashes could be reproduced successfully?

cfossace commented 1 year ago

+1 to @MNayer's question to @buszk. I'm trying to repro a bug :)

buszk commented 1 year ago

Because OSS-Fuzz was designed to support continuous fuzzing, backward reproducibility is not a well-supported feature. Following the above steps did give a good portion of reproducible crashes.

evverx commented 1 year ago

I think sometimes it's also necessary to either downgrade build toolchains or apply patches that make projects compile with the latest toolchains. That would be another reason why bugs can't be bisected automatically, apart from, say, https://github.com/google/osv.dev/issues/918.

Following the above steps did give a good portion of reproducible crashes.

@buszk I wonder if this was part of some research project? It would be great if you could share a link to the paper.

jonathanmetzman commented 1 year ago

I think we do downgrade the toolchain. There are other challenges as well. What about dependencies that are unpinned? (I think we are not even properly reverting to the pinned commits of dependencies anyway.)
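For unpinned dependencies, one rough approximation is to take each dependency's newest commit before the date the crash was found; git rev-list -n 1 --before=<date> <branch> prints exactly that. A minimal sketch with hypothetical function names; run is injectable only so the logic can be tested without git. Caveats: --before filters on committer dates, which rebases and history rewrites can shift, the branch must be the one the Dockerfile actually cloned, and the clone must not be shallow.

```python
import subprocess
from typing import Callable, List


def rev_before_argv(repo_dir: str, date: str, branch: str = "HEAD") -> List[str]:
    """git argv printing the newest commit on branch committed before date."""
    return ["git", "-C", repo_dir, "rev-list", "-n", "1", f"--before={date}", branch]


def rev_before(
    repo_dir: str,
    date: str,
    branch: str = "HEAD",
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Return the SHA a dependency clone most likely had on date (committer-date approximation)."""
    result = run(rev_before_argv(repo_dir, date, branch),
                 capture_output=True, text=True, check=True)
    sha = result.stdout.strip()
    if not sha:
        raise LookupError(f"no commit on {branch} before {date} in {repo_dir}")
    return sha
```

The returned SHA can then be checked out with git -C <repo_dir> checkout <sha> before running the build.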

jonathanmetzman commented 1 year ago

It also seems like repos rewrite their history not infrequently.

evverx commented 1 year ago

@jonathanmetzman I agree it's hard to bisect issues automatically at this scale and I think the OSV/OSS-Fuzz bisector is as good as it can possibly be. I think what I was trying to say is that all in all the exact steps required to build specific projects at random points in time depend on the project and there is no way to fully automate all that stuff away. Though https://github.com/google/oss-fuzz/issues/2950#issuecomment-1430691062 should probably cover a lot of cases.

jonathanmetzman commented 1 year ago

Sorry, you're right; my comment was basically redundant.

cfossace commented 1 year ago

Thanks everyone this is very helpful. I think the build part is what is difficult. @buszk Have you been able to build it with Google's OSS-Fuzz scripts?

Example (from https://google.github.io/oss-fuzz/advanced-topics/reproducing/#reproducing-bugs):

python infra/helper.py build_image libxml2

How would I build this for a specific commit? Same with the build_fuzzers script. I don't see an easy way to configure the container to pull down a specific commit, since it seems to pull down the latest version every time you build and doesn't let you specify a commit hash. Also, I'm kind of new to Docker, so sorry if this is a naive question that is easily solved.