time-rs / time

The most used Rust library for date and time handling.
https://time-rs.github.io
Apache License 2.0
1.09k stars 273 forks source link

Integration with google/oss-fuzz for continuous fuzz testing #549

Closed silvergasp closed 1 year ago

silvergasp commented 1 year ago

First of all thanks for all your excellent work here :) Google's OSS-fuzz is a free continuous fuzzing service. If you are interested in integrating the time crate with OSS-fuzz, the OSS-fuzz project will do the following each day;

I'd be happy to champion the effort for integration as well as write some fuzz tests if you are keen on it?

I've noticed that you have a set of quickcheck tests. Many quick check tests are going to be nearly drop-in compatible with fuzz testing. The advantage of using libfuzzer over quickcheck is that libfuzzer makes coverage-driven improvements (greybox fuzzing) to the testing suite. Whereas quickcheck is a black box fuzzer and doesn't improve code-coverage in a feedback loop.

Black box fuzzing (quickcheck) This approach to fuzzing is considered to be open-loop and does not use feedback to improve future results.

flowchart LR
    A[Random Input] --> B[Fuzz Harness]
    B --> A

With black-box fuzzing the probability of finding a bug does not significantly change over time.

Grey box fuzzing (cargo-fuzz/libfuzzer) This approach to fuzzing is considered to be closed-loop and does use feedback to improve future results.

flowchart LR
    A[Random Input] --> B[Fuzz Harness]
    B --> C[Collect code coverage information]
    C -->  D[Mutate orgional input to maximize coverage] 
    D --> B

Context

I recently integrated gitoxide into google/oss-fuzz. One of the crates in gitoxide (git-date) uses this crate extensively. So we'd like to improve stability upstream as well through fuzzing. There is one bug that the fuzzer found in the time crate (via git-date). I haven't got around to minimising it down to something understandable, but when I do I'll open a bug report :)

jhpratt commented 1 year ago

What would this amount to in this repository? Or is it entirely upstream?

silvergasp commented 1 year ago

This would look something like the following;

It's worth noting that this can all be done entirely upstream. Although a configuration like this is prone to bitrot, i.e. when changes happen in this crate, they may need to be reflected in the fuzz tests. If the fuzz tests are in a separate repository they can quickly fall out of sync with the main API.

As I mentioned before I'm happy to put in the effort to get this going. I just want to gauge interest before I commit any serious time to it.

jhpratt commented 1 year ago

That sounds reasonable. Feel free to move ahead with it. The only thing is I would prefer to keep the quickcheck tests. It has caught at least one bug in the past, even if it is limited in breadth.

silvergasp commented 1 year ago

Hey, I've just been doing a bit of digging into this, and one of the first steps I wanted to take was implementing arbitrary for each of the time types. I noticed that there was a PR for this here #502 (albeit with some more manual implementations rather than derived) that was closed with the comment;

After looking into this a little bit, I think what would be ideal is if arbitrary supported interoperability with any type supporting rand. It would (at first glance) take some relatively minor alterations to the internals of arbitrary, as it currently operates on &[u8]. If this were changed to Cow<[u8]>, it could implemented without too much effort.

I'm closing this PR because I think there is a significantly better alternative that would benefit the entire ecosystem, not just users of time.

I don't mind putting in a little more effort to implement a "significantly better alternative that would benefit the entire ecosystem". But I feel like I'm missing some key pieces of information. Do you mind expanding on how you think this would look?

jhpratt commented 1 year ago

I don't recall the exact thing I had in mind, other than a blanket impl for any type that can be used with rand. Looking at the API (quickly), my thought was probably something along the lines of implementing Distribution for Unstructured. This could still be done using a newtype in a third-party crate.

silvergasp commented 1 year ago

I don't recall the exact thing I had in mind, other than a blanket impl for any type that can be used with rand. Looking at the API (quickly), my thought was probably something along the lines of implementing Distribution for Unstructured. This could still be done using a newtype in a third-party crate.

Hmm, I see where you are going with that train of thought. But I'd like to push back on it a little, at least in the context of fuzz testing. Keep in mind this is a somewhat incomplete/loose train of thought, and not really intended as any sort of assertion. Mostly my objection to what you are suggesting is a semantic one. So generally speaking the language around fuzzing is something along the lines of "random == bad fuzzing performance". There is a brief note of this in the documentation for libfuzzer which cargo-fuzz uses under the hood. You can usually improve the performance of a fuzzer by creating a build mode that removes all randomness from the target library, or at the very least uses deterministic randomness. So I guess my objection to making Unstructured implement Distribution, is just to avoid the confusion of somewhat conflicting terminology. I think the naming and interface of arbitrary makes sense in that we want to generate some arbitrary (but not random), variable of a given type.

I could see some value in Unstructured implementing Rng, as a way of creating a deterministic (and maybe not very random) fuzz friendly build mode.

I'll experiment a little with what you've suggested and have a think about how it could be communicated effectively. Then once I have a more collected/consistent set of thoughts on it I'll get back to you on it :)

jhpratt commented 1 year ago

Yeah, a seeded RNG with a static seed would avoid any actual randomness, I believe. I haven't looked too much into this, so I trust your judgement.

silvergasp commented 1 year ago

I'm sorry for the flip-flopping :) So after a little bit more investigation, I'm leaning towards not implementing any of the rand traits for Unstructured. This is because;

  1. Implementing rand::Distribution for Unstructured would in my opinion cause more confusion than it's worth. I think it would be like saying "The Rust std library should implement Display for every type that implements Debug", Display and Debug might have similar interfaces, but they serve different purposes. And likewise, the rand traits and arbitrary while similar serve subtly different purposes.
  2. Unstructured implementing Rng, will likely cause more trouble than it's worth. Many libraries depend on implementors of Rng to produce some level of entropy. Without that entropy, you are likely to get many false positives while fuzz-testing things like hashmaps. Like you are suggesting it would make more sense to seed an Rng with either a static seed or one pulled from Unstructured.

Are you opposed to reconsidering your decision on #502? I would find it quite useful for implementing fuzz tests. It's a shame that your issue here was a non-starter, that would have made things really simple.

jhpratt commented 1 year ago

While I would still like to see the situation improve more generally, it's clear that that won't happen any time soon. I reached out to Andrew a number of months ago to see about possibly co-maintaining quickcheck, but he didn't have the bandwidth on onboard me. Hopefully long-term a better solution arises regarding interoperability, but until then, I'm okay handling it here.

silvergasp commented 1 year ago

Yeah fair, well I'm going to start by rebasing #502 and using that as a starting point for a basic fuzz test. Once I've got a basic proof of concept up we can re-evaluate which way we want to go with quickcheck vs fuzzing.

jhpratt commented 1 year ago

@silvergasp Is this still planned?

jhpratt commented 1 year ago

Closing as inactive.