tokahuke / yaque

Yaque is yet another disk-backed persistent queue for Rust.
Other
79 stars 11 forks source link

Fuzz testing: RecvBatch strangeness, file handle leaks #20

Open mullr opened 1 year ago

mullr commented 1 year ago

I wrote some simple fuzz tests for this library (https://github.com/mullr/yaque/tree/fuzz-tests). The good news is that it works very well for something that's never been fuzzed! I did identify a few issues:

mullr commented 1 year ago

After running the fuzzer for a little awhile, we run out of file handles. It seems like something in this library is leaking, but it's unclear where or why.

I've investigated this a little bit, and I'm guessing that this is likely to be caused by the test harness running so fast, thrashing inotify subscriptions very fast. If that's the case, then there may not be a problem in the library.

tokahuke commented 1 year ago

I've investigated this a little bit, and I'm guessing that this is likely to be caused by the test harness running so fast, thrashing inotify subscriptions very fast. If that's the case, then there may not be a problem in the library.

That is rather nice news! Anyway, I will need to have a closer look at the first point. Could you also make a PR of your tests to this repo?

mullr commented 1 year ago

Sorry for the delay, it's over at https://github.com/tokahuke/yaque/pull/22

tokahuke commented 1 year ago

Sorry for the delay, it's over at #22

No problem. I myself am having little time during these weeks to code at all.

mullr commented 8 months ago

I was able to investigate the recv_batch situation a bit, and developed a repro:

    // test simple try_recv_batch uses.
    #[test]
    fn test_try_recv_batch_too_much_then_less_than_everything() {
        // Populate a queue:
        let mut sender = SenderBuilder::new()
            .segment_size(512)
            .open("data/try-recv-batch-zero-after-read-too-much")
            .unwrap();

        let mut receiver = Receiver::open("data/try-recv-batch-zero-after-read-too-much").unwrap();

        sender.try_send(&[1]).unwrap();
        sender.try_send(&[2]).unwrap();

        assert!(matches!(
            receiver.try_recv_batch(3),
            Err(TryRecvError::QueueEmpty)
        ));

        let guard = receiver.try_recv_batch(1).unwrap();
        let items = guard.try_into_inner().unwrap();
        assert!(items.len() == 1);
    }

I don't know the whole story, but I think this kind of behavior is triggered by canceling the try_recv async block before it's finished. .now_or_never is one way to do it, as in this case. I've seen that behavior when using recv_batch_timeout in a select as well, when the recv batch has started processing but not yet hit the timeout, when a different branch of the select is taken, dropping the future. The queue is left in an inconsistent state such commiting future reads will fail if they don't consume all the data in read_and_unused.

Hopefully that's not _in_accurate. Regardless, the above test definitely triggers the behavior.