vigna / webgraph-rs

A Rust port of the WebGraph framework
Apache License 2.0
32 stars 6 forks source link

Infinite loop in `BatchIterator::new_from_vec_sorted` #71

Closed progval closed 9 months ago

progval commented 9 months ago

repro:

#[test]
fn test_batchiterator() -> Result<()> {
    let mut tempfile = temp_dir(std::env::temp_dir());
    tempfile.push_str("BatchIterator_buffer");
    BatchIterator::new_from_vec_sorted(&tempfile, &[(2, 16), (3, 19), (8, 1), (8, 6), (11, 15), (11, 20), (15, 16)])?.collect::<Vec<_>>();
    Ok(())
}

This is to be sensitive to the data; similar inputs with fewer or different tuples don't all trigger the bug.

I didn't have this issue with https://github.com/vigna/webgraph-rs/commit/d3fec35ad69aca12b9041b779ada641e5c87faf1

vigna commented 9 months ago

This is most likely due to the switch to the zero-extended semantics. There must be something wrong that was silently fixed by the EOF semantics. That should be easy to test. Gimme a minute :).

vigna commented 9 months ago

OK, both problems were caused by the same change—the "small" change of signature of flush from self to &mut self. it turned out we were flushing the streams in the drop method, not in the flush method. So not passing self was causing drop not to happen, and flush was not called.

But the problem with mmap is real—very unlikely to happen, but it might happen, so I'm happy we nailed it now.