mdsteele / rust-cfb

Rust library for reading/writing Compound File Binary (structured storage) files
MIT License
46 stars 20 forks source link

Writing streams slow #42

Open francisdb opened 1 year ago

francisdb commented 1 year ago

I'm working on a lib that reads vpx files (Visual Pinball). See https://github.com/francisdb/vpxtool

These files are easily 200MB and contain hundreds of streams of up to 10MB

Reading the whole file goes reasonably fast but once you start writing them things slow down considerably. Even in memory. I wonder if I am doing something wrong.

Reading a typical file takes about 1 second. Writing it out again takes 20 seconds.

Test code and output below. You can play with the range and the data size.

    #[test]
    fn test_pref() -> io::Result<()> {
        init();
        let mut buff = Vec::new();
        let mut test_comp = CompoundFile::create(Cursor::new(&mut buff))?;
        // create a buffer of 10mb
        let data = vec![0; 1 * 1024 * 1024];
        let range = 0..100;
        range.map(|i| {
            write_to_stream(&mut test_comp, &data, i)
        }).collect()
    }

    #[time]
    fn write_to_stream<F: Read + Write + Seek>(comp: &mut CompoundFile<F>, mut data: &[u8], i: u32) -> io::Result<()> {
        let mut stream = comp.create_stream(format!("test{i}"))?;
        stream.write_all(&mut data)
    }

logs

[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=5.743667ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=5.538ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=6.094625ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=6.425709ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=6.361792ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=7.098208ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=7.519709ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=7.89525ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=8.024583ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=8.373ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=8.485375ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=8.919208ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=9.602417ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=10.115292ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=10.607084ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=11.395ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=11.895166ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=12.490291ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=13.017541ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=13.657875ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=14.15625ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=14.760792ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=15.555042ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=16.634042ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=16.832542ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=17.376208ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=17.893ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=18.489167ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=18.946292ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=19.863459ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=20.193375ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=21.42675ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=21.432ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=21.991041ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=22.558125ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=23.253167ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=23.777917ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=24.807875ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=25.160708ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=25.826791ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=26.252583ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=27.15825ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=27.645959ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=28.065875ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=28.566042ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=29.33825ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=29.694375ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=30.438125ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=30.827875ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=31.443625ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=32.134625ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=33.125708ms
[2023-08-16T14:30:38Z DEBUG TimerFinished] write_to_stream(), Elapsed=33.42475ms
[2023-08-16T14:30:39Z DEBUG TimerFinished] write_to_stream(), Elapsed=33.939125ms
[2023-08-16T14:30:39Z DEBUG TimerFinished] write_to_stream(), Elapsed=35.251083ms
[2023-08-16T14:30:39Z DEBUG TimerFinished] write_to_stream(), Elapsed=35.594666ms
[2023-08-16T14:30:39Z DEBUG TimerFinished] write_to_stream(), Elapsed=35.682291ms
[2023-08-16T14:30:39Z DEBUG TimerFinished] write_to_stream(), Elapsed=36.4895ms
[2023-08-16T14:30:39Z DEBUG TimerFinished] write_to_stream(), Elapsed=37.234542ms
[2023-08-16T14:30:39Z DEBUG TimerFinished] write_to_stream(), Elapsed=37.837166ms
[2023-08-16T14:30:39Z DEBUG TimerFinished] write_to_stream(), Elapsed=38.385583ms
[2023-08-16T14:30:39Z DEBUG TimerFinished] write_to_stream(), Elapsed=39.532167ms
[2023-08-16T14:30:39Z DEBUG TimerFinished] write_to_stream(), Elapsed=39.585083ms
[2023-08-16T14:30:39Z DEBUG TimerFinished] write_to_stream(), Elapsed=40.647917ms
[2023-08-16T14:30:39Z DEBUG TimerFinished] write_to_stream(), Elapsed=40.649208ms
[2023-08-16T14:30:39Z DEBUG TimerFinished] write_to_stream(), Elapsed=41.429334ms
[2023-08-16T14:30:39Z DEBUG TimerFinished] write_to_stream(), Elapsed=41.6905ms
[2023-08-16T14:30:39Z DEBUG TimerFinished] write_to_stream(), Elapsed=42.948ms
[2023-08-16T14:30:39Z DEBUG TimerFinished] write_to_stream(), Elapsed=43.590167ms
[2023-08-16T14:30:39Z DEBUG TimerFinished] write_to_stream(), Elapsed=43.527167ms
[2023-08-16T14:30:39Z DEBUG TimerFinished] write_to_stream(), Elapsed=44.451833ms
[2023-08-16T14:30:39Z DEBUG TimerFinished] write_to_stream(), Elapsed=45.197708ms
[2023-08-16T14:30:39Z DEBUG TimerFinished] write_to_stream(), Elapsed=45.621583ms
[2023-08-16T14:30:39Z DEBUG TimerFinished] write_to_stream(), Elapsed=46.401084ms
[2023-08-16T14:30:39Z DEBUG TimerFinished] write_to_stream(), Elapsed=46.834417ms
[2023-08-16T14:30:39Z DEBUG TimerFinished] write_to_stream(), Elapsed=47.352875ms
[2023-08-16T14:30:39Z DEBUG TimerFinished] write_to_stream(), Elapsed=48.269542ms
[2023-08-16T14:30:40Z DEBUG TimerFinished] write_to_stream(), Elapsed=48.578709ms
[2023-08-16T14:30:40Z DEBUG TimerFinished] write_to_stream(), Elapsed=49.726541ms
[2023-08-16T14:30:40Z DEBUG TimerFinished] write_to_stream(), Elapsed=49.953417ms
[2023-08-16T14:30:40Z DEBUG TimerFinished] write_to_stream(), Elapsed=50.253917ms
[2023-08-16T14:30:40Z DEBUG TimerFinished] write_to_stream(), Elapsed=51.099792ms
[2023-08-16T14:30:40Z DEBUG TimerFinished] write_to_stream(), Elapsed=51.679792ms
[2023-08-16T14:30:40Z DEBUG TimerFinished] write_to_stream(), Elapsed=52.570666ms
[2023-08-16T14:30:40Z DEBUG TimerFinished] write_to_stream(), Elapsed=53.306375ms
[2023-08-16T14:30:40Z DEBUG TimerFinished] write_to_stream(), Elapsed=53.425708ms
[2023-08-16T14:30:40Z DEBUG TimerFinished] write_to_stream(), Elapsed=54.233208ms
[2023-08-16T14:30:40Z DEBUG TimerFinished] write_to_stream(), Elapsed=54.57725ms
[2023-08-16T14:30:40Z DEBUG TimerFinished] write_to_stream(), Elapsed=55.703334ms
[2023-08-16T14:30:40Z DEBUG TimerFinished] write_to_stream(), Elapsed=56.719792ms
[2023-08-16T14:30:40Z DEBUG TimerFinished] write_to_stream(), Elapsed=58.724125ms
[2023-08-16T14:30:40Z DEBUG TimerFinished] write_to_stream(), Elapsed=57.548625ms
[2023-08-16T14:30:40Z DEBUG TimerFinished] write_to_stream(), Elapsed=57.794ms
[2023-08-16T14:30:40Z DEBUG TimerFinished] write_to_stream(), Elapsed=58.499625ms
[2023-08-16T14:30:40Z DEBUG TimerFinished] write_to_stream(), Elapsed=58.816417ms
[2023-08-16T14:30:41Z DEBUG TimerFinished] write_to_stream(), Elapsed=60.089708ms
[2023-08-16T14:30:41Z DEBUG TimerFinished] write_to_stream(), Elapsed=60.245208ms
[2023-08-16T14:30:41Z DEBUG TimerFinished] write_to_stream(), Elapsed=60.951875ms
[2023-08-16T14:30:41Z DEBUG TimerFinished] write_to_stream(), Elapsed=61.082792ms
[2023-08-16T14:30:41Z DEBUG TimerFinished] write_to_stream(), Elapsed=62.138583ms
francisdb commented 1 year ago

Looks like the write path is quite affected by opt-level for tests. opt-level = 1 already yields much better results.

But I wonder if some things could still be improved, eg constant time for writing new streams.

mdsteele commented 1 year ago

Thanks for the report. There's probably a lot of low-hanging fruit for improving performance; most of the work on this crate so far has been focused on trying to achieve correctness. PRs for improvements would be welcome, especially if they come with benchmark tests. (That said, my inclination is not to worry too much about performance in debug builds. If performance is bad even for release builds, then that's more worth addressing.)

I haven't had a chance to dig into this specific issue, but one possible cause of this is that this implementation doesn't currently keep the directory entry binary search tree balanced (fixing that has been on my to-do list for a long time). So adding a lot of sibling nodes will eventually create a long search chain. I wouldn't expect this would make all that much difference for only 100 entries, but maybe in debug builds it does. Or maybe there's another performance issue I'm not thinking of.