jturner314 / ndarray-npy

.npy and .npz file format support for ndarray
https://docs.rs/ndarray-npy
Apache License 2.0

Buffering File I/O #50

Closed bluss closed 3 years ago

bluss commented 3 years ago

I just noticed when reading through the code that write_npy uses unbuffered file I/O. For contiguous arrays this is ideal, but for other layouts it can be slower than it should be. Tested with the latest version from crates.io and cargo run --release.

This test program writes a 4 GB file; you can of course scale that down, and it should be easy to reproduce even at 10 MB.

```rust
use ndarray::prelude::*;
use ndarray_npy::{self, WriteNpyExt};
use std::io::BufWriter;
use std::fs::File;
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    let mut array: Array<f32, _> = Array::zeros((1024, 1024, 1024));
    array.swap_axes(0, 1);
    // slow
    //ndarray_npy::write_npy("zeros.npy", &array)?;

    // "fast"
    let file = BufWriter::new(File::create("zeros.npy")?);
    array.write_npy(file)?;  // In real code, flush the BufWriter as the docs recommend

    Ok(())
}
```

I think I can recommend just using BufWriter. When BufWriter is passed a write larger than its internal buffer, it bypasses the buffer and writes directly to the underlying file.

jturner314 commented 3 years ago

Thanks for looking over the crate and mentioning this issue. I've known about this but hadn't had a chance to test what impact it would have (to know under what circumstances I should use buffering). I just tried a few cases in release mode based on your example code (plus a .flush() after the array.write_npy call):

The difference between the non-buffering and buffering cases with non-standard layout is huge! I had no idea it was that significant. The overhead of nested buffering is noticeable, but not huge, in the non-standard layout case. In the standard layout case, there is effectively no difference regardless of buffering.

> I think I can recommend just using BufWriter. When BufWriter is passed a write larger than its internal buffer, it bypasses the buffer and writes directly to the underlying file.

This is good to know, because it means that the overhead of the BufWriter should be small even in the cases where the array data is written in a single large write (when the array is in standard or Fortran layout). This conclusion matches the results I obtained above.

There are a couple of other things I'd like to test as well; see the end of this comment.

I see a couple of options:

  1. Change the write_npy function to always use buffering (since it's always working with a File), add a .flush() to WriteNpyExt and NpzWriter, and use BufWriter in the examples for WriteNpyExt and NpzWriter.

  2. Change WriteNpyExt to always use buffering.

When I get a chance, I'll do some testing of the buffering overhead when writing to in-memory writers (to know whether it's potentially useful for the user to be able to write without buffering) and testing the approaches for handling .npz files.

bluss commented 3 years ago

I think option (1) would follow the Rust language conventions most closely: no wrapping that the user doesn't want to pay for. But I would understand if you think it leaves the user high and dry without buffering. If I understand correctly, the ZipWriter doesn't buffer by itself? My guess is that NpzWriter(ZipWriter(BufWriter(File))) is the best layering.

Maybe the best reason to avoid (2) is to not allocate a BufWriter unnecessarily? But it's hard to be perfect about that anyway.

jturner314 commented 3 years ago

> My guess is that NpzWriter(ZipWriter(BufWriter(File))) is the best layering.

That was my intuition, too, but I just did some testing, and it turns out that's not the case. The results are actually pretty interesting.

I modified write_example in examples/simple.npz to write a third array of shape 256×1024×1024 with f32 element type and all elements zero, and I removed the call to read_example. The results are as follows:

Observations:

In conclusion, it's reasonable for NpzWriter to always buffer around individual arrays, because it helps substantially in the non-standard layout cases, and the "compressed, file, standard layout" case is more important than the "compressed, in-memory writer, standard layout" case. The allocations for the BufWriters could possibly hurt performance for small arrays, but writing small arrays doesn't take much time anyway. The user may want to also wrap the underlying writer in a BufWriter if they know that their arrays have non-standard layouts, but I wouldn't recommend it in general.

> I think option (1) would follow the Rust language conventions the closest, no wrapping that the user doesn't want to pay for, but I would understand if you think it leaves the user high and dry without buffering.

Yeah, I agree that option (1) most closely follows existing conventions. The only reason to prefer (2) would be to make things a little easier for the user in the common case (so that they wouldn't have to wrap the File in a BufWriter themselves). Option (1) is fine, though, IMO, because once I add a .flush() to WriteNpyExt::write_npy, all the user has to do is add a call to BufWriter::new, and flushing will be handled automatically.

Thanks for talking it through with me.

jturner314 commented 3 years ago

I've created #51 to resolve this issue.