Closed: bluss closed this issue 3 years ago.

I just noticed when reading through that `write_npy` uses unbuffered file I/O. For contiguous arrays this is ideal, but for other cases it might be slower than it should be. Tested with the latest version from crates.io and `cargo run --release`. The test program writes a 4GB file; one may of course scale that down if wanted. It should be simple to reproduce even with 10MB.
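As a minimal sketch of such a test program (the shape, `f32` element type, output path, and the axis permutation used to get a non-standard layout are my own choices, not the original code):

```rust
use ndarray::Array3;
use ndarray_npy::WriteNpyExt;
use std::fs::File;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1024 x 1024 x 1024 f32 zeros = 4GiB; scale the shape down to reproduce with less data.
    let array: Array3<f32> = Array3::zeros((1024, 1024, 1024));
    // Permuting the axes gives a layout that is neither C- nor Fortran-order,
    // so the data can't be written as a single large contiguous block.
    let array = array.permuted_axes([1, 0, 2]);
    array.write_npy(File::create("test.npy")?)?;
    Ok(())
}
```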
Thanks for looking over the crate and mentioning this issue. I've known about this but hadn't had a chance to try it to see what impact it would have (to know under what circumstances I should use buffering). I just tried a few cases in release mode based on your example code (plus a `.flush()` after the `array.write_npy` call):
- with nested buffering (`BufWriter::new(BufWriter::new(...))`), non-standard layout: 4.1GB in 11.0s
- with nested buffering (`BufWriter::new(BufWriter::new(...))`), standard layout: 4.1GB in 1.2s

The difference between the non-buffering and buffering cases with non-standard layout is huge! I had no idea it was that significant. The overhead of nested buffering is noticeable, but not huge, in the non-standard layout case. In the standard layout case, there is effectively no difference regardless of buffering.
> I think I can recommend just using `BufWriter`. When `BufWriter` is passed data longer than its internal buffer, it just bypasses the internal buffer and writes it all to the underlying file.
This is good to know, because it means that the overhead of the `BufWriter` should be small even in the cases where the array data is written in a single large write (when the array is in standard or Fortran layout). This conclusion matches the results I obtained above.
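For reference, the buffered variant being measured looks roughly like this (the file name and array contents are placeholders; the explicit `.flush()` matches the timing setup above):

```rust
use ndarray::Array2;
use ndarray_npy::WriteNpyExt;
use std::fs::File;
use std::io::{BufWriter, Write};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let array: Array2<f64> = Array2::zeros((1024, 1024));
    let mut writer = BufWriter::new(File::create("array.npy")?);
    array.write_npy(&mut writer)?;
    // Flush explicitly so write errors surface here instead of being
    // silently dropped when the BufWriter goes out of scope.
    writer.flush()?;
    Ok(())
}
```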
There are a couple of other things that I'd like to test:
- I wonder whether buffering would help when reading too, ~~since reading the header performs quite a few small reads, even though reading the array data is a single large read~~. Edit: The struck-through text is incorrect: the header is read in three reads, and the array data is read in a single read. I would think that's few enough reads that the benefit of buffering would be negligible.
- I wonder about the best way to handle `.npz` archives (which are ZIP archives containing `.npy` files) with `NpzWriter`, since the current implementation uses `WriteNpyExt::write_npy` to write the `.npy` files into the `.npz` archive. I would guess that it would still be fine for `WriteNpyExt::write_npy` to wrap the writer in a `BufWriter`, but it may be better to wrap the underlying writer in a `BufWriter` before wrapping it in the `ZipWriter` and `NpzWriter`. (See the sketch after this list.)
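For concreteness, here is a sketch of the second layering from the user's side (the file name, the array, and the final `finish()` call are my assumptions about the API, not code from this thread):

```rust
use ndarray::Array2;
use ndarray_npy::NpzWriter;
use std::fs::File;
use std::io::BufWriter;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let a: Array2<f64> = Array2::zeros((256, 256));
    // NpzWriter<ZipWriter<BufWriter<File>>>: the buffer sits under the ZipWriter,
    // so both the zip metadata and the array data go through it.
    let mut npz = NpzWriter::new(BufWriter::new(File::create("example.npz")?));
    npz.add_array("a", &a)?;
    npz.finish()?; // flush everything down to the File
    Ok(())
}
```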
I see a couple of options:
1. Change the `write_npy` function to always use buffering (since it's always working with a `File`), add a `.flush()` to `WriteNpyExt` and `NpzWriter`, and use `BufWriter` in the examples for `WriteNpyExt` and `NpzWriter`.
2. Change `WriteNpyExt` to always use buffering (a sketch of the idea follows this list).
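To make option (2) concrete, here is a hypothetical sketch of the idea (generic code, not the crate's actual implementation): the write path wraps whatever writer it's given, at the cost of a buffer allocation even for callers that already buffer:

```rust
use std::io::{self, BufWriter, Write};

// Hypothetical sketch of option (2): unconditionally buffer inside the write path.
fn write_records<W: Write>(writer: W, records: &[[u8; 16]]) -> io::Result<()> {
    // The BufWriter is allocated on every call, even if `writer` already buffers.
    let mut w = BufWriter::new(writer);
    for record in records {
        w.write_all(record)?; // many small writes, coalesced by the buffer
    }
    w.flush() // flush before dropping so errors are reported
}

fn main() -> io::Result<()> {
    let mut out = Vec::new();
    write_records(&mut out, &[[0u8; 16]; 4])?;
    assert_eq!(out.len(), 64);
    Ok(())
}
```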
When I get a chance, I'll do some testing of the buffering overhead when writing to in-memory writers (to know whether it's potentially useful for the user to be able to write without buffering), and I'll test the approaches for handling `.npz` files.
I think option (1) would follow the Rust language conventions the closest: no wrapping that the user doesn't want to pay for. But I would understand if you think it leaves the user high and dry without buffering. If I understand correctly, the `ZipWriter` doesn't buffer by itself? `NpzWriter(ZipWriter(BufWriter(File)))` is the best layering, is my guess.

Maybe the best reason to avoid (2) is to not allocate a `BufWriter` unnecessarily? But it's hard to be perfect about that anyway.
> `NpzWriter(ZipWriter(BufWriter(File)))` is the best layering, is my guess.
That was my intuition, too, but I just did some testing, and it turns out that's not the case. The results are actually pretty interesting.
I modified `write_example` in `examples/simple.rs` to write a third array of shape 256×1024×1024 with `f32` element type and all elements zero (a sketch of the added array follows), and I removed the call to `read_example`.
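A sketch of that modification (the entry name `"big"` and the helper function are my placeholders; `add_array` is the existing `NpzWriter` method):

```rust
use ndarray::Array3;
use ndarray_npy::{NpzWriter, WriteNpzError};
use std::io::{Seek, Write};

// The third array added for the benchmark: 256 x 1024 x 1024 f32 zeros (1 GiB of data).
fn add_big_array<W: Write + Seek>(npz: &mut NpzWriter<W>) -> Result<(), WriteNpzError> {
    let big: Array3<f32> = Array3::zeros((256, 1024, 1024));
    npz.add_array("big", &big)
}
```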
The results are as follows. In each case, I compared four configurations: no buffering; buffering the entire file (`NpzWriter<BufWriter<File>>`); buffering the individual arrays (wrapping the `ZipWriter` in a `BufWriter` in `add_array`); and buffering both the entire file (`NpzWriter<BufWriter<File>>`) and the individual arrays (wrapping the `ZipWriter` in a `BufWriter` in `add_array`), i.e. nested buffering. I measured these scenarios, each with both standard and non-standard array layouts:

- uncompressed, writing to a `File` (resulting in a large `.npz` file)
- compressed, writing to a `File` (resulting in a small `.npz` file; the all-zero data is highly compressible)
- uncompressed, writing to an in-memory writer (`Cursor::new(Vec::new())`) instead of a `File` (so the whole-file buffering case is `NpzWriter<BufWriter<Cursor<Vec<u8>>>>`)
- compressed, writing to the same in-memory writer
In conclusion, it's reasonable for `NpzWriter` to always buffer around individual arrays, because it helps substantially in the non-standard layout cases, and the "compressed, file, standard layout" case is more important than the "compressed, in-memory writer, standard layout" case. The allocations for the `BufWriter`s could possibly hurt performance for small arrays, but writing small arrays doesn't take much time anyway. The user may also want to wrap the underlying writer in a `BufWriter` if they know that their arrays have non-standard layouts, but I wouldn't recommend it in general.
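What "wrap the `ZipWriter` in a `BufWriter` in `add_array`" amounts to, roughly, is shown below. This is a hypothetical sketch at the `zip`-crate level; the `write_entry` helper and its signature are mine, not the crate's actual implementation:

```rust
use std::io::{BufWriter, Seek, Write};
use zip::result::ZipResult;
use zip::write::{FileOptions, ZipWriter};

// Hypothetical sketch of buffering around an individual array:
// wrap the ZipWriter in a BufWriter for the duration of one entry only.
fn write_entry<W: Write + Seek>(zip: &mut ZipWriter<W>, name: &str, bytes: &[u8]) -> ZipResult<()> {
    zip.start_file(name, FileOptions::default())?;
    // Coalesces the many small writes of a non-standard-layout array.
    let mut buffered = BufWriter::new(&mut *zip);
    buffered.write_all(bytes)?;
    buffered.flush()?; // flush before dropping so errors surface
    Ok(())
}
```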
> I think option (1) would follow the Rust language conventions the closest: no wrapping that the user doesn't want to pay for. But I would understand if you think it leaves the user high and dry without buffering.
Yeah, I agree that option (1) most closely follows existing conventions. The only reason to prefer (2) would be to make things a little easier for the user in the common case (so that they wouldn't have to wrap the `File` in a `BufWriter` themselves). Option (1) is fine, though, IMO, because once I add a `.flush()` to `WriteNpyExt::write_npy`, all the user has to do is add a call to `BufWriter::new`, and flushing will be handled automatically.
Thanks for talking it through with me.
I've created #51 to resolve this issue.