Open · bokutotu opened this issue 3 years ago
This is a fun problem. My machine can't even create an array that big in memory. I'm surprised that you're seeing this, though. `write_npy` shouldn't be allocating much memory, especially not ~70 GB as indicated by the error message. More information would be helpful to diagnose the issue:

1. Does the array have standard layout, Fortran layout, or another layout? (In other words, what are the results of `features.input_c.is_standard_layout()` and `features.input_c.view().reversed_axes().is_standard_layout()`? A minimal sketch for printing these values follows this list of questions.)
2. What specific versions of `ndarray`, `ndarray-npy`, and Rust are you using? (You can determine this by searching for `name = "ndarray"` and `name = "ndarray-npy"` in your `Cargo.lock`, and by calling `rustc --version`.)
3. What happens when you run the following program, which just allocates ~70 GB and writes it to a file?
    ```rust
    use std::fs::File;
    use std::io::Write;

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        let mut file = File::create("test")?;
        // Allocate 73.6 GB of data.
        let num_bytes = 73603432908;
        let mut data = vec![0u8; num_bytes];
        // Let's make at least a couple of the elements nonzero.
        data[3] = 42;
        data[num_bytes - 10] = 5;
        file.write_all(&data)?;
        println!("success");
        Ok(())
    }
    ```
4. Are you sure that the allocation failure is occurring in the `write_npy` call, and not somewhere else (e.g. when first allocating the array, or when performing an arithmetic operation on it that allocates another array)?
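For question 1, something like the following minimal sketch could be used to print the layout information. Here `input_c` is a small hypothetical stand-in for the real `features.input_c` array; adapt the names and shape to your actual code.

```rust
use ndarray::Array3;

fn main() {
    // Hypothetical stand-in for `features.input_c`.
    let input_c: Array3<f32> = Array3::zeros((2, 3, 4));

    // Standard (C-order) layout check.
    println!("standard layout: {}", input_c.is_standard_layout());

    // Fortran (column-major) layout check: reverse the axes and re-check.
    println!(
        "Fortran layout: {}",
        input_c.view().reversed_axes().is_standard_layout()
    );
}
```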
I apologize for the delay in replying. The supercomputer I am currently using is undergoing maintenance, so it is difficult to answer all of the questions above. I will try to answer the first two:

1. `is_standard_layout()` -> `true` (the array has standard layout)
2. `ndarray` -> 0.14, `ndarray-npy` -> 0.7.1
Okay, for `Array3<f32>` in standard layout, the relevant portions of the code are:

- the `write_npy` function
- the `is_standard_layout` portion of the `write_npy` method implementation for `ArrayBase`
- the `write_slice` method implementation for `f32`
Basically, this consists of checking the layout of the array (which for `Array3` should perform no allocations), writing the `.npy` header (which performs a few small allocations), getting the array data as a `&[f32]` slice via `as_slice_memory_order`, and then casting the contiguous slice of data from `&[f32]` to `&[u8]` and calling `write_all` on the writer (i.e. the `File` in this case). The only place where I could potentially see a 70 GB allocation occurring is if `std::fs::File`'s implementation of `write_all` makes a copy of the 70 GB slice of data in memory for some reason, but that seems unlikely, and I'd consider it a bug in `std::fs::File` rather than `ndarray-npy`.
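To make that write path concrete, here is a rough sketch of those steps. This is not the actual `ndarray-npy` source (the real header format, trait bounds, and error handling differ); it only illustrates why no large allocation is expected for a contiguous array.

```rust
use ndarray::{ArrayBase, Data, Dimension};
use std::io::Write;

// Sketch of the write path for a contiguous f32 array: layout check, small
// header write, zero-copy reinterpretation of the data, then write_all.
fn write_f32_array<S, D, W>(mut writer: W, array: &ArrayBase<S, D>) -> std::io::Result<()>
where
    S: Data<Elem = f32>,
    D: Dimension,
    W: Write,
{
    // 1. Layout check: for a standard-layout array this borrows the existing
    //    buffer and performs no allocation.
    let slice: &[f32] = array
        .as_slice_memory_order()
        .expect("this sketch only handles contiguous arrays");

    // 2. Header: a small, fixed-size allocation (the real .npy header format
    //    is omitted here).
    let header = format!("placeholder header, shape {:?}\n", array.shape());
    writer.write_all(header.as_bytes())?;

    // 3. Reinterpret the same buffer as bytes without copying it.
    let bytes: &[u8] = unsafe {
        std::slice::from_raw_parts(
            slice.as_ptr() as *const u8,
            slice.len() * std::mem::size_of::<f32>(),
        )
    };

    // 4. Hand the borrowed bytes straight to the writer; no ~70 GB copy is
    //    made at this level.
    writer.write_all(bytes)
}
```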
So, I think it's unlikely that this is a bug in `ndarray-npy`. When the supercomputer is operational again, I'd suggest trying the sample code I provided in my previous comment to see if just allocating a large amount of data and writing it to a file is problematic. If that succeeds without errors, then I'd suggest trying to narrow down where the allocation is occurring. (Perhaps the simplest approach would be to step through the code in a debugger and see where the program crashes. Alternatively, you could replace the global allocator with something that provides more information, or add logging messages between each line of code in the area where you think the allocation might be occurring.) In your initial comment, you seemed somewhat unsure that the allocation is actually in `ndarray-npy`. My guess is that it's somewhere else in the code. If you're able to provide the code, I could help look for where an allocation might be occurring, but otherwise I'm not sure there's much I can do to help.
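As one illustration of the global-allocator suggestion, a sketch like the following (my own example, not part of `ndarray-npy`) wraps the system allocator and reports any single allocation larger than 1 GB, which can help pinpoint where an unexpectedly large request originates:

```rust
use std::alloc::{GlobalAlloc, Layout, System};

struct LoggingAlloc;

unsafe impl GlobalAlloc for LoggingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        if layout.size() > (1 << 30) {
            // eprintln! is not allocation-free in every situation; this is a
            // quick diagnostic, not a production-grade tracing allocator.
            eprintln!("large allocation requested: {} bytes", layout.size());
        }
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

// Route every heap allocation in the program through the logging wrapper.
#[global_allocator]
static GLOBAL: LoggingAlloc = LoggingAlloc;
```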
Out of curiosity, have you been able to diagnose the issue? Another tool which may be useful is Heaptrack.
Thank you for providing us with a great crate. I looked here and there for a way to save a file in `.npy` format using `ndarray-npy`, and when I tried to save a file of about 70 GB, I got this error.
error
code
Based on the line where the error occurs, I think more memory may be being used when saving the file. Are there any other, more memory-efficient functions besides the one I used?