jturner314 / ndarray-npy

.npy and .npz file format support for ndarray
https://docs.rs/ndarray-npy
Apache License 2.0

Cannot save large files. #49

Open bokutotu opened 3 years ago

bokutotu commented 3 years ago

Thank you for providing us with a great crate. I have been trying to save a file in .npy format using ndarray-npy. When I tried to save a file of about 70 GB, I got this error.

error

memory allocation of 73603432908 bytes failed
/var/spool/uge/at163/job_scripts/12220153: line 8: 46483 Aborted

code

use ndarray::Array3;
use ndarray_npy::write_npy;

fn main() {
    // Allocate an array of about 70 GB (exact shape elided in the original report).
    let a: Array3<f32> = Array3::zeros((~~~));
    // do something; the data to be saved ends up in `features.input_c`
    write_npy(
        &(dir.to_string() + &fname + "_input_C.npy"),
        &features.input_c,
    )
    .unwrap(); // error occurs here
}

Judging by the line where the error occurs, I suspect that additional memory is being allocated while saving the file. Is there a more memory-efficient function than the one I used?

jturner314 commented 3 years ago

This is a fun problem. My machine can't even create an array that big in memory. I'm surprised that you're seeing this, though; write_npy shouldn't be allocating much memory, especially not the ~70 GB indicated by the error message. More information would be helpful to diagnose the issue:

bokutotu commented 3 years ago

I apologize for the delay in replying. The supercomputer I am currently using is undergoing maintenance, so it is difficult to answer all of the above questions right now. I will answer the first two:

features.input_c.is_standard_layout() -> true

Versions in use: ndarray 0.14, ndarray-npy 0.7.1

jturner314 commented 3 years ago

Okay, for Array3<f32> in standard layout, the relevant portions of the code are:

Basically, this consists of checking the layout of the array (which for Array3 should perform no allocations), writing the .npy header (which performs a few small allocations), getting the array data as a &[f32] slice via as_slice_memory_order, and then casting the contiguous slice of data from &[f32] to &[u8] and calling write_all on the writer (i.e. the File in this case). The only place where I could potentially see a 70 GB allocation occurring is if std::fs::File's implementation of write_all makes a copy of the 70 GB slice of data in memory for some reason, but that seems unlikely, and I'd consider it a bug in std::fs::File rather than ndarray-npy.
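Schematically, the path looks something like the sketch below. This is a rough illustration rather than the crate's actual source, and write_f32_slice is a hypothetical helper name; the point is that the contiguous f32 data is reinterpreted as bytes and handed to write_all in a single call, so no copy of the array's buffer should be needed.

use std::fs::File;
use std::io::Write;
use std::mem::size_of;

// Rough illustration of the write path described above; not ndarray-npy's real code.
fn write_f32_slice(path: &str, data: &[f32]) -> std::io::Result<()> {
    let mut file = File::create(path)?;
    // Writing the .npy header would go here; it only needs a few small allocations.
    // Reinterpret the contiguous f32 data as bytes without copying it.
    let bytes: &[u8] = unsafe {
        std::slice::from_raw_parts(data.as_ptr() as *const u8, data.len() * size_of::<f32>())
    };
    // Hand the whole slice to the writer in one call; File::write_all should not
    // duplicate the buffer.
    file.write_all(bytes)
}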

So, I think it's unlikely that this is a bug in ndarray-npy. When the supercomputer is operational again, I'd suggest trying the sample code I provided in my previous comment to see if just allocating a large amount of data and writing it to a file is problematic. If that succeeds without errors, then I'd suggest trying to narrow down where the allocation is occurring. (Perhaps the simplest approach would be to step through the code in a debugger and see where the program crashes. Alternatively, you could try replacing the global allocator with something that provides more information, or you could add logging messages between each line of code in the area where you think the allocation might be occurring.) In your initial comment, you seemed to be somewhat unsure that the allocation is actually in ndarray-npy. My guess is that it's somewhere else in the code. If you're able to provide the code, I could help look to see where an allocation might be occurring, but otherwise, I'm not sure there's much I can do to help.
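For the global-allocator idea, a minimal sketch of what such instrumentation could look like is below. It is only an illustration (TrackingAlloc, LARGEST, and GLOBAL are names invented here, not part of ndarray-npy): it wraps the system allocator and records the largest single allocation requested, which would make an unexpected ~70 GB request easy to spot.

use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Illustrative tracking allocator: wraps the system allocator and remembers the
// largest single allocation requested. Avoid printing inside alloc(), since that
// could itself allocate and recurse.
struct TrackingAlloc;

static LARGEST: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for TrackingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        LARGEST.fetch_max(layout.size(), Ordering::Relaxed);
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: TrackingAlloc = TrackingAlloc;

fn main() {
    // ... run the failing code here ...
    println!("largest single allocation: {} bytes", LARGEST.load(Ordering::Relaxed));
}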

jturner314 commented 3 years ago

Out of curiosity, have you been able to diagnose the issue? Another tool which may be useful is Heaptrack.