jonathanstrong opened 3 years ago
> Using `np.load`, you will get a dict-like object that allows you to access the arrays without the `.npy` extension (i.e. at key `arr_0`). However, using `NpzReader`, you need to use the full `arr_0.npy` name to retrieve the same array. Just wanted to flag this, as it tripped me up a bit.
Thanks for pointing this out. I've created #48 to track this issue.
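The difference comes down to key lookup. NumPy's `np.load` accepts both `arr_0` and `arr_0.npy` for such an archive. Here is a minimal sketch of that kind of lookup, assuming a hypothetical helper `resolve_npz_key` (not part of ndarray-npy) that is given the list of member names stored in the archive. It tries the key as given, then with `.npy` appended.

```rust
/// Hypothetical helper (not ndarray-npy API): resolve a user-supplied key
/// against the member names of a `.npz` archive, accepting either the bare
/// array name (`arr_0`) or the full member name (`arr_0.npy`).
fn resolve_npz_key<'a>(members: &'a [String], key: &str) -> Option<&'a str> {
    // First look for an exact match, e.g. `arr_0.npy`.
    if let Some(m) = members.iter().find(|m| m.as_str() == key) {
        return Some(m.as_str());
    }
    // Otherwise retry with the `.npy` extension appended, e.g. `arr_0` -> `arr_0.npy`.
    let with_ext = format!("{key}.npy");
    members.iter().find(|m| **m == with_ext).map(|m| m.as_str())
}

fn main() {
    let members = vec!["arr_0.npy".to_string(), "weights.npy".to_string()];
    // Both spellings resolve to the same archive member.
    println!("{:?}", resolve_npz_key(&members, "arr_0"));
    println!("{:?}", resolve_npz_key(&members, "arr_0.npy"));
    // Unknown keys resolve to nothing.
    println!("{:?}", resolve_npz_key(&members, "missing"));
}
```

Trying the exact name first means that a member whose name does not end in `.npy` can still be retrieved by its full name.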
Thanks also for the PR. There are a few things about the proposed API which are unsatisfying to me:

- The asymmetry between `read_npz` and `write_npz`: only `read_npz` accepts a name. I'd prefer for either both to accept a name parameter or neither to accept a name. I think it would be better for both to accept a name.
- The names `read_npz`/`write_npz` would IMO be more appropriate for functions which read/write general `.npz` files, rather than functions which read/write `.npz` files containing only a single array. For these functions, I'd prefer names more like `read_npz_array`/`write_npz_array_compressed`.
- I'd prefer the name to be taken as `&str` rather than `N: Into<String>`.

Creating a `.npz` archive for a single array seems somewhat awkward. I wonder if you'd be happier using a single-file compression format (such as `.gz`, `.xz`, `.bz2`, or `.zst`) applied to a `.npy` file instead of using a `.zip`/`.npz` archive. This would avoid the problem of choosing a name for the array in the archive and would avoid the complexity of the `.zip` format. For example, to write/read a `.npy.gz` file using `ndarray-npy`, you could do this:
```rust
use flate2::{bufread::GzDecoder, write::GzEncoder, Compression};
use ndarray::{array, Array2};
use ndarray_npy::{ReadNpyError, ReadNpyExt, WriteNpyError, WriteNpyExt};
use std::fs::File;
use std::io::{BufReader, BufWriter, Write};
use std::path::Path;

fn write_npy_gz<P, T>(path: P, array: &T) -> Result<(), WriteNpyError>
where
    P: AsRef<Path>,
    T: WriteNpyExt,
{
    // Note: I'm not sure if the `BufWriter` actually helps or not.
    let mut writer = GzEncoder::new(BufWriter::new(File::create(path)?), Compression::default());
    array.write_npy(&mut writer)?;
    writer.finish()?.flush()?;
    Ok(())
}

fn read_npy_gz<P, T>(path: P) -> Result<T, ReadNpyError>
where
    P: AsRef<Path>,
    T: ReadNpyExt,
{
    // Note: I'm not sure if the `BufReader` actually helps or not.
    T::read_npy(GzDecoder::new(BufReader::new(File::open(path)?)))
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let arr1 = array![[1, 2, 3], [4, 5, 6]];
    // Write the array.
    write_npy_gz("foo.npy.gz", &arr1)?;
    // Read it back.
    let arr2: Array2<i32> = read_npy_gz("foo.npy.gz")?;
    println!("arr1:\n{}", arr1);
    println!("arr2:\n{}", arr2);
    assert_eq!(arr1, arr2);
    Ok(())
}
```
To read it with NumPy, you could do this:
```python
import numpy as np
import gzip

def load_npy_gz(path):
    with gzip.open(path) as f:
        return np.load(f)

arr = load_npy_gz('foo.npy.gz')
print(arr)
```
(You could also decompress `.npy.gz` files at the command line using `gunzip`.)
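Part of what makes a single compressed `.npy` file sufficient is that the `.npy` format is self-describing: its header records the dtype, memory order, and shape, so nothing outside the file (such as an archive member name) is needed to interpret it. As a rough illustration, here is a std-only sketch of NumPy's `.npy` format version 1.0 for a C-order array of little-endian `i32` values. `npy_bytes_i32` is a hypothetical helper, not ndarray-npy's implementation, and real writers (NumPy, ndarray-npy) may pad the header differently.

```rust
/// Hypothetical sketch (not ndarray-npy's code): encode a C-order array of
/// little-endian `i32` values (dtype `'<i4'`) in `.npy` format version 1.0.
fn npy_bytes_i32(shape: &[usize], data: &[i32]) -> Vec<u8> {
    assert_eq!(shape.iter().product::<usize>(), data.len(), "shape/data mismatch");

    // Python-literal tuple for the shape: `(3,)` for 1-D, `(2, 3)` for 2-D, etc.
    let dims: Vec<String> = shape.iter().map(|d| d.to_string()).collect();
    let shape_repr = if shape.len() == 1 {
        format!("({},)", dims[0])
    } else {
        format!("({})", dims.join(", "))
    };
    // The header is a Python dict literal describing dtype, order, and shape.
    let mut header = format!("{{'descr': '<i4', 'fortran_order': False, 'shape': {shape_repr}, }}");

    // Pad with spaces and terminate with '\n' so that magic (6 bytes) +
    // version (2) + header-length field (2) + header is a multiple of 64.
    let unpadded = 6 + 2 + 2 + header.len() + 1;
    let padding = (64 - unpadded % 64) % 64;
    header.push_str(&" ".repeat(padding));
    header.push('\n');

    let mut out = Vec::with_capacity(10 + header.len() + 4 * data.len());
    out.extend_from_slice(b"\x93NUMPY"); // magic string
    out.extend_from_slice(&[1, 0]); // format version 1.0
    out.extend_from_slice(&(header.len() as u16).to_le_bytes()); // HEADER_LEN, little-endian
    out.extend_from_slice(header.as_bytes());
    for x in data {
        out.extend_from_slice(&x.to_le_bytes()); // raw element data, C order
    }
    out
}

fn main() {
    let bytes = npy_bytes_i32(&[2, 3], &[1, 2, 3, 4, 5, 6]);
    let header_len = u16::from_le_bytes([bytes[8], bytes[9]]) as usize;
    println!("{} bytes total, header of {} bytes", bytes.len(), header_len);
}
```

A byte stream like this is what `write_npy` hands to the `GzEncoder` in the example above; gzip only compresses it and adds no naming layer of its own.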
These work like `read_npy` and `write_npy` but operate on compressed `.npz` files instead. I wanted this functionality for writing ephemeral array files that I can check later if needed without taking up too much disk space.

Compared to `read_npy`/`write_npy`, there is one major difference: since an `.npz` file can contain multiple named arrays/files, `write_npz` picks a default name for the single array it writes, while `read_npz` lets the user specify the name to extract. This may not be the best choice, but it seemed less than ideal not to permit specifying the name in `read_npz`, and I wanted `write_npz` to remain as simple as possible.

I picked the default name for `write_npz` based on what NumPy does in `savez_compressed` (`"arr_0.npy"`). However, I think there is a divergence there. Using `np.load`, you get a dict-like object that allows you to access the arrays without the `.npy` extension (i.e. at key `arr_0`). However, using `NpzReader`, you need to use the full `arr_0.npy` name to retrieve the same array. Just wanted to flag this, as it tripped me up a bit.

Thanks for your consideration of this pull request.
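For reference, NumPy's `savez`/`savez_compressed` name positional arrays `arr_0`, `arr_1`, and so on, and store each one as a member with a `.npy` extension; `np.load` then exposes each member under its name with that extension removed. The sketch below mirrors those two conventions with hypothetical helpers (`savez_default_member_names`, `npz_key_for_member`), which are not part of ndarray-npy.

```rust
/// Hypothetical helper: member names NumPy's `savez`/`savez_compressed` use
/// for `count` positional arrays (`arr_0.npy`, `arr_1.npy`, ...).
fn savez_default_member_names(count: usize) -> Vec<String> {
    (0..count).map(|i| format!("arr_{i}.npy")).collect()
}

/// Hypothetical helper: the key under which `np.load` exposes a member,
/// i.e. the member name with a trailing `.npy` removed (other names unchanged).
fn npz_key_for_member(member: &str) -> &str {
    member.strip_suffix(".npy").unwrap_or(member)
}

fn main() {
    for member in &savez_default_member_names(2) {
        println!("{member} -> key {}", npz_key_for_member(member));
    }
}
```

Under these conventions, a `write_npz` that stores `arr_0.npy` produces a file that NumPy users would index as `arr_0`, which is exactly where the `NpzReader` lookup described above diverges.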