Loading 2D npz double arrays

gospodnetic commented 7 years ago

Hello, one thing is unclear to me: if I have a 2D array stored in an npz file, how can I access the data and store it in a vector<vector>?

Also, if a scalar is saved to an npz file from a simple Python snippet (below), its shape[0] is not 1 (as shown in the example), but 0.

import numpy as np

x = 123
np.savez("scalar.npz", x)
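
A note on the scalar case: NumPy saves a Python scalar as a zero-dimensional array whose shape is the empty tuple (), so a loader that exposes the shape as a vector has nothing at index 0, and reading shape[0] is out of bounds rather than 1. A minimal sketch of a safe element count, assuming the shape is available as a std::vector<size_t> (as in cnpy's NpyArray::shape); num_elements is a hypothetical helper, not part of cnpy:

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of cnpy): number of elements implied by a shape.
// An empty shape (a 0-d array, i.e. a scalar) yields 1, the empty product.
std::size_t num_elements(const std::vector<std::size_t>& shape) {
    return std::accumulate(shape.begin(), shape.end(), std::size_t{1},
                           std::multiplies<std::size_t>());
}
```

Checking shape.empty() (or the element count) before touching shape[0] avoids the out-of-bounds read for scalars.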

Is there a way to determine exactly which data type is stored in the npz file if all I know is that it contains floating point values in the range 0-255?

Thank you!
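
On the data-type question: the value range alone does not identify the type, but each .npy member of an npz archive starts with a header that is a Python dict literal with the keys descr, fortran_order and shape, and descr (for example '<f8' for little-endian 8-byte floats) names the element type. As far as I know, cnpy itself only exposes the element size through NpyArray::word_size. The sketch below assumes you already have the header text as a string; parse_descr and DescrInfo are hypothetical names, and only simple descriptors such as '<f8' or '|u1' are handled, not structured dtypes:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical result type: the three parts of a simple NPY 'descr' string.
struct DescrInfo {
    char byte_order;        // '<' little-endian, '>' big-endian, '|' not applicable
    char kind;              // 'f' float, 'i' signed int, 'u' unsigned int, 'b' bool
    std::size_t word_size;  // bytes per element for numeric kinds
};

// Hypothetical helper (not part of cnpy): extract and split the 'descr' value
// from an NPY header dictionary such as
//   {'descr': '<f8', 'fortran_order': False, 'shape': (3, 4), }
DescrInfo parse_descr(const std::string& header) {
    const std::string key = "'descr':";
    const std::size_t pos = header.find(key);
    if (pos == std::string::npos)
        throw std::runtime_error("no 'descr' key in header");

    // The value is the next single-quoted string after the key.
    const std::size_t open = header.find('\'', pos + key.size());
    if (open == std::string::npos)
        throw std::runtime_error("descr value is not a quoted string");
    const std::size_t close = header.find('\'', open + 1);
    if (close == std::string::npos || close - open - 1 < 3)
        throw std::runtime_error("descr value is malformed");

    const std::string descr = header.substr(open + 1, close - open - 1);
    DescrInfo info;
    info.byte_order = descr[0];
    info.kind = descr[1];
    info.word_size = std::stoul(descr.substr(2));
    return info;
}
```

With this, kind == 'f' and word_size == 8 would indicate double, kind == 'f' and word_size == 4 float, and kind == 'u' and word_size == 1 an unsigned 8-bit integer.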

rogersce commented 7 years ago

It wouldn't be as efficient as indexing a single vector yourself, but if you truly need the data in a vector of vectors, you could do the following:

cnpy::NpyArray arr = cnpy::npy_load("arr1.npy");
double* loaded_data = arr.data();

size_t nrows = arr.shape[0];
size_t ncols = arr.shape[1];
std::vector<std::vector> vec2d;

vec2d.reserve(nrows);
for(size_t row = 0; row < nrows;row++) {
   vec2d.emplace_back(ncols);
   for(size_t col = 0;col < ncols;col++) {
        vec2d[row][col] = loaded_data[row*nrows+col];
   }
}
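
For comparison, "indexing a single vector yourself" could look like the sketch below: one contiguous buffer plus a small accessor, with no per-row allocations. FlatMatrix is a hypothetical wrapper, not part of cnpy, and it assumes the buffer is row-major (C order):

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical wrapper (not part of cnpy): a 2D view over one contiguous,
// row-major buffer.
class FlatMatrix {
public:
    FlatMatrix(std::size_t nrows, std::size_t ncols, std::vector<double> data)
        : nrows_(nrows), ncols_(ncols), data_(std::move(data)) {
        if (data_.size() != nrows_ * ncols_)
            throw std::invalid_argument("buffer size does not match shape");
    }

    // Row-major addressing: the elements of a row are adjacent, so the
    // stride between consecutive rows is the number of columns.
    double& operator()(std::size_t row, std::size_t col) {
        return data_[row * ncols_ + col];
    }
    double operator()(std::size_t row, std::size_t col) const {
        return data_[row * ncols_ + col];
    }

    std::size_t rows() const { return nrows_; }
    std::size_t cols() const { return ncols_; }

private:
    std::size_t nrows_;
    std::size_t ncols_;
    std::vector<double> data_;
};
```

With cnpy, the buffer would presumably be filled from arr.data<double>() and the dimensions taken from arr.shape; that wiring is left out here so the sketch stays self-contained.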

chraibi commented 6 years ago

Just for the sake of visibility, the same code again:

cnpy::NpyArray arr = cnpy::npy_load("arr1.npy");
double* loaded_data = arr.data();

size_t nrows = arr.shape[0];
size_t ncols = arr.shape[1];
std::vector<std::vector> vec2d;

vec2d.reserve(nrows);
for(size_t row = 0; row < nrows;row++) {
   vec2d.emplace_back(ncols);
   for(size_t col = 0;col < ncols;col++) {
        vec2d[row][col] = loaded_data[row*nrows+col];
   }
}

ranka47 commented 6 years ago

Some edits that I found necessary to get the code to compile (an explicit template argument for data<double>() and an element type for the nested vector):

cnpy::NpyArray arr = cnpy::npy_load("arr1.npy");
double* loaded_data = arr.data<double>();

size_t nrows = arr.shape[0];
size_t ncols = arr.shape[1];
std::vector<std::vector<double> > vec2d;

vec2d.reserve(nrows);
for(size_t row = 0; row < nrows;row++) {
   vec2d.emplace_back(ncols);
   for(size_t col = 0;col < ncols;col++) {
        vec2d[row][col] = loaded_data[row*nrows+col];
   }
}

clemense commented 5 years ago

It compiles but the indexing is wrong. Should be: vec2d[row][col] = loaded_data[row*ncols+col];.

ranka47 commented 5 years ago

> It compiles but the indexing is wrong. Should be: vec2d[row][col] = loaded_data[row*ncols+col];.

It depends on how you are counting. If the matrix is row-major then what you have specified is correct. However, the code I wrote is for the matrix that has been stored in column-major format.

clemense commented 5 years ago

Sorry, I need to disagree. Two things:

1. The matrices that are stored with cnpy are always row-major. If you want to store matrices in column-major format, this should be reflected in the header information of the npy file (currently, cnpy writes a constant fortran_order: False and during npy_load it checks assert(!fortran_order), i.e. all matrices are row-major). If you ignore this, you will get different results when loading the same matrix with numpy and cnpy.
2. Even if you want to read the matrix in column-major format, your indexing is wrong. It should be vec2d[row][col] = loaded_data[col*nrows+row]. Your proposed indexing coincidentally works for square matrices but not for arbitrary ones.

Please correct me if I'm missing something.
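
To make the two points concrete, here is a small self-contained sketch of both index formulas on a plain buffer. The function names are hypothetical and nothing here depends on cnpy:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Interpret `flat` as an nrows x ncols matrix stored row by row (C order).
Matrix from_row_major(const std::vector<double>& flat,
                      std::size_t nrows, std::size_t ncols) {
    if (flat.size() != nrows * ncols)
        throw std::invalid_argument("buffer size does not match shape");
    Matrix out(nrows, std::vector<double>(ncols));
    for (std::size_t row = 0; row < nrows; ++row)
        for (std::size_t col = 0; col < ncols; ++col)
            out[row][col] = flat[row * ncols + col];  // row stride is ncols
    return out;
}

// Interpret `flat` as an nrows x ncols matrix stored column by column
// (Fortran order).
Matrix from_col_major(const std::vector<double>& flat,
                      std::size_t nrows, std::size_t ncols) {
    if (flat.size() != nrows * ncols)
        throw std::invalid_argument("buffer size does not match shape");
    Matrix out(nrows, std::vector<double>(ncols));
    for (std::size_t row = 0; row < nrows; ++row)
        for (std::size_t col = 0; col < ncols; ++col)
            out[row][col] = flat[col * nrows + row];  // column stride is nrows
    return out;
}
```

For a 2 x 3 matrix, the formula row*nrows+col maps element (1, 0) to flat[2], which in row-major storage is element (0, 2). The mistake only goes unnoticed when nrows equals ncols.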

wolfv commented 5 years ago

We've integrated cnpy into xtensor & xtensor-io by the way, if you want to use a "NumPy-like" container directly in C++ without needing to resort to inefficient vector-of-vector constructs.

https://github.com/QuantStack/xtensor (NPY loading)
https://github.com/QuantStack/xtensor-io (NPZ loading)

vmiheer commented 4 years ago

I was going to ask whether cnpy could automatically load data types and shapes, but it seems like one should use xtensor instead? @wolfv, should this be mentioned in the README?

rogersce / cnpy

Loading 2D npz double arrays #20