rogersce / cnpy

library to read/write .npy and .npz files in C/C++
MIT License

cnpy writes corrupt npz files for too large arrays #39

Open leezu opened 6 years ago

leezu commented 6 years ago

Consider the following simple program:

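#include <cstdint>
#include <string>
#include <vector>

#include "cnpy.h"

int main() {
  // Two zero-initialized arrays (about 2.9 GiB and 1.45 GiB) whose
  // combined size exceeds 4 GiB.
  std::vector<uint32_t> a(778126008);
  std::vector<uint32_t> b(389063004);

  std::string output("/tmp/tmparray");
  // "w" creates the archive with the first array; "a" appends the second.
  cnpy::npz_save(output, "a", a.data(), {a.size()}, "w");
  cnpy::npz_save(output, "b", b.data(), {b.size()}, "a");
}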

/tmp/tmparray will have a corrupt zipfile header and numpy will not load the array. unzip outputs the following:

% unzip -v tmparray
Archive:  tmparray
warning [tmparray]:  4294967296 extra bytes at beginning or within zipfile
  (attempting to process anyway)
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
3112504112  Stored 3112504112   0% 1980-00-00 00:00 f6b5ba4e  a.npy
1556252096  Stored 1556252096   0% 1980-00-00 00:00 544711f7  b.npy
--------          -------  ---                            -------
4668756208         4668756208   0%                            2 files

% unzip tmparray
Archive:  tmparray
warning [tmparray]:  4294967296 extra bytes at beginning or within zipfile
  (attempting to process anyway)
file #1:  bad zipfile offset (local header sig):  4294967296
  (attempting to re-compensate)
 extracting: a.npy
 extracting: b.npy

I suspect this is caused by missing Zip64 support: the archive is larger than 4 GiB, which the 32-bit offsets of the classic Zip format cannot represent. I see that cnpy currently implements Zip writing itself on top of zlib. Moving to libzip (https://libzip.org/) should fix this issue. Is there a reason you chose zlib over libzip, which is a higher-level interface to zlib?

shahabfatemi commented 5 years ago

I have the same problem. Cannot write large npz files!

leezu commented 5 years ago

@shahabfatemi you could try the fixed code in my libzip branch: https://github.com/leezu/cnpy/tree/libzip

shahabfatemi commented 5 years ago

@leezu Thanks for the quick reply! Your branch has dependencies (e.g., "./zip.hpp" in cnpy.h) which are not included in the repository!

leezu commented 5 years ago

@shahabfatemi the zip.hpp dependency is fetched automatically by CMake before the build. If that doesn't work for you, run git submodule update --init to initialize the submodule that contains the dependency.