kuba-- / zip

A portable, simple zip library written in C
MIT License

Problem with big zip files #258

Closed: vivkvv closed this issue 2 years ago

vivkvv commented 2 years ago

I have a file big_zip.raw of about 6 GB. After I zip it, big_zip.zip is about 20 MB.

The following code does not work:

    struct zip_t *zip = zip_open("big_zip.zip", 0, 'r');

    auto cnt = zip_entries_total(zip);

    auto err = zip_entry_openbyindex(zip, 0);

    const char *name = zip_entry_name(zip);
    int isdir = zip_entry_isdir(zip);
    size_t bufsize = zip_entry_size(zip); // right size = ~6Gb
    unsigned int crc32 = zip_entry_crc32(zip);

    void *buf = (void*) new unsigned int[bufsize] {1}; // in 64bit version works ok

    bufsize = zip_entry_read(zip, (void **)&buf, &bufsize);

Here zip_entry_read returns 0.

kuba-- commented 2 years ago

@vivkvv, I assume you have just one 6 GB entry. So my first question is: can you allocate 12 GB? Because that is actually what you're doing. zip_entry_read allocates memory for you and returns a newly allocated output buffer and its size (see the README or the docs: https://github.com/kuba--/zip/blob/203ef139b0df861fcfd70eca20f8f92925a79846/src/zip.h#L307).

If you want to pre-allocate memory, that's fine, but I'd suggest using zip_entry_noallocread (https://github.com/kuba--/zip/blob/203ef139b0df861fcfd70eca20f8f92925a79846/src/zip.h#L327).

Apart from the memory leak (which may not be a problem in your case), my guess is that allocating another 6 GB failed.
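Pre-allocating for zip_entry_noallocread means the caller has to turn the entry's 64-bit size into a size_t byte count. A minimal sketch of that step, using only the C standard library; entry_size_to_buflen and alloc_entry_buffer are hypothetical helpers, not part of kuba--/zip. It allocates bytes rather than unsigned int elements, and it refuses sizes that do not fit in size_t (as a ~6 GB entry would not in a 32-bit build).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not part of kuba--/zip): converts a 64-bit entry
   size into a size_t buffer length. Returns 1 on success; returns 0 when
   the size cannot be represented, e.g. a ~6 GB entry in a 32-bit build
   where SIZE_MAX is 4294967295. */
static int entry_size_to_buflen(uint64_t entry_size, size_t *buflen) {
    if (entry_size > (uint64_t)SIZE_MAX) {
        return 0;
    }
    *buflen = (size_t)entry_size;
    return 1;
}

/* Hypothetical helper: allocates one buffer of exactly entry_size bytes
   (no multiplication by an element type), or returns NULL if the size
   does not fit in size_t or malloc fails. The caller frees the buffer. */
static void *alloc_entry_buffer(uint64_t entry_size, size_t *buflen) {
    if (!entry_size_to_buflen(entry_size, buflen)) {
        return NULL;
    }
    return malloc(*buflen);
}
```

Assuming the zip_entry_noallocread(zip, buf, bufsize) signature from zip.h, the returned buffer and buflen would then be passed to it and freed by the caller afterwards, so there is no second allocation as there is with zip_entry_read.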

vivkvv commented 2 years ago

@kuba-- Thanks. Of course, the memory leak is my mistake. But the real problem is that the function mz_zip_reader_extract_to_heap has these lines:

  comp_size = MZ_READ_LE32(p + MZ_ZIP_CDH_COMPRESSED_SIZE_OFS);
  uncomp_size = MZ_READ_LE32(p + MZ_ZIP_CDH_DECOMPRESSED_SIZE_OFS);

and the second one yields 4294967295 (0xFFFFFFFF). That is wrong, because in this case the size has to be taken from the additional (ZIP64 extra) fields. Then the function mz_zip_reader_extract_to_mem_no_alloc returns an error

  if (buf_size < needed_size)
    return mz_zip_set_error(pZip, MZ_ZIP_BUF_TOO_SMALL);

because buf_size is 4294967295, but needed_size is 5998805513.
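The lookup described above can be sketched as follows. In the ZIP format, a 32-bit size of 0xFFFFFFFF in the central directory is a sentinel meaning the real value is stored in the ZIP64 extended information extra field (header ID 0x0001), and the 64-bit uncompressed size comes first in that field whenever it is present. This is a standalone illustration, not miniz's actual code; resolve_uncompressed_size and its helpers are hypothetical names, and `extra` is assumed to be the entry's central-directory extra field.

```c
#include <stddef.h>
#include <stdint.h>

#define ZIP64_EXTRA_ID 0x0001u          /* ZIP64 extended information field */
#define ZIP32_SIZE_SENTINEL 0xFFFFFFFFu /* "real size is in the ZIP64 field" */

static uint16_t read_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint64_t read_le64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i]; /* little-endian: p[0] is the lowest byte */
    }
    return v;
}

/* Returns 0 and stores the real uncompressed size in *out, or -1 if the
   32-bit field is the sentinel but no usable ZIP64 field is found. */
static int resolve_uncompressed_size(uint32_t uncomp32, const uint8_t *extra,
                                     size_t extra_len, uint64_t *out) {
    if (uncomp32 != ZIP32_SIZE_SENTINEL) {
        *out = uncomp32; /* the 32-bit value is the real size */
        return 0;
    }
    size_t pos = 0;
    while (pos + 4 <= extra_len) { /* each record: id (2), length (2), data */
        uint16_t id = read_le16(extra + pos);
        uint16_t len = read_le16(extra + pos + 2);
        if ((size_t)len > extra_len - pos - 4) {
            break; /* truncated record */
        }
        if (id == ZIP64_EXTRA_ID) {
            if (len < 8) {
                return -1; /* no room for the 64-bit uncompressed size */
            }
            *out = read_le64(extra + pos + 4);
            return 0;
        }
        pos += 4 + (size_t)len; /* skip to the next extra record */
    }
    return -1;
}
```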

Call stack:

    mz_zip_reader_extract_to_mem_no_alloc(mz_zip_archive * pZip, unsigned int file_index, void * pBuf, unsigned __int64 buf_size, unsigned int flags, void * pUser_read_buf, unsigned __int64 user_read_buf_size) Line 6499
    mz_zip_reader_extract_to_mem(mz_zip_archive * pZip, unsigned int file_index, void * pBuf, unsigned __int64 buf_size, unsigned int flags) Line 6631
    mz_zip_reader_extract_to_heap(mz_zip_archive * pZip, unsigned int file_index, unsigned __int64 * pSize, unsigned int flags) Line 6669
    zip_entry_read(zip_t * zip, void * * buf, unsigned __int64 * bufsize) Line 1445

kuba-- commented 2 years ago

OK, thanks for troubleshooting. Looks like a 32-bit issue; I'll take a look.
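The numbers reported in the thread show why a 32-bit size field cannot carry this entry. A short worked check; the second line only illustrates what a plain 32-bit truncation would produce and is not a value reported in the thread:

```latex
\begin{aligned}
2^{32} - 1 &= 4294967295 < 5998805513 \\
5998805513 \bmod 2^{32} &= 5998805513 - 4294967296 = 1703838217
\end{aligned}
```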

kuba-- commented 2 years ago

@vivkvv, I've upgraded some miniz internals, which should fix your problem. Please take a look at the PR: https://github.com/kuba--/zip/pull/262, or test your file against the branch: https://github.com/kuba--/zip/tree/fix-258

vivkvv commented 2 years ago

Thanks. I've checked and it works now.