zlib-ng / minizip-ng

Fork of the popular zip manipulation library found in the zlib distribution.

When running on some larger zip files, in CreateProcessA(...), an exe with minizip gets bad file index #594

Closed h3rb-ts closed 2 years ago

h3rb-ts commented 3 years ago

FYI: I cannot share this file, but it is a zip file using DEFLATE and STORE. I'm not sure if this is the right place to mention it, but it seems like a bug somewhere in minizip. The behavior occurs on Windows. The file is 6GB and is a "3tz" file.

When run from the shell, our exe steps through the archive, relying exclusively on minizip to do so.
Based on a demo snippet, it first reports the number of entries.

When we run our EXE from CMD, minizip reports 40976 files (the correct number, as confirmed by inspecting the file in p7zip or WinRAR), and all are unique.

When the same EXE is invoked via CreateProcess() (winapi), it reports 48578 files, with duplicates in the index and other files missing.
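For anyone wanting an independent cross-check of the two numbers above, here is a minimal sketch that counts entries and lists duplicate names. It uses Python's standard zipfile module rather than minizip, so it only shows what a second reader sees in the same file; the function name summarize_entries is made up for illustration.

```python
import zipfile
from collections import Counter


def summarize_entries(archive):
    """Return (entry_count, sorted_duplicate_names) for a zip path or file object.

    This reads the central directory with Python's zipfile module, not minizip,
    so it is only a second opinion on what the archive contains.
    """
    with zipfile.ZipFile(archive) as zf:
        names = [info.filename for info in zf.infolist()]
    duplicates = sorted(name for name, n in Counter(names).items() if n > 1)
    return len(names), duplicates
```

If the count and the duplicate list here agree with the CMD run but not the CreateProcess() run, that points away from the archive's index itself and toward how the file is being read in the second context.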

We have tried invoking this:

  1. from Laragon's PHP's popen (uses CMD),
  2. from Laragon's PHP's popen using CMD wrapping a powershell.exe invocation,
  3. from inside a C++ app in Visual Studio using CreateProcess:
  4. --> invoked inside PHP's popen, inside CMD, and inside PowerShell, each calling a simple C++ Windows Console App that uses CreateProcess(),
  5. --> outside of PHP, in the Visual Studio debugger,
  6. --> and from a release version of the C++ runner executed manually from the command line.

All of these have given a bad index list / number of entries.

The only time it gives the right answer is when we manually run it from the CMD prompt. In all of the above versions that failed, one commonality is that the process has no window/console.

h3rb-ts commented 3 years ago

Actually, this can be reproduced on some machines by invoking the exe that uses minizip via subprocess.run(...) in Python 3.9.6.
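A rough sketch of that kind of reproduction harness, assuming the exe prints its entry count to stdout. The helper name run_lister and the paths in the comment are hypothetical; stdin is pointed at DEVNULL only to mimic a launch with no interactive console input.

```python
import subprocess


def run_lister(command, cwd=None):
    """Run a command without interactive stdin and return (returncode, stdout, stderr)."""
    result = subprocess.run(
        command,
        cwd=cwd,                    # explicit working directory, so relative paths are unambiguous
        stdin=subprocess.DEVNULL,   # no console input, similar to a windowless launch
        capture_output=True,
        text=True,
        check=False,                # a non-zero exit code is reported, not raised
    )
    return result.returncode, result.stdout, result.stderr


# Hypothetical usage (paths are placeholders, not from the report):
# rc, out, err = run_lister([r"C:\tools\lister.exe", r"C:\data\model.3tz"], cwd=r"C:\data")
```

Capturing stdout and stderr separately makes it easier to diff the output of a CMD run against a spawned run.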

nmoinvaz commented 3 years ago

Are you certain it is not a working directory issue - same zip file name in two working directories but with different attributes?
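One way to rule this out might be to log, from each launch context, the working directory plus the resolved path, size, and modification time of the archive actually being opened. A minimal sketch, with describe_archive as a made-up helper name:

```python
import os


def describe_archive(path):
    """Return the working directory and the resolved path, size, and mtime of a file."""
    full_path = os.path.abspath(path)  # resolves a relative name against the current directory
    st = os.stat(full_path)
    return {
        "cwd": os.getcwd(),
        "path": full_path,
        "size": st.st_size,
        "mtime": st.st_mtime,
    }
```

If the path or size differs between the CMD run and the CreateProcess() run, two different files are being read.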

h3rb-ts commented 3 years ago

We've actually been testing this quite extensively. Here's what we've found on Windows:

h3rb-ts commented 3 years ago

Would WITH_STRICT_DEFLATE help?

nmoinvaz commented 3 years ago

Is it having to rebuild the central directory in the instance where it returns the wrong number of files? Is it able to find the central directory in that instance? If I were you I would run through the debugger and see what is happening and why it is returning that value. Are you using mz_zip_reader/mz_zip_writer or just mz_zip class?
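To check whether the central directory can be located at all, one option is to read the entry count declared in the end-of-central-directory (EOCD) record directly, following the ZIP64 locator when the classic fields are saturated (likely for a 6GB archive). This is a sketch based on the ZIP file format layout, not on minizip's code; declared_entry_count is a hypothetical name, and the backward signature search can be fooled by a signature-like byte sequence inside the archive comment.

```python
import struct

EOCD_SIG = b"PK\x05\x06"           # end of central directory record
ZIP64_LOCATOR_SIG = b"PK\x06\x07"  # ZIP64 end of central directory locator
ZIP64_EOCD_SIG = b"PK\x06\x06"     # ZIP64 end of central directory record


def declared_entry_count(f):
    """Return the total entry count declared by the EOCD data, or None if not found.

    `f` is a seekable binary file object.
    """
    f.seek(0, 2)
    size = f.tell()
    tail_len = min(size, 22 + 0xFFFF)  # EOCD is 22 bytes plus a comment of at most 65535 bytes
    f.seek(size - tail_len)
    tail = f.read(tail_len)

    pos = tail.rfind(EOCD_SIG)
    if pos < 0 or pos + 22 > len(tail):
        return None
    fields = struct.unpack("<4sHHHHIIH", tail[pos:pos + 22])
    total, cd_offset = fields[4], fields[6]
    if total != 0xFFFF and cd_offset != 0xFFFFFFFF:
        return total  # classic (non-ZIP64) archive

    # Saturated fields: the ZIP64 locator, if present, sits right before the EOCD.
    eocd_abs = size - tail_len + pos
    if eocd_abs < 20:
        return total
    f.seek(eocd_abs - 20)
    locator = f.read(20)
    if locator[:4] != ZIP64_LOCATOR_SIG:
        return total
    zip64_offset = struct.unpack("<4sIQI", locator)[2]

    f.seek(zip64_offset)
    record = f.read(56)
    if len(record) < 56 or record[:4] != ZIP64_EOCD_SIG:
        return None
    return struct.unpack("<4sQHHIIQQQQ", record)[7]  # total number of entries
```

Comparing this declared count with what the exe reports in each context would show whether the wrong number comes from the directory record itself or from a later step.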

h3rb-ts commented 3 years ago

This was ripped from minizip.c's list function. We modified that and it is called very early in our EXE. I am thinking maybe a rogue DLL or other lib is colliding somehow.

h3rb-ts commented 2 years ago

We're still dealing with this. On another machine, Python is able to run it without an issue, but that same machine cannot run it in another context without issues. Should we be using zlib-ng? We worry that it is zlib's fault, though maybe it is something else. We'll try to identify when the entry count gets mucked up by stepping through.

h3rb-ts commented 2 years ago

This appears to be a data corruption issue unrelated to minizip! Thanks.
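For anyone landing here with similar symptoms, a quick way to test for corruption of this kind is to hash the archive on each machine (or each copy) and compare the digests. A minimal sketch using Python's hashlib; the helper name sha256_of_file is made up, and chunked reads keep memory use flat for multi-gigabyte files.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Differing digests for what should be the same archive mean the bytes themselves differ, independent of any zip library.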