kuba-- / zip

A portable, simple zip library written in C
MIT License
1.39k stars 272 forks source link

Added test for delete from in-memory zip. #333

Open prot0man opened 8 months ago

prot0man commented 8 months ago

This required making some updates that are kind of hacky. The path of least edits to miniz necessitated updating miniz mz_zip_reader_init_mem to initialize the writer function and additionally zip_stream_open to handle the case where a user passes a stream and the mode 'r' or 'd'.

prot0man commented 8 months ago

I'm sure that modifying miniz is something that we only want to ever do as a last result to fix critical errors, but what are your thoughts on these modifications to enable more rich support of in memory zips?

prot0man commented 8 months ago

Also, I have no idea why ubuntu is failing on the runner when it works for me on ubuntu when I do:


proto@D:~/repos/zip$ cmake --build build --target test
Running tests...
/usr/bin/ctest --force-new-ctest-process
Test project /home/proto/repos/zip/build
    Start 1: test_write.out
1/7 Test #1: test_write.out ...................   Passed    0.00 sec
    Start 2: test_append.out
2/7 Test #2: test_append.out ..................   Passed    0.00 sec
    Start 3: test_read.out
3/7 Test #3: test_read.out ....................   Passed    0.00 sec
    Start 4: test_extract.out
4/7 Test #4: test_extract.out .................   Passed    0.00 sec
    Start 5: test_entry.out
5/7 Test #5: test_entry.out ...................   Passed    0.00 sec
    Start 6: test_permissions.out
6/7 Test #6: test_permissions.out .............   Passed    0.00 sec
    Start 7: test_open.out
7/7 Test #7: test_open.out ....................   Passed    0.00 sec

100% tests passed, 0 tests failed out of 7

Total Test time (real) =   0.01 sec
proto@D:~/repos/zip$ git branch
* 330_steam_zip_add_tests
* ````
prot0man commented 8 months ago

Also, it has been a nightmare trying to update the CMake config to properly toggle output on failure for the tests. The ideal solution would be to add:

list(APPEND CMAKE_CTEST_ARGUMENTS "--output-on-failure")

to the CMakeLists.txt, but that was only introduced in cmake 3.17 (current cmake minimum in your project is 3.14).

Another way to achieve it is to create a new test target called check where you add the option to toggle output on failure like:

add_custom_target(check ${CMAKE_COMMAND} -E env CTEST_OUTPUT_ON_FAILURE=1
                  ${CMAKE_CTEST_COMMAND} -C $<CONFIG> --verbose
                  WORKING_DIRECTORY ${CMAKE_BINARY_DIR})

Problem is that the pipeline currently does a make test for testing instead of make check.

kuba-- commented 8 months ago

Also, it has been a nightmare trying to update the CMake config to properly toggle output on failure for the tests. The ideal solution would be to add:

list(APPEND CMAKE_CTEST_ARGUMENTS "--output-on-failure")

to the CMakeLists.txt, but that was only introduced in cmake 3.17 (current cmake minimum in your project is 3.14).

Another way to achieve it is to create a new test target called check where you add the option to toggle output on failure like:

add_custom_target(check ${CMAKE_COMMAND} -E env CTEST_OUTPUT_ON_FAILURE=1
                  ${CMAKE_CTEST_COMMAND} -C $<CONFIG> --verbose
                  WORKING_DIRECTORY ${CMAKE_BINARY_DIR})

Problem is that the pipeline currently does a make test for testing instead of make check.

I've added more verbose output for ctest, so PTAL. Generally it fails because of ASAN (address sanitizer), if you run tests without ASan it looks like it works, but on CI we check memory leaks, etc. If you want to reproduce it locally, try running your tests with:

cmake -DSANITIZE_ADDRESS=On -S . -B build -GNinja
cmake --build build
cd build

ASAN_OPTIONS=detect_leaks=0 LSAN_OPTIONS=verbosity=1:log_threads=1 CTEST_OUTPUT_ON_FAILURE=1 
ctest -VV --output-on-failure
kuba-- commented 8 months ago

I'm sure that modifying miniz is something that we only want to ever do as a last result to fix critical errors, but what are your thoughts on these modifications to enable more rich support of in memory zips?

Yes, we should not change miniz (very often), because it's always tricky to upgrade and propagate changes without breaking anything. In this case, the change is small, so it's not a big deal, but we have to remember it.

Regarding new features - in my personal opinion the library is in maintenance mode (I do not see any need for new features), but people use it in many different ways, so I'm always open for PRs.

prot0man commented 8 months ago

Also, it has been a nightmare trying to update the CMake config to properly toggle output on failure for the tests. The ideal solution would be to add:

list(APPEND CMAKE_CTEST_ARGUMENTS "--output-on-failure")

to the CMakeLists.txt, but that was only introduced in cmake 3.17 (current cmake minimum in your project is 3.14). Another way to achieve it is to create a new test target called check where you add the option to toggle output on failure like:

add_custom_target(check ${CMAKE_COMMAND} -E env CTEST_OUTPUT_ON_FAILURE=1
                  ${CMAKE_CTEST_COMMAND} -C $<CONFIG> --verbose
                  WORKING_DIRECTORY ${CMAKE_BINARY_DIR})

Problem is that the pipeline currently does a make test for testing instead of make check.

I've added more verbose output for ctest, so PTAL. Generally it fails because of ASAN (address sanitizer), if you run tests without ASan it looks like it works, but on CI we check memory leaks, etc. If you want to reproduce it locally, try running your tests with:

cmake -DSANITIZE_ADDRESS=On -S . -B build -GNinja
cmake --build build
cd build

ASAN_OPTIONS=detect_leaks=0 LSAN_OPTIONS=verbosity=1:log_threads=1 CTEST_OUTPUT_ON_FAILURE=1 
ctest -VV --output-on-failure

Thank you for the output on error update! I see now that what I thought would prevent the stream from being freed didn't actually do that. I'll look into it now.

prot0man commented 8 months ago

So before we merge this, I should rebase the fixup and add in an additional commit that updates zip.h to document the weird behavior that zip_stream_close frees the original stream passed in if it's not NULL. Is that too disgusting of a way to do this?

Because of the reallocs that are performed on the stream in miniz, I think it's safest if the user assumes that buffer is just going to be managed by the zip library at that point.

prot0man commented 8 months ago

Went ahead and rebased and documented the wonky behavior.

prot0man commented 8 months ago

This is more of a difference between read and write though. Like if you're only opening with read permissions, then the stream pointer will definitely never be modified. But for write, it'll obviously be mutated AND you could add a file big enough that necessitates moving all of the data to a new chunk of memory. Delete would also mutate the stream, but it wouldn't do so in a way that would result in needing a realloc.

To me, it still doesn't make as much sense to separate write and delete because they're both mutating the stream. I know there's a little more complexity required to properly shift the entries, but it's very cumbersome to have to redo the same workflow to toggle between write and delete. At a minimum, that could result about 7 API function calls with error checking for each workflow. Also, zip_stream_open taking a const char *stream for the workflow where you're doing writes (even though the pointer itself isn't changed) definitely doesn't make sense given the realloc involved.

After being forced to dig under the hood of this library, it's become clear to me how naive (and potentially incomplete) my changes are in this MR. I plan on making more comprehensive changes in my fork when I have time. If you decide that it's easier to limit actions to read/write/delete per open for simplicity, I do understand though.

On another note, as I've been looking more at the miniz part of this library, I've found several code quality issues and things that look a little broken. Outside of all of the monolith functions that make me wonder how they properly tested any of the units, I have found things like in mz_zip_writer_add_mem_ex_v2 where at https://github.com/kuba--/zip/blob/4696e964a6c66b46a6469566d81145993d4a7c5e/src/miniz.h#L8749 they only ever use the MZ_ZIP_DATA_DESCRIPTER_SIZE64 regardless of whether the m_zip64 is true.

On Mon, Jan 22, 2024, 6:19 AM Kuba Podgórski @.***> wrote:

@.**** commented on this pull request.

In src/zip.h https://github.com/kuba--/zip/pull/333#discussion_r1461716537:

@@ -494,6 +495,9 @@ extern ZIP_EXPORT ssize_t zip_stream_copy(struct zip_t *zip, void buf, /

The realloc may happen any time someone passes a stream to zip_open_stream and then adds a file to the zip that was big enough that the realloc moved the memory that stream originally pointed to to another address.

Ok, I understand what you mean. That's why to kept workflows safe and separated delete mode from write/append. Frankly speaking, I'm not super convinced that it's worth to redesign the API, or copy the stream, ... Personally, I do not think it's a big deal to open the stream twice. First time for delete, the second time for write/append.

— Reply to this email directly, view it on GitHub https://github.com/kuba--/zip/pull/333#discussion_r1461716537, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACGZBJBEQQNJKS57B5C6E6TYPZDKPAVCNFSM6AAAAABB32JXT6VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTQMZWGIZTGNJXGY . You are receiving this because you authored the thread.Message ID: @.***>

kuba-- commented 8 months ago

This is more of a difference between read and write though. Like if you're only opening with read permissions, ... To me, it still doesn't make as much sense to separate write and delete because they're both mutating the stream.

Yes, both operations mutate the same stream, but mutate in totally different way (especially from zip structure perspective). In other words, they mutate the same critical section, and that's why it can be risky (that's why I mentioned that separated operations are more predictable from API perspective).

If you decide that it's easier to limit actions to read/write/delete per open for simplicity, I do understand though. ...but as I said at the beginning - any PR is welcome to make this library better, so if you can fix it and keep it clean and safe, then would be great.