prot0man closed 8 months ago
I've created an MR into my local fork of this repo to illustrate some changes that may help fix the issue at https://github.com/prot0man/zip/pull/1/commits/228756548f1daf37a436b2269b4521f339ad915f
Though those changes result in zip_entries_delete successfully deleting the file, I get CRC warnings when I unzip:
Archive: myoutfile
inflating: file1
file3: mismatching "local" filename (file2),
continuing with "central" filename version
inflating: file3 bad CRC a0c3e8fc (should be d7c4d86a)
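To see which payload unzip actually inflated, it can help to compute the CRC-32 of each candidate payload and compare it against the two values in the warning. The following is a minimal sketch of the standard ZIP CRC-32 (reflected polynomial 0xEDB88320); the function name crc32_ieee is made up for illustration and is not part of the zip library.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 as used by ZIP (reflected polynomial 0xEDB88320).
 * Hypothetical helper for illustration, not part of the zip library. */
uint32_t crc32_ieee(const void *data, size_t len)
{
    const unsigned char *p = (const unsigned char *)data;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++) {
            /* If the low bit is set, shift and xor in the polynomial. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

Feeding it the uncompressed bytes of file2 and file3 should show which of the two values in the warning belongs to which file. In the hexdump, the "should be" value d7c4d86a is the CRC stored in the central directory entry named file3, while a0c3e8fc is the CRC in the data descriptor that follows file2's data.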
Actually, with the changes in my fork, it doesn't successfully delete the correct file (file2), based on the hexdump of the resulting zip:
proto@D:~/repos/zip/src/scratch$ xxd myoutfile
00000000: 504b 0304 1400 0808 0800 db64 2c58 0000 PK.........d,X..
00000010: 0000 0000 0000 0000 0000 0500 0400 6669 ..............fi
00000020: 6c65 3101 0000 0001 1400 ebff 736f 6d65 le1.........some
00000030: 7465 7374 6461 7461 666f 7266 696c 6531 testdataforfile1
00000040: 504b 0708 46b9 ca39 1900 0000 0000 0000 PK..F..9........
00000050: 1400 0000 0000 0000 504b 0304 1400 0808 ........PK......
00000060: 0800 db64 2c58 0000 0000 0000 0000 0000 ...d,X..........
00000070: 0000 0500 0400 6669 6c65 3201 0000 0001 ......file2.....
00000080: 1400 ebff 736f 6d65 7465 7374 6461 7461 ....sometestdata
00000090: 666f 7266 696c 6532 504b 0708 fce8 c3a0 forfile2PK......
000000a0: 1900 0000 0000 0000 1400 0000 0000 0000 ................
000000b0: 504b 0102 0000 1400 0808 0800 db64 2c58 PK...........d,X
000000c0: 46b9 ca39 1900 0000 1400 0000 0500 0400 F..9............
000000d0: 0000 0000 0000 0000 0000 0000 0000 6669 ..............fi
000000e0: 6c65 3101 0000 0050 4b01 0200 0014 0008 le1....PK.......
000000f0: 0808 00db 642c 586a d8c4 d719 0000 0014 ....d,Xj........
00000100: 0000 0005 0004 0000 0000 0000 0000 0000 ................
00000110: 0058 0000 0066 696c 6533 0100 0000 504b .X...file3....PK
00000120: 0506 0000 0000 0200 0200 6e00 0000 b000 ..........n.....
00000130: 0000 0000 ....
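Reading the dump by hand: the second central directory entry is named file3 but records local header offset 0x58, and the local header at 0x58 still says file2. That is exactly the mismatch unzip reports. A small checker like the one below can confirm this mechanically. It is only a sketch under simplifying assumptions (single-disk archive, no ZIP64, no archive comment containing the end-of-central-directory signature), and the function name count_name_mismatches is invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian readers for the ZIP on-disk fields. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns how many central directory entries carry a name that differs
 * from the name in the local header they point to, or -1 if the buffer
 * does not parse as a simple (non-ZIP64) archive. Hypothetical helper. */
int count_name_mismatches(const unsigned char *zip, size_t len)
{
    size_t eocd, cd, i, n;
    int mismatches = 0;

    if (len < 22) return -1;
    /* Scan backwards for the end-of-central-directory record "PK\5\6". */
    for (eocd = len - 22;; eocd--) {
        if (rd32(zip + eocd) == 0x06054b50u) break;
        if (eocd == 0) return -1;
    }
    n = rd16(zip + eocd + 10);  /* total number of entries */
    cd = rd32(zip + eocd + 16); /* offset of the central directory */

    for (i = 0; i < n; i++) {
        size_t name_len, lname_len, lh;

        if (cd > len || len - cd < 46 || rd32(zip + cd) != 0x02014b50u)
            return -1;
        name_len = rd16(zip + cd + 28);
        lh = rd32(zip + cd + 42); /* where the local header should be */
        if (len - cd - 46 < name_len) return -1;

        if (lh > len || len - lh < 30 || rd32(zip + lh) != 0x04034b50u)
            return -1;
        lname_len = rd16(zip + lh + 26);
        if (len - lh - 30 < lname_len) return -1;

        if (lname_len != name_len ||
            memcmp(zip + lh + 30, zip + cd + 46, name_len) != 0)
            mismatches++;

        /* Skip fixed part, name, extra field and comment. */
        cd += 46 + name_len + rd16(zip + cd + 30) + rd16(zip + cd + 32);
    }
    return mismatches;
}
```

Calling it on the in-memory buffer right after the delete gives a quick regression check: any result other than 0 means the central directory and the local headers have drifted apart.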
Unfortunately, deleting entries has some limitations. I mean, you cannot open the archive for writing (w) and delete entries.
You have to close it first (zip_close) and then open it for deleting (there is also, just for consistency, a special d mode for deleting).
In other words, the workflow should be like this:
zip_open
{
    zip_entry_open
    zip_entry_write
    zip_entry_close
}
zip_close

zip_open("foo.zip", 0, 'd');
{
    zip_entries_delete
    // you can also delete by index, instead of by name
    // zip_entries_deletebyindex
}
zip_close
Look at the README example.
So for my use case, I need to create an in-memory zip and potentially remove entries if my archive ever exceeds a size limit. With my modifications, it almost works on the in-memory zip, but the resulting zip clearly has vestigial evidence of the removed entry.
It's important to note that with my example code, I'm able to create an in-memory zip and remove an entry (after the changes I have in the linked branch) such that the resulting zip uncompresses to the expected file1 and file3. If that's API abuse, it may be worth adding some checks to zip_entries_delete that assert that we have the correct mode. I'm going to see if I can find a clean work-around to enable this use case in my branch, though.
I've spent the better part of the day digging into zip_entries_delete_mark trying to find where the issue might be, but it sounds like what I'm trying to accomplish just shouldn't work by design.
As a work-around, I guess I could create a shadow zip, attempt to compress the file in it, and then determine whether the file would put me over the size limit before adding it, but doing that hurts my soul.
It's all about calculating the indexes and offsets of entries. It all has to be written to the central directory at the end.
Yeah, I was following through zip_central_dir_delete trying to see if the problem was there. I at least verified that after the call to zip_central_dir_move in the first while loop, the end result of the zip (pState->m_central_dir.m_p) didn't contain any references to the deleted file2 entry (by hexdumping that buffer). It looks like some other piece of memory is still referencing the deleted entry, though.
Generally, deleting is a kind of experimental feature, so I would be very careful and certainly not mix it with writing.
It's actually still not clear to me why, in zip_entries_delete_mark, the following is done:
while (i < entry_num) {
    while ((i < entry_num) && (entry_mark[i].type == MZ_KEEP)) {
        ...
    }
    while ((i < entry_num) && (entry_mark[i].type == MZ_DELETE)) {
        ...
    }
    while ((i < entry_num) && (entry_mark[i].type == MZ_MOVE)) {
        ...
    }
    ...
}
Instead of:
while (i < entry_num) {
    if (entry_mark[i].type == MZ_KEEP) {
        ...
    } else if (entry_mark[i].type == MZ_DELETE) {
        ...
    } else if (entry_mark[i].type == MZ_MOVE) {
        ...
    }
    ...
}
Is it some sort of optimization because doing those actions in bulk via the while loop is more efficient?
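One plausible reading (an assumption on my part, not something confirmed in this thread) is that the inner while loops group consecutive entries of the same type into runs, so that a whole run of entries to move can be shifted with a single bulk copy instead of one copy per entry. The sketch below illustrates that idea on a plain byte buffer. The names entry_mark, MARK_KEEP, MARK_DELETE, MARK_MOVE and compact_runs are hypothetical stand-ins, not the library's internals, and it assumes kept entries only appear before the first deleted one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the library's internal mark types. */
enum mark_type { MARK_KEEP, MARK_DELETE, MARK_MOVE };

struct entry_mark {
    enum mark_type type;
    size_t length; /* bytes the entry occupies in the buffer */
};

/* Compacts buf in place by dropping deleted entries. Returns the new
 * length and stores the number of memmove calls in *moves.
 * Assumes MARK_KEEP entries only occur before the first MARK_DELETE. */
size_t compact_runs(unsigned char *buf, const struct entry_mark *marks,
                    size_t n, size_t *moves)
{
    size_t i = 0, write_pos = 0, read_pos = 0;

    *moves = 0;
    while (i < n) {
        size_t move_len = 0;

        /* Run of kept entries: already in place, just step over them. */
        while (i < n && marks[i].type == MARK_KEEP) {
            write_pos += marks[i].length;
            read_pos = write_pos;
            i++;
        }
        /* Run of deleted entries: only the read cursor advances. */
        while (i < n && marks[i].type == MARK_DELETE) {
            read_pos += marks[i].length;
            i++;
        }
        /* Run of entries to move: accumulate their total length. */
        while (i < n && marks[i].type == MARK_MOVE) {
            move_len += marks[i].length;
            i++;
        }
        /* One bulk copy for the whole run rather than one per entry. */
        if (move_len > 0) {
            memmove(buf + write_pos, buf + read_pos, move_len);
            (*moves)++;
        }
        write_pos += move_len;
        read_pos += move_len;
    }
    return write_pos;
}
```

With an if/else chain the same result is reachable, but each moved entry would need its own copy (or extra state to remember where the current run started), which matters more for file-backed archives where every move is a seek plus read plus write.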
After I resolved some of my technical debt with the zip file format, I realized the problem I'm running into is that the central_dir is not in sync with the file headers in some way (probably because the seeks that are done when we're dealing with a file stream need to be mirrored for zip on the heap). I suspect I will be forced to treat the mode 'w' as delete as well or else I'd have to do a disgusting work around that:
I understand that separating the operations for delete and write simplifies the API because keeping everything up-to-date for both in-memory zips and file-backed zips can get complex, but are there any other compelling reasons for this design? I'll continue pushing forward in my branch and hopefully come up with a solution that doesn't require a lot of flaming hoops to jump through.
I wonder if the issue is that in zip_close, you make a call to mz_zip_writer_finalize_archive, but in zip_stream_close, there is no analogous call to mz_zip_writer_finalize_heap_archive.
Never mind, mz_zip_writer_finalize_heap_archive is basically just a wrapper around mz_zip_writer_finalize_archive that additionally copies the data into the passed buffer.
Alright, finally got things working in #332
I plan on doing some more thorough testing of zip_entries_delete when I get time to see if it handles the edge cases I think of.
Thanks! Already merged, so I'm closing the issue.
If you want to add more comments, do not hesitate to reopen it, or create a new one.
It looks like zip_entries_delete fails to delete files for in-memory stream zips because pState->m_pFile is always NULL for them. Some example code to trigger the issue is: