zip-rs / zip-old

Zip implementation in Rust
MIT License
731 stars 204 forks source link

Deletion of zip entries in-place #334

Closed baloo closed 1 year ago

baloo commented 1 year ago

fixes zip-rs/zip2#166

This brings the changes made by @rushiiMachine to support in-place deletion.

I tried my best to keep original authorship, but I fixed a couple tests, documentation and formatting. I'm also adding a test for file deletion on top of @rushiiMachine 's work.

rushiiMachine commented 1 year ago

Probably should add a test for delete with fill_void Also, I didn't bother adding but you should probably replace the file name with a placeholder (maintaining equal byte size) when deleting with fill_void

baloo commented 1 year ago

If I read the code correctly, which I may not. When we remove with fill void, we will remove the file from the list for files (the header). And on finalize() we will completely rewrite the central directory header. I'm not sure that is necessary to replace the filename. So I don't think it's necessary to overwrite this with a placeholder, or that we actually want that.

rushiiMachine commented 1 year ago

I'm not sure that is necessary to replace the filename

The point of this https://github.com/zip-rs/zip/pull/334/files#diff-8e31462ea4b11fdadc15fd98e895c1902595d9d34947c58eecca1125a5e6224dR902-R905 is to not retain potentially sensitive entry data. The same logic could be applied to filename, which could be sensitive too. It should be erased so that it doesn't leak.

Plecra commented 1 year ago

Heya! Thankyou for sending this in. As rushii mentioned before, this API isn't appropriate for the public API of the crate :) If the goal is to overwrite the space in a zip archive, recreating the archive from scratch is more suitable.

@rushiiMachine I see one of your commits is fixing a corruption of the archive. Can you remember what got corrupted before those changes? There's a drastic difference between removing the central header for a file and the new implementation which seems to overwrite the rest of the contents of the archive.

As for how I'd want this API to look, I'd currently expect the additions to look like this:

/// Anonymizes all data of the file currently named `name`
///
/// All of file contents, file name and metadata such as creation date or permission flags are removed from the file.
/// In its place, we leave an empty file. Keep in mind that the bleached file will retain the length of the original.
pub fn bleach_file(&mut self, name: impl Into<String>) -> ZipResult<()>;
/// Removes `name` from the archive's file listing.
///
/// > Warning: The file will remain intact within the archive, and can be recovered by anyone who wants to.
///   This prevents it from being read by a typical ZIP unpackers or viewers.
///   To remove the data contained in the file, use [`ZipWriter::bleach_file`] before this method.
///
/// This can be used to cheaply remove a file from the archive. This does not write to the archive until [`ZipWriter::finalize`] is called.
pub fn forget_file(&mut self, name: impl Into<String>) -> ZipResult<()>;

Say if you think I'm missing something here, or you think any of it should be named differently! It's nothing more than a sketch from my memory of how I intended to implement this in november.

rushiiMachine commented 1 year ago

I see one of your commits is fixing a corruption of the archive.

If I remember correctly, it was this line here that was causing the issue, I didn't really bother finding the root cause but from what I understood it wrote local file header even when there was no file being written (I made finish_file possible to be called at a different time by adding delete_file)

baloo commented 1 year ago

Heya! Thankyou for sending this in. As rushii mentioned before, this API isn't appropriate for the public API of the crate :) If the goal is to overwrite the space in a zip archive, recreating the archive from scratch is more suitable.

I understood @rushiimachine 's comment as "I could not implement that with the current public facing API, so I could not use a trait to augment/provide the feature", more than this should not be public facing. I could be wrong.

baloo commented 1 year ago

(anyway, I'll try to put some effort to have the forget_file and bleach_file the way you suggest)

Plecra commented 1 year ago

I'm expecting to implement this myself (lets say by February 19th and put the pressure on ;)), but if you'd like to return to it before then I'm happy to work through it together!

baloo commented 1 year ago

I got pulled in side projects, and couldn't get back to that. I don't know if I'll have the time before Feb 19th, but I'd be happy to test your changes if necessary.

Plecra commented 1 year ago

That's perfectly reasonable :) Thanks for putting your time into this, and I'll definitely ping you to see how the new implementation suits you!