haf / DotNetZip.Semverd

Please use System.IO.Compression! A fork of the DotNetZip project without signing with a solution that compiles cleanly. This project aims to follow semver to avoid versioning conflicts. DotNetZip is a FAST, FREE class library and toolset for manipulating zip files. Use VB, C# or any .NET language to easily create, extract, or update zip files.
Other
545 stars 218 forks source link

Plans to fix performance issues when zipping large directories with strong encryption using the same password? #271

Open AndrewBSzabo opened 1 year ago

AndrewBSzabo commented 1 year ago

I did some performance testing, and it seems like the performance when using aes256 to zip a large folder is pretty slow (took ~45 seconds to zip an 86.9MB folder with 2282 files as opposed to using PkzipWeak where the same folder only took ~7 seconds to zip). Looking at the code it seems that it is calculating the encryption key for each file (in ZipEntry.WriteSecurityMetadata): image

but since we are using the same password for the entire folder can't we calculate it once in the ZipFile object and then each time we add a ZipEntry we dont need to call WinZipAesCrypto.Generate (like we do in the image above), and instead can just add the encryption key information to the header?

jshergal commented 1 year ago

The reason for the "performance hit" is due to the fact that from a security perspective an independent random salt needs to be computed for each file that is being encrypted and so while the password is the same, the salt must be different. See more here: WinZip Encryption Salt

jshergal commented 1 year ago

@AndrewBSzabo - In looking through the WinZipAesCrypto source file, there are definitely some areas where the performance could still be really improved, even without the change to reuse the generated key. With that in mind, I am going to reopen this issue and mark it as up for grabs. If you would like to work on it, you are more than welcome to do that. One easy win would be just modifying the Generate function to not create a new instance of Random every time it is called. The class itself could just have one static instance and reuse it (and just as a side note, it might be a big win to look through the whole source code and see if that could be pulled out to use a thread safe static Random as well. Bear in mind that Random is not thread safe, so you would want to look at implementing something similar to this SO answer. I think that change alone would boost the performance in the WinZipAesCrypto.Generate function. I believe there may be some other gains that can be made in that class as well.

AndrewBSzabo commented 1 year ago

Sounds good. I will start looking into this!