icsharpcode / SharpZipLib

#ziplib is a Zip, GZip, Tar and BZip2 library written entirely in C# for the .NET platform.
http://icsharpcode.github.io/SharpZipLib/
MIT License

Store ZipFile entries in dictionary for faster lookup #608

Open bstadick opened 3 years ago

bstadick commented 3 years ago

Steps to reproduce

  1. Create ZipFile instance
  2. Access ZipEntry using FindEntry
  3. The entry is found slowly in large archives, and only if the name matches verbatim; there is no option to override the string comparison

Expected behavior

Enhancement: the ZipEntry is found quickly and can be matched using a custom string comparison.

Address the TODO at ZipFile.cs:765 by storing the ZipEntry objects in the "entries_" variable as a Dictionary<string, ZipEntry> keyed by ZipEntry.Name. This speeds up search and indexing at the cost of slightly more memory.

Existing numerical indexing can remain, either by way of the ElementAt LINQ extension or by maintaining a separate list of the keys.
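A minimal sketch of the "separate list" variant, assuming nothing about the real ZipFile internals: the entries stay in a list in archive order, and a dictionary maps each name to its position. `NamedIndex<T>`, `nameOf`, and `FindIndex` are hypothetical names chosen for illustration; they are not part of SharpZipLib. It is generic so that it runs without the library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of SharpZipLib: keeps items in their original
// order for numeric indexing and also indexes them by name for fast lookup.
public sealed class NamedIndex<T>
{
    private readonly List<T> ordered_ = new List<T>();
    private readonly Dictionary<string, int> byName_;

    public NamedIndex(IEnumerable<T> items, Func<T, string> nameOf,
                      IEqualityComparer<string> comparer = null)
    {
        // Default to an exact (ordinal) match when no comparer is supplied.
        byName_ = new Dictionary<string, int>(comparer ?? StringComparer.Ordinal);
        foreach (var item in items)
        {
            var name = nameOf(item);
            // Assumption: if a name repeats, the first occurrence wins.
            if (!byName_.ContainsKey(name))
            {
                byName_.Add(name, ordered_.Count);
            }
            ordered_.Add(item);
        }
    }

    public int Count => ordered_.Count;

    // Numeric indexing stays O(1) because the list is kept alongside the dictionary.
    public T this[int index] => ordered_[index];

    // Returns the position of the named item, or -1 when it is absent.
    public int FindIndex(string name)
        => byName_.TryGetValue(name, out var index) ? index : -1;
}
```

Mapping names to positions rather than to the entries themselves keeps a single copy of the ordered data, and it avoids relying on ElementAt over a Dictionary, which walks the collection and whose enumeration order is not guaranteed.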

When the "entries_" dictionary is created, provide an option to pass an IEqualityComparer<string> so that entries can be found with non-standard string comparisons, for example when normalizing path delimiters between file systems.
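As an illustration of the kind of comparer meant here, the following is a sketch only: `PathSeparatorComparer` is a hypothetical class, not something SharpZipLib provides. It treats `\` and `/` as the same separator and otherwise compares names ordinally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: two names are equal if they match after every
// backslash has been replaced by a forward slash.
public sealed class PathSeparatorComparer : IEqualityComparer<string>
{
    private static string Normalize(string name) => name.Replace('\\', '/');

    public bool Equals(string x, string y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(Normalize(x), Normalize(y), StringComparison.Ordinal);
    }

    // Hash the normalized form so that names considered equal share a hash code.
    public int GetHashCode(string obj)
        => StringComparer.Ordinal.GetHashCode(Normalize(obj));
}
```

Passed to a `Dictionary<string, ZipEntry>` constructor, a comparer like this would let a lookup for `docs\readme.txt` find an entry stored as `docs/readme.txt`.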

Actual behavior

The entry is found, but only when the name matches verbatim, and the lookup is very slow in large archives.

Version of SharpZipLib v1.3.1

Obtained from

piksel commented 3 years ago

This can be done using LINQ (here implemented as a ZipFile extension):

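// Builds a name-to-entry lookup; ZipFile is a non-generic IEnumerable, hence Cast<ZipEntry>()
public static Dictionary<string, ZipEntry> GetEntryLookup(this ZipFile zipFile, IEqualityComparer<string> comparer)
  => zipFile.Cast<ZipEntry>().ToDictionary(e => e.Name, comparer);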

> as in the case of normalizing path delimiters between file systems

Zip files only use / as path delimiters: https://p1k.se/appnote.md#s4.4.17.1