zeniko / unarr

read-only mirror of https://github.com/sumatrapdfreader/sumatrapdf/tree/master/ext/unarr
GNU Lesser General Public License v3.0

7z performance #4

Open selmf opened 9 years ago

selmf commented 9 years ago

Hi Zeniko,

it's me again. This time it's not about a bug or a suggestion regarding Linux/Unix, but a technical question. I have been experimenting with adding .7z support to my CMake build script for unarr, using what you do for SumatraPDF as a blueprint, and I've mostly succeeded.

However, my habit of using large files as test cases has exposed a problem that is something of a deal-breaker for me. Apparently, the ANSI C 7z code that unarr uses decodes a whole block of the 7z archive before it lets you access any of the files inside that block. On a small archive this isn't a problem, but with archives of several hundred megabytes, up to a few gigabytes in the extreme case, the time until the first page of a comic (or the first file of the archive) becomes available is seriously delayed. Since my use case involves extracting the covers of whole libraries of comics, this is a serious problem.

I also don't feel confident enough in my programming skills to fix this limitation myself. So my question to you is: do you have any idea how to proceed? Is it worth working on at all, or is it a lost cause? By the way, 7z upstream suggests using the CPP code, but as far as I know it's tied too heavily to the Windows API to be of any use to me.

Best Regards,

Selmf

zeniko commented 9 years ago

This is a known issue for 7z archives using solid compression. A proper fix would require reimplementing SzArEx_Extract and all of 7zDec.c so that they behave like unarr's RAR decompressor, which decompresses only up to the required file and keeps only the decompression state in memory instead of all the uncompressed data.
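
To illustrate the difference, here is a minimal sketch, written in Python rather than unarr's C for brevity. It is not unarr or 7-Zip SDK code: `build_solid_block` and `extract_streaming` are hypothetical names, the `(name, offset, size)` index is an assumed layout, and a single LZMA stream from Python's standard `lzma` module stands in for a 7z solid block. The point is only that the decoder stops as soon as the wanted file is complete and never holds the whole block in memory.

```python
import lzma


def build_solid_block(files):
    """Compress all files as ONE LZMA stream (toy stand-in for a solid block).

    Returns (compressed_bytes, index), where index is a list of
    (name, offset, size) tuples describing the uncompressed layout.
    """
    index, offset = [], 0
    for name, data in files:
        index.append((name, offset, len(data)))
        offset += len(data)
    return lzma.compress(b"".join(data for _, data in files)), index


def extract_streaming(blob, index, wanted, chunk=4096):
    """Decode only up to the end of `wanted`.

    Bytes belonging to earlier files are decoded (unavoidable in a solid
    stream) but thrown away immediately, so memory use is bounded by the
    decoder state plus the wanted file itself.
    Returns (file_data, uncompressed_bytes_produced).
    """
    offset, size = next((o, s) for n, o, s in index if n == wanted)
    end = offset + size
    dec = lzma.LZMADecompressor()
    out = bytearray()
    produced = 0  # uncompressed bytes decoded so far
    pos = 0       # read position in the compressed input
    while produced < end:
        if dec.eof:
            raise ValueError("stream ended before the requested file")
        feed = b""
        if dec.needs_input:
            feed = blob[pos:pos + chunk]
            pos += len(feed)
            if not feed:
                raise ValueError("compressed stream ended early")
        # max_length caps the output, so we never decode past `end`.
        piece = dec.decompress(feed, max_length=min(chunk, end - produced))
        # Keep only the part of this piece that lies inside the wanted file.
        start = max(offset - produced, 0)
        if start < len(piece):
            out += piece[start:]
        produced += len(piece)
    return bytes(out), produced
```

With an archive whose first entry is the cover, `extract_streaming(blob, index, "cover.jpg")` produces only as many uncompressed bytes as the cover needs, however large the rest of the block is. Files deeper in the block still cost a decode of everything before them.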

I'll look into it should I ever get sufficiently bored or annoyed by the limitation (unlikely). Patches would be welcome, though, and I might be able to help a bit with the implementation, should you or anybody else feel up to it.

BTW: Using solid compression is not recommended for comic books anyway, since it can significantly slow down random access to files (unless you're willing to cache a few MB of decompression state per file, which may also be prohibitive for large archives).
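
The caching trade-off can be sketched as follows, again in Python and under loud assumptions: zlib stands in for the 7z codec because Python's `zlib` decompression objects offer `copy()` for exactly this purpose (the `lzma` module has no equivalent), and `build_checkpoints`, `extract_from_checkpoint`, and the `(name, offset, size)` index are hypothetical. One pass over the solid stream snapshots the decoder state at the start of each file. Later random access resumes from the snapshot instead of decoding from the beginning.

```python
import zlib


def build_solid_block(files):
    """Compress all files as one zlib stream; index holds (name, offset, size)."""
    index, offset = [], 0
    for name, data in files:
        index.append((name, offset, len(data)))
        offset += len(data)
    return zlib.compress(b"".join(data for _, data in files)), index


def _read(dec, pending, n):
    """Decode exactly n bytes; return (data, still-unconsumed compressed input)."""
    out = bytearray()
    while len(out) < n:
        # A positive max_length caps the output; leftover input is kept
        # in dec.unconsumed_tail for the next call.
        piece = dec.decompress(pending, n - len(out))
        pending = dec.unconsumed_tail
        if not piece and not pending:
            raise ValueError("stream ended early")
        out += piece
    return bytes(out), pending


def build_checkpoints(blob, index):
    """One sequential pass; snapshot decoder state at each file start.

    Each checkpoint is (decoder_copy, position_in_compressed_input).
    The index must be in stream order.
    """
    dec, pending, checkpoints = zlib.decompressobj(), blob, {}
    for name, _offset, size in index:
        checkpoints[name] = (dec.copy(), len(blob) - len(pending))
        _, pending = _read(dec, pending, size)
    return checkpoints


def extract_from_checkpoint(blob, checkpoints, index, wanted):
    """Random access: resume from the cached state instead of the stream start."""
    size = next(s for n, _o, s in index if n == wanted)
    dec, pos = checkpoints[wanted]
    # Work on a copy so the cached checkpoint stays reusable.
    data, _ = _read(dec.copy(), blob[pos:], size)
    return data
```

Each cached snapshot carries the codec's whole history window. For zlib that is at most 32 KiB, but an LZMA dictionary can be many megabytes, which is why this gets expensive for large archives with many files.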