wqweto / ZipArchive

A single-class pure VB6 library for zip with ASM speed
MIT License
55 stars 23 forks source link

cZipArchive.Extract: support OutputTarget as vbLong, treat as naked pointer #8

Closed tannerhelland closed 5 years ago

tannerhelland commented 5 years ago

This is an esoteric request, so feel free to disregard and I will simply muck up my own cZipArchive copy to implement. ;)

Because large array allocations are especially expensive in VB, I would prefer to simply reuse a single large destination buffer when extracting a series of sequential files. This is especially useful in my OpenRaster use-case, as I just need to extract PNG bytes via cZipArchive, then immediately decode it into a usable image buffer (and discard the original PNG stream). It would save me a ton of allocation time to take the approach of, "find largest extracted size, prep single buffer at that size, re-use for all PNG extractions" vs the current approach of allocating new memory for each temporary Extract() output buffer.

Obviously, cZipArchive supporting a naked long as extraction target would require some amount of trust that the caller has properly read .Size property and allocated accordingly. Not sure if this is desirable or not in a class designed for general use.

An alternate implementation could be a new optional parameter in Extract(), something to the effect of "only resize OutputTarget array if it's too small for this extraction", but this adds complexity on cZipArchive's end and the same problems persist - e.g. the caller still needs to confirm .Size to understand the output. That's why I thought this implementation (passing a naked pointer) may be preferable.

(Actually there are other use-cases too, e.g. the caller could pass a pointer to something like a memory-mapped file view, which otherwise would require extra steps to "wrap" that bare pointer in a VB array.)

wqweto commented 5 years ago

As far as I understand you want to skip output array resizing at line 1299

                If IsArray(OutputTarget) And .Size > 0 Then
                    ReDim OutputTarget(0 To .Size - 1) As Byte
                End If

Another strategy altogether is to implement VFS stream over your oversized buffer with naked pointer arithmetics and all. Check out sample cMemoryStream.cls for all the VszXxx function-prototypes (below VFS interface comment) as expected to be impl by your stream class.

For OutputTarget I suppose you can even skip VfsReadFile impl as it will never be called from Extract. Copy VfsSetFilePointer and VfsSetEndOfFile verbatim from cMemoryStream and VfsWriteFile boils down to a single CopyMemory call and m_lPosition adjustment.

Edit: You can check out cBufferStream since https://github.com/wqweto/ZipArchive/commit/da5230028e8501ae751ec2d7095dccf6955a3f0e for sample impl of the idea above.

tannerhelland commented 5 years ago

Many thanks, @wqweto. I didn't expect you to provide a workable sample for me, but your generosity is always a welcome surprise!

(Mostly I find it a little sad when people fork a repo and change many things, but never notify the original code owner of those changes, just in case they find something useful/interesting in it. Of course when the changes are low-quality and potentially dangerous like mine often are, maybe best not mention them... ;)

I will migrate this change to my own copy soon to try and eke a few more ms gain out of loading very large OpenRaster archives. Thank you again.

tannerhelland commented 5 years ago

Apologies again for taking so long to revisit this.

After many rounds of perf-testing, the easiest solution proved to be just bypassing the need for .Extract() calls when possible. Most OpenRaster exporters write uncompressed layer PNGs to the zip container - a smart decision, since PNGs are already 90+% deflate'd, and another deflate pass on top gains little but takes forever.

So I just hacked together a small "peek"-like function to return the file offset, compression state, and compressed/original size of individual archive entries. If PNG files are uncompressed in the archive, I can just point the PNG decoder at their offset and parse them directly from disk, skipping the Extract stage entirely.

For the few bad ORA writers who do store deflated PNGs, existing .Extract() function still works fine! I may still drop in the new cBufferStream implementation you so helpfully provided, but profiling shows that time spent in inflate() far outweighs time spent in safearray allocation, so I probably fell victim to "premature optimization is the root of all evil" by targeting allocation perf first. (shrugs sheepishly)

Thank you again for the new example code in cBufferStream. I hope others find it useful too!