wqweto / ZipArchive

A single-class pure VB6 library for zip with ASM speed
MIT License
52 stars 22 forks source link

Small changes/features to support OpenRaster spec #6

Closed tannerhelland closed 5 years ago

tannerhelland commented 5 years ago

Hi @wqweto. I'm looking at adding OpenRaster file format support to PhotoDemon. The spec is very straightforward, basically constructed around existing OpenDocument rules. cZipArchive makes my job much easier!

I haven't studied everything in great detail, but I notice a few odd requirements in the spec that (perhaps?) are not covered by ZipArchive at present. These are independent from things like Zip64 support, which are a totally different beast - and not what I'd dare ask for, since PhotoDemon couldn't handle 4gb+ files cleanly anyway.

The file names within OpenRaster "zip files" are case-sensitive and must be UTF-8 encoded. If you decide to write non-ascii file names, be aware that the Zip format has an UTF-8 flag for each file name. Make sure this flag is set, otherwise the content might get decoded as cp437.

Apologies if any of these items are already covered and I just didn't review the code carefully enough! TIA

wqweto commented 5 years ago

The required mimetype file must be uncompressed (e.g. STORED in zip parlance). I wonder if ZipArchive could expose a flag for adding individual entries as STORED vs DEFLATED.

That could be an additional Level parameter to AddFile (besides EncrStrength now). Probably a BeforeCompress event to be able to customize name/level/encrstrength/etc would be neat too.

UTF-8 filenames are required by the spec. I'm not sure how much work it is to get/set the UTF-8 flag for zip filenames (or if this is tied to Zip64 format), but the spec mentions that the flag is important

Currently ucsZcfUseUtf8 is handled automatically like this

    uLocal.Flags = IIf(pvFromOemString(pvToOemString(.FileName)) <> .FileName, ucsZcfUseUtf8, 0)

i.e. if the filename is not CP_OEM friendly, switch to utf-8 names. Probably an additional ForceUtf8 parameter to CompressArchive could control this behavior.

As an unrelated convenience, it would be very helpful to retrieve a list of filenames as perhaps a string array, to validate ORA file format correctness before attempting actual extraction of individual files

That is currently possible by first calling OpenArchive to populate the internal array of entries and then accessing properties FileCount and FileInfo(Index) -- the latter returns an array of archive entry properties w/ the first property being the filename.

Will keep you posted when I push these changes as time permits.

wqweto commented 5 years ago

Just implemented these new params in 1bad24267f3c5a2ff732740b0eabb408fa29e04b as described.

Tell me if you have any other troubles with OpenRaster support now.

wqweto commented 5 years ago

FYI, this dumps archive entries filenames

    Const IDX_FILENAME  As Long = 0
    Dim lIdx            As Long
    Dim baXml()         As Byte

    With New cZipArchive
        .OpenArchive "D:\TEMP\framed.ora"
        For lIdx = 0 To .FileCount - 1
            Debug.Print .FileInfo(lIdx)(IDX_FILENAME)
        Next
        .Extract baXml, "stack.xml"
    End With
    With CreateObject("MSXML2.DOMDocument")
        .loadXml FromUtf8Array(baXml)
        Debug.Print .xml
    End With

Public Function FromUtf8Array(baText() As Byte) As String
    Dim lSize           As Long

    If UBound(baText) >= 0 Then
        FromUtf8Array = String$(2 * UBound(baText), 0)
        lSize = MultiByteToWideChar(CP_UTF8, 0, baText(0), UBound(baText) + 1, ByVal StrPtr(FromUtf8Array), Len(FromUtf8Array))
        FromUtf8Array = Left$(FromUtf8Array, lSize)
    End If
End Function

. . . in Immediate Window

mimetype
data\layer000.png
data\layer000_strokemap.dat
data\background.png
data\background_tile.png
Thumbnails\thumbnail.png
stack.xml
<?xml version="1.0"?>
<image h="256" w="384">
    <stack>
        <layer mypaint_strokemap_v2="data/layer000_strokemap.dat" opacity="1.0" src="data/layer000.png" visibility="visible" x="0" y="64"/>
        <layer background_tile="data/background_tile.png" name="background" opacity="1.0" src="data/background.png" visibility="visible" x="-256" y="-64"/>
    </stack>
</image>
tannerhelland commented 5 years ago

👏 thank you, this is perfect! You put the rest of us to shame with your fast updates.

(Apologies for not studying the code and realizing some of these things on my own - I will study in more detail before raising issues in the future.)

This class is a true gem, thank you again for sharing it.

tannerhelland commented 5 years ago

Just popping back in to say that I've finished my little OpenRaster import/export project, and cZipArchive works like a charm. Performance ends up being significantly better than GIMP since cZipArchive allows me to perform all essential tasks in-memory. (By "significantly better" I mean 5-10x faster read/write support on multi-layer test images, very nice to have!)

I have a few ideas for further improvement but as they are not critical to this task, I'll post them later as separate issues. (Makes it easier to close them with "no chance in hell" reasoning attached ;)

Thank you again.

wqweto commented 5 years ago

Glad it helped! 👍

I see you are not using named params for Level of AddFile as in this line

    If Not m_ZipArchive.AddFile(strUtf8, "mimetype", , , , 0) Then . . .

I find named params (optional or not) very convenient to document the literal 0 in such places and save the comma repetition. . . but YMMV.

For optional params I'm strongly advising using named params by fellow devs as this allows adding more optional params not only at the end of param-list but also grouping these more logically closer to each other. I also tend to skip hungarian convention for these to make caller sites more readable.

My point is that when in the future I decide to add more optional params to AddFile these might not come as last in the param-list, be warned.

tannerhelland commented 5 years ago

👍 All great advice. I'll get the code switched to named params and clean up my sloppy variable naming.

Thank you!