readium / readium-lcp-server

Repository for the Readium LCP Server
BSD 3-Clause "New" or "Revised" License

Should compression level of deflate be -1, not 9? #199

Closed. drminside closed this issue 2 years ago

drminside commented 5 years ago

I found that the compressed text resources in the protected EPUBs provided on the EDRLab site have no header information for the deflate compression. (https://www.edrlab.org/readium-lcp/testing-readium-lcp-compliant-devices/)

I think this is because the current Go source uses compression level 9, which I assume produces 'nowrap' (or 'noheader') output, what we call the gzip type. (https://github.com/readium/readium-lcp-server/blob/master/pack/pack.go#L111-L116) (https://golang.org/pkg/compress/flate/#NewWriter)

If that is true, it is not compatible with the IDPF recommendation. (https://www.w3.org/publishing/epub/epub-ocf.html#sec-zip-container-zipreqs) It requires header information for the compressed data, which means the gzip type should not be used. (https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT)

So IMO, the compression level for deflate should be -1, not 9.

I also wonder whether the Readium LCP clients in Kotlin and Swift accept only the gzip type for decompression.

danielweck commented 5 years ago

Hello,

File resources inside LCP-protected publications are (in this order):

1) compressed (unless image/video codec)
2) encrypted
3) stored (in the OCF container)

The EPUB specification link you provided applies to step (3): https://www.w3.org/publishing/epub/epub-ocf.html#sec-zip-container-zipreqs

This Go code is for step (1): https://github.com/readium/readium-lcp-server/blob/4512f1e60d7d2217d06bad6c7416d2a5d4a9a9a2/pack/pack.go#L113

This Go code is for step (3):
https://github.com/readium/readium-lcp-server/blob/4512f1e60d7d2217d06bad6c7416d2a5d4a9a9a2/epub/writer.go#L73-L77
https://github.com/readium/readium-lcp-server/blob/4512f1e60d7d2217d06bad6c7416d2a5d4a9a9a2/epub/writer.go#L89

https://github.com/readium/readium-lcp-server/blob/4512f1e60d7d2217d06bad6c7416d2a5d4a9a9a2/pack/pack.go#L93-L99
https://github.com/readium/readium-lcp-server/blob/4512f1e60d7d2217d06bad6c7416d2a5d4a9a9a2/pack/pack.go#L125
https://github.com/readium/readium-lcp-server/blob/4512f1e60d7d2217d06bad6c7416d2a5d4a9a9a2/epub/writer.go#L43-L48

drminside commented 5 years ago

@danielweck This is all about the Readium LCP process: compression then encryption on the server side, and decryption then decompression on the client side.

The conclusion is that it was our fault. There is no problem in the Readium source. I am sorry for the confusion, so please disregard the first question in #199.

But I am going to explain what the problem was on our server, because I think it could happen to other implementors.

We found that an LCP-protected EPUB packaged by our server does not open in the R2 Reader. We verified that the passphrase is correct and that the decryption process completes successfully, but the R2 Reader could not decompress the compressed resource.

Now we know the reason. We used zlib for the deflate compression, and we realized that the output of zlib is not compatible with the zip specification. (https://www.w3.org/publishing/epub/epub-ocf.html#sec-zip-container-zipreqs) It adds a header and a checksum field around the pure deflated result, which is called the zlib wrap.

As a result, compressed data carrying the zlib wrap cannot be opened in the R2 Reader, because the R2 Reader may accept only the PKZIP or gzip type of deflate compression.

We will update our server to produce the correct deflate compression according to the zip specification.

llemeurfr commented 5 years ago

It's interesting to see that zlib does not create zip-compatible resources. But can you pinpoint which requirement of the zip specification is not fulfilled by zlib?

"It adds a header and checksum field into a pure deflated result, which is called as zlib wrap."

Is it the presence of this additional zlib wrap that causes the issue? In that case, the issue would also occur with unencrypted deflated content, wouldn't it?

drminside commented 5 years ago

I saw it in the documentation of the Inflater class in java.util.zip:

public Inflater(boolean nowrap)

Creates a new decompressor. If the parameter 'nowrap' is true then the ZLIB header and checksum fields will not be used. This provides compatibility with the compression format used by both GZIP and PKZIP.

Parameters:

nowrap - if true then support GZIP compatible compression

And the zip specification does not mention the zlib header and checksum.

I think it could also happen with a plain EPUB if a text resource is compressed with the zlib wrap, but technically such a file would not be compatible with the IDPF EPUB specification.

BTW, an interesting thing is that zlib can inflate both gzip-type and zlib-wrapped data.

That is why we did not find the problem until now. Our reader application, which uses zlib, has no problem opening EPUBs protected by either EDRLab's or DRM inside's server.

And our server, which also uses zlib, has been successfully reading the gzip-type compressed resources in plain EPUBs, then deflating them with the zlib wrap and encrypting them to make the protected EPUB.