llemeurfr closed this issue 4 years ago
Sure, but that's because the cover image is not encrypted, right?
Laurent, I downloaded your accessible-epub3.lcpl from the EDRLab prod frontend:
zipinfo accessible-epub3.lcpl.epub
=>
Zip file size: 4107970 bytes, number of entries: 41
...
-rw-rw-r-- 6.3 unx 820269 bl defN 20-Jan-13 16:37 EPUB/covers/9781449328030_lrg.jpg
...
41 files, 4353815 bytes uncompressed, 4102536 bytes compressed: 5.8%
zipinfo -v accessible-epub3.lcpl.epub
=>
Central directory entry #31:
---------------------------
There are an extra 16 bytes preceding this file.
EPUB/covers/9781449328030_lrg.jpg
offset of local header from start of archive: 3485498
(0000000000352F3Ah) bytes
file system or operating system of origin: Unix
version of encoding software: 6.3
minimum file system compatibility required: MS-DOS, OS/2 or NT FAT
minimum software version required to extract: 2.0
compression method: deflated
compression sub-type (deflation): normal
file security status: not encrypted
extended local header: yes
file last modified on (DOS date/time): 2020 Jan 13 16:37:32
32-bit CRC value (hex): 29ed8392
compressed size: 607003 bytes
uncompressed size: 820269 bytes
length of filename: 33 characters
length of extra field: 0 bytes
length of file comment: 0 characters
disk number on which file begins: disk 1
apparent file type: binary
Unix file attributes (100664 octal): -rw-rw-r--
MS-DOS file attributes (00 hex): none
There is no file comment.
However, there seems to be a problem with the encrypted files: they are deflated in the zip directory, whereas they should be stored (just like the first entry, mimetype).
Yep, HTML files are compressed / deflated in the zip directory, which results in a larger size (due to padding)! For example:
Central directory entry #36:
---------------------------
There are an extra 16 bytes preceding this file.
EPUB/pr01s04.xhtml
offset of local header from start of archive: 4099690
(00000000003E8E6Ah) bytes
file system or operating system of origin: Unix
version of encoding software: 6.3
minimum file system compatibility required: MS-DOS, OS/2 or NT FAT
minimum software version required to extract: 2.0
compression method: deflated
compression sub-type (deflation): normal
file security status: not encrypted
extended local header: yes
file last modified on (DOS date/time): 2020 Jan 13 16:37:32
32-bit CRC value (hex): 9ac371eb
compressed size: 885 bytes
uncompressed size: 880 bytes
length of filename: 18 characters
length of extra field: 0 bytes
length of file comment: 0 characters
disk number on which file begins: disk 1
apparent file type: binary
Unix file attributes (100664 octal): -rw-rw-r--
MS-DOS file attributes (00 hex): none
There is no file comment.
Shouldn't the cover image be stored (and not deflated) in the zip, because deflating it does not provide any benefit regarding size? This holds independently of the fact that it is not encrypted.
Re. encrypted Codec files, I see them properly stored (not deflated) in the encrypted EPUBs generated by the LCP server.
For the HTML files that are larger deflated than stored, this is an edge case triggered on small files (880 bytes here). We can live with that IMO.
I think that the culprit is in the encrypt function:
...should be file.StorageMethod = zip.NoCompression (0), i.e. never zip.Deflate (8)
...which is passed on to AddResource of epub.Writer:
https://github.com/readium/readium-lcp-server/blob/a57a6e23294b05e9737f71df722745113aeb2da0/epub/writer.go#L43
Also, this call to w.AddResource in writer.go incorrectly uses res.StorageMethod, I think:
https://github.com/readium/readium-lcp-server/blob/a57a6e23294b05e9737f71df722745113aeb2da0/epub/writer.go#L89
...instead, we should test if the ZIP entry is encrypted, and use store instead of deflate:
if encryption != nil {
	if _, ok := encryption.DataForFile(file.Name); ok {
		// the entry is encrypted:
		// use zip.NoCompression (0) instead of res.StorageMethod
	}
}
Shouldn't the cover image be stored (and not deflated) in the zip, because deflating it does not provide any benefit regarding size? This holds independently of the fact that it is not encrypted.
Snippet from the full zipinfo I provided in a previous comment:
EPUB/covers/9781449328030_lrg.jpg
compressed size: 607003 bytes
uncompressed size: 820269 bytes
Re. encrypted Codec files, I see them properly stored (not deflated) in the encrypted EPUBs generated by the LCP server.
Not with the current EDRLab LCP prod frontend, it seems:
zipinfo accessible-epub3.lcpl.epub
=>
Archive: accessible-epub3.lcpl.epub
Zip file size: 4107970 bytes, number of entries: 41
-rw-rw-r-- 6.3 unx 20 bl stor 20-Jan-13 16:37 mimetype
-rw-rw-r-- 6.3 unx 9296 bl defN 20-Jan-13 16:37 EPUB/ch03s02.xhtml
-rw-rw-r-- 6.3 unx 5376 bl defN 20-Jan-13 16:37 EPUB/ch03.xhtml
-rw-rw-r-- 6.3 unx 109856 bl defN 20-Jan-13 16:37 EPUB/fonts/UbuntuMono-B.ttf
-rw-rw-r-- 6.3 unx 129472 bl defN 20-Jan-13 16:37 EPUB/fonts/UbuntuMono-BI.ttf
-rw-rw-r-- 6.3 unx 206928 bl defN 20-Jan-13 16:37 EPUB/fonts/FreeSansBold.otf
-rw-rw-r-- 6.3 unx 116528 bl defN 20-Jan-13 16:37 EPUB/fonts/UbuntuMono-RI.ttf
-rw-rw-r-- 6.3 unx 1284512 bl defN 20-Jan-13 16:37 EPUB/fonts/FreeSerif.otf
-rw-rw-r-- 6.3 unx 114208 bl defN 20-Jan-13 16:37 EPUB/fonts/UbuntuMono-R.ttf
-rw-rw-r-- 6.3 unx 2864 bl defN 20-Jan-13 16:37 EPUB/ch03s04.xhtml
-rw-rw-r-- 6.3 unx 10656 bl defN 20-Jan-13 16:37 EPUB/ch03s05.xhtml
-rw-rw-r-- 6.3 unx 4160 bl defN 20-Jan-13 16:37 EPUB/ch02s02.xhtml
-rw-rw-r-- 6.3 unx 112 bl defN 20-Jan-13 16:37 EPUB/css/synth.css
-rw-rw-r-- 6.3 unx 4928 bl defN 20-Jan-13 16:37 EPUB/css/epub.css
-rw-rw-r-- 6.3 unx 1131440 bl defN 20-Jan-13 16:37 EPUB/images/web/epub3_0401.png
-rw-rw-r-- 6.3 unx 302576 bl defN 20-Jan-13 16:37 EPUB/images/spi_global_ad.png
-rw-rw-r-- 6.3 unx 13872 bl defN 20-Jan-13 16:37 EPUB/ch03s03.xhtml
-rw-rw-r-- 6.3 unx 4890 bl defN 20-Jan-13 16:37 EPUB/package.opf
-rw-rw-r-- 6.3 unx 4187 bl defN 20-Jan-13 16:37 EPUB/bk01-toc.xhtml
-rw-rw-r-- 6.3 unx 304 bl defN 20-Jan-13 16:37 EPUB/co01.xhtml
-rw-rw-r-- 6.3 unx 304 bl defN 20-Jan-13 16:37 EPUB/cover.xhtml
-rw-rw-r-- 6.3 unx 1152 bl defN 20-Jan-13 16:37 EPUB/index.xhtml
-rw-rw-r-- 6.3 unx 21632 bl defN 20-Jan-13 16:37 EPUB/ch02.xhtml
-rw-rw-r-- 6.3 unx 736 bl defN 20-Jan-13 16:37 EPUB/pr01s05.xhtml
-rw-rw-r-- 6.3 unx 2096 bl defN 20-Jan-13 16:37 EPUB/ch04.xhtml
-rw-rw-r-- 6.3 unx 288 bl defN 20-Jan-13 16:37 EPUB/spi-ad.xhtml
-rw-rw-r-- 6.3 unx 1872 bl defN 20-Jan-13 16:37 EPUB/pr01.xhtml
-rw-rw-r-- 6.3 unx 2992 bl defN 20-Jan-13 16:37 EPUB/ch01.xhtml
-rw-rw-r-- 6.3 unx 880 bl defN 20-Jan-13 16:37 EPUB/pr01s02.xhtml
-rw-rw-r-- 6.3 unx 880 bl defN 20-Jan-13 16:37 EPUB/pr01s03.xhtml
-rw-rw-r-- 6.3 unx 820269 bl defN 20-Jan-13 16:37 EPUB/covers/9781449328030_lrg.jpg
-rw-rw-r-- 6.3 unx 3456 bl defN 20-Jan-13 16:37 EPUB/ch02s03.xhtml
-rw-rw-r-- 6.3 unx 224 bl defN 20-Jan-13 16:37 EPUB/lexicon/en.pls
-rw-rw-r-- 6.3 unx 208 bl defN 20-Jan-13 16:37 EPUB/lexicon/fr.pls
-rw-rw-r-- 6.3 unx 2944 bl defN 20-Jan-13 16:37 EPUB/ch01s02.xhtml
-rw-rw-r-- 6.3 unx 880 bl defN 20-Jan-13 16:37 EPUB/pr01s04.xhtml
-rw-rw-r-- 6.3 unx 1216 bl defN 20-Jan-13 16:37 EPUB/ch03s06.xhtml
-rw-rw-r-- 6.3 unx 263 bl defN 20-Jan-13 16:37 META-INF/container.xml
-rw-rw-r-- 6.3 unx 62 bl defN 20-Jan-13 16:37 META-INF/calibre_bookmarks.txt
-rw-r--r-- 6.3 unx 2676 bl defN 20-Jan-13 16:36 META-INF/license.lcpl
-rw-rw-r-- 6.3 unx 32600 bl defN 20-Jan-13 16:37 META-INF/encryption.xml
For the HTML files that are larger deflated than stored, this is an edge case triggered on small files (880 bytes here). We can live with that IMO.
The fact that the current LCP server Go implementation incorrectly deflates encrypted entries in the zip directory impacts audio/video performance unnecessarily. In fact, there is a penalty for large HTML or CSS files too, when reading a ZIP entry: inflate + decrypt + inflate.
Ah, if there is a 25% win in size, we can leave it deflated and close this issue.
Re. what you get in the file, I don't see that in my instance of the LCP server:
zipinfo accepub.zip
Archive: accepub.zip
Zip file size: 4106104 bytes, number of entries: 39
-rw---- 2.0 fat 20 bl stor 80-000-00 00:00 mimetype
-rw---- 2.0 fat 4284 bl defN 80-000-00 00:00 EPUB/bk01-toc.xhtml
-rw---- 2.0 fat 2992 bl stor 80-000-00 00:00 EPUB/ch01.xhtml
-rw---- 2.0 fat 2944 bl stor 80-000-00 00:00 EPUB/ch01s02.xhtml
-rw---- 2.0 fat 21712 bl stor 80-000-00 00:00 EPUB/ch02.xhtml
-rw---- 2.0 fat 4176 bl stor 80-000-00 00:00 EPUB/ch02s02.xhtml
-rw---- 2.0 fat 3472 bl stor 80-000-00 00:00 EPUB/ch02s03.xhtml
-rw---- 2.0 fat 5392 bl stor 80-000-00 00:00 EPUB/ch03.xhtml
-rw---- 2.0 fat 9328 bl stor 80-000-00 00:00 EPUB/ch03s02.xhtml
-rw---- 2.0 fat 13920 bl stor 80-000-00 00:00 EPUB/ch03s03.xhtml
-rw---- 2.0 fat 2880 bl stor 80-000-00 00:00 EPUB/ch03s04.xhtml
-rw---- 2.0 fat 10688 bl stor 80-000-00 00:00 EPUB/ch03s05.xhtml
-rw---- 2.0 fat 1216 bl stor 80-000-00 00:00 EPUB/ch03s06.xhtml
-rw---- 2.0 fat 2096 bl stor 80-000-00 00:00 EPUB/ch04.xhtml
-rw---- 2.0 fat 304 bl stor 80-000-00 00:00 EPUB/co01.xhtml
-rw---- 2.0 fat 304 bl stor 80-000-00 00:00 EPUB/cover.xhtml
-rw---- 2.0 fat 820269 bl defN 80-000-00 00:00 EPUB/covers/9781449328030_lrg.jpg
-rw---- 2.0 fat 4976 bl stor 80-000-00 00:00 EPUB/css/epub.css
-rw---- 2.0 fat 112 bl stor 80-000-00 00:00 EPUB/css/synth.css
-rw---- 2.0 fat 206928 bl stor 80-000-00 00:00 EPUB/fonts/FreeSansBold.otf
-rw---- 2.0 fat 1284512 bl stor 80-000-00 00:00 EPUB/fonts/FreeSerif.otf
-rw---- 2.0 fat 109856 bl stor 80-000-00 00:00 EPUB/fonts/UbuntuMono-B.ttf
-rw---- 2.0 fat 129472 bl stor 80-000-00 00:00 EPUB/fonts/UbuntuMono-BI.ttf
-rw---- 2.0 fat 114208 bl stor 80-000-00 00:00 EPUB/fonts/UbuntuMono-R.ttf
-rw---- 2.0 fat 116528 bl stor 80-000-00 00:00 EPUB/fonts/UbuntuMono-RI.ttf
-rw---- 2.0 fat 302576 bl stor 80-000-00 00:00 EPUB/images/spi_global_ad.png
-rw---- 2.0 fat 1131440 bl stor 80-000-00 00:00 EPUB/images/web/epub3_0401.png
-rw---- 2.0 fat 1152 bl stor 80-000-00 00:00 EPUB/index.xhtml
-rw---- 2.0 fat 240 bl stor 80-000-00 00:00 EPUB/lexicon/en.pls
-rw---- 2.0 fat 224 bl stor 80-000-00 00:00 EPUB/lexicon/fr.pls
-rw---- 2.0 fat 4972 bl defN 80-000-00 00:00 EPUB/package.opf
-rw---- 2.0 fat 1888 bl stor 80-000-00 00:00 EPUB/pr01.xhtml
-rw---- 2.0 fat 880 bl stor 80-000-00 00:00 EPUB/pr01s02.xhtml
-rw---- 2.0 fat 896 bl stor 80-000-00 00:00 EPUB/pr01s03.xhtml
-rw---- 2.0 fat 880 bl stor 80-000-00 00:00 EPUB/pr01s04.xhtml
-rw---- 2.0 fat 736 bl stor 80-000-00 00:00 EPUB/pr01s05.xhtml
-rw---- 2.0 fat 288 bl stor 80-000-00 00:00 EPUB/spi-ad.xhtml
-rw---- 2.0 fat 269 bl defN 80-000-00 00:00 META-INF/container.xml
-rw---- 2.0 fat 32600 bl defN 80-000-00 00:00 META-INF/encryption.xml
39 files, 4351630 bytes uncompressed, 4100956 bytes compressed: 5.8%
The version of the prod frontend is not up-to-date. I have to get it updated asap.
Found by using zipinfo on e.g. Moby Dick.
In package.opf: <item id="cover-image" properties="cover-image" href="images/9780316000000.jpg" media-type="image/jpeg"/>
In zipinfo:
OPS/images/9780316000000.jpg
version of encoding software: 2.0
compression method: deflated
compression sub-type (deflation): normal
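The same check can be scripted instead of reading zipinfo output by eye. Below is a minimal sketch using Go's standard archive/zip: it lists each entry's compression method and sizes, in the spirit of the zipinfo listings in this thread. The names entryReport, methodName, inspect and sampleArchive are hypothetical, and the archive is built in memory so that the example runs without an EPUB on disk.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
)

// entryReport summarizes how one ZIP entry is stored.
type entryReport struct {
	Name         string
	Method       string
	Compressed   uint64
	Uncompressed uint64
}

// methodName maps a ZIP method code to a short label.
func methodName(m uint16) string {
	switch m {
	case zip.Store:
		return "stor"
	case zip.Deflate:
		return "defl"
	default:
		return fmt.Sprintf("method %d", m)
	}
}

// inspect reads an archive from memory and reports each entry in directory order.
func inspect(archive []byte) ([]entryReport, error) {
	zr, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return nil, err
	}
	var reports []entryReport
	for _, f := range zr.File {
		reports = append(reports, entryReport{
			Name:         f.Name,
			Method:       methodName(f.Method),
			Compressed:   f.CompressedSize64,
			Uncompressed: f.UncompressedSize64,
		})
	}
	return reports, nil
}

// sampleArchive builds a tiny two-entry archive for the demo.
func sampleArchive() ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	entries := []struct {
		name   string
		method uint16
		data   []byte
	}{
		{"mimetype", zip.Store, []byte("application/epub+zip")},
		{"EPUB/ch01.xhtml", zip.Deflate, bytes.Repeat([]byte("<p>text</p>"), 50)},
	}
	for _, e := range entries {
		w, err := zw.CreateHeader(&zip.FileHeader{Name: e.name, Method: e.method})
		if err != nil {
			return nil, err
		}
		if _, err := w.Write(e.data); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	archive, err := sampleArchive()
	if err != nil {
		panic(err)
	}
	reports, err := inspect(archive)
	if err != nil {
		panic(err)
	}
	for _, r := range reports {
		note := ""
		if r.Method == "defl" && r.Compressed >= r.Uncompressed {
			note = "  <- deflate did not help"
		}
		fmt.Printf("%-18s %s %6d -> %6d%s\n", r.Name, r.Method, r.Uncompressed, r.Compressed, note)
	}
}
```

Pointing inspect at the bytes of a real .epub would flag entries such as the 880-byte XHTML file discussed above, where the deflated size exceeds the stored size.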