phiresky closed this issue 1 year ago
I added some additional error reporting and now get
Couldn't extract ePub file: getWordsTilSig: signature not found before EOF
The problem lies in extracting the zip container, and this message is generated by jgm/zip-archive, which evidently doesn't think this is a valid zip. (unzip has no trouble unpacking it, however.)
From zip-archive code:
  skip (fromIntegral extraFieldLength) -- extra field
  compressedData <- if bitflag .&. 0O10 == 0
      then getLazyByteString (fromIntegral compressedSize)
      else -- If bit 3 of general purpose bit flag is set,
           -- then we need to read until we get to the
           -- data descriptor record. We assume that the
           -- record has signature 0x08074b50; this is not required
           -- by the specification but is common.
           do raw <- getWordsTilSig 0x08074b50
I wonder if this is a case where the signature is different?
How was this epub produced, do you happen to know?
I unpacked it using
unzip -d sd Software\ Design etc.
and then repacked it with
cd sd; zip -r ../sd.epub *
pandoc was then able to handle the repacked sd.epub.
I'll transfer this to zip-archive. This code was added in https://github.com/jgm/zip-archive/pull/29/commits/4d66754458755e3279932608256d3a0d66830021
@mistmist if you're still out there, perhaps you could take a look?
Grepping shows that we don't have the signature "P K 07 08" in this file.
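For reference, the same check can be sketched in Haskell. This is only an illustration: `descriptorSig`, `hasDescriptorSig` and `fileHasDescriptorSig` are hypothetical names, not part of zip-archive. A hit could still be a coincidence inside compressed data, but the absence of the bytes does mean no signed data descriptor exists.

```haskell
import qualified Data.ByteString as B

-- | The four bytes "P K 07 08", i.e. 0x08074b50 stored little-endian.
descriptorSig :: B.ByteString
descriptorSig = B.pack [0x50, 0x4b, 0x07, 0x08]

-- | True if the signature bytes occur anywhere in the input.
hasDescriptorSig :: B.ByteString -> Bool
hasDescriptorSig = B.isInfixOf descriptorSig

-- | Hypothetical helper: run the same check on a file on disk.
fileHasDescriptorSig :: FilePath -> IO Bool
fileHasDescriptorSig path = hasDescriptorSig <$> B.readFile path
```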
The documentation says
4.3.9.3 Although not originally assigned a signature, the value 0x08074b50 has commonly been adopted as a signature value for the data descriptor record. Implementers SHOULD be aware that ZIP files MAY be encountered with or without this signature marking data descriptors and SHOULD account for either case when reading ZIP files to ensure compatibility.
So I guess this is just a case where we don't have a signature. In that case I assume the data descriptor is just the last 12 bytes before the start of the next local file (or of some other record, e.g. sig 0x08064b50 or 0x02014b50).
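A minimal sketch of that idea, not the fix that went into zip-archive: scan forward for the next record signature, treat the 12 bytes before it as the descriptor, and accept the split only if the descriptor's compressed-size field matches the amount of data in front of it. The names `splitAtDescriptor`, `Descriptor` and `le32` are made up for this example. The input is assumed to start at the first byte of the entry's compressed data, and ZIP64 descriptors (8-byte sizes) are not handled.

```haskell
import qualified Data.ByteString as B
import Data.Bits (shiftL, (.|.))
import Data.Word (Word32)

-- | Decode a little-endian 32-bit word at offset i (caller checks bounds).
le32 :: B.ByteString -> Int -> Word32
le32 bs i =
  foldr (\k acc -> (acc `shiftL` 8) .|. fromIntegral (B.index bs (i + k))) 0 [0 .. 3]

-- | Records that can follow an entry: local file header, central
-- directory header, or the 0x08064b50 record mentioned above.
nextRecordSigs :: [Word32]
nextRecordSigs = [0x04034b50, 0x02014b50, 0x08064b50]

data Descriptor = Descriptor
  { dCrc32 :: Word32
  , dCompressedSize :: Word32
  , dUncompressedSize :: Word32
  } deriving (Eq, Show)

-- | Split the input into (compressed data, data descriptor), accepting a
-- descriptor with or without the optional 0x08074b50 signature.
splitAtDescriptor :: B.ByteString -> Maybe (B.ByteString, Descriptor)
splitAtDescriptor bs = go 12
  where
    go end
      | end + 4 > B.length bs = Nothing
      | le32 bs end `elem` nextRecordSigs
      , Just n <- dataLength end = Just (B.take n bs, descriptorAt (end - 12))
      | otherwise = go (end + 1)

    -- The candidate is plausible only if the recorded compressed size
    -- equals the number of bytes that precede the descriptor.
    dataLength end
      | compSize == end - 12 = Just (end - 12)
      | end >= 16
      , le32 bs (end - 16) == 0x08074b50
      , compSize == end - 16 = Just (end - 16)
      | otherwise = Nothing
      where
        compSize = fromIntegral (le32 bs (end - 8)) :: Int

    descriptorAt i = Descriptor (le32 bs i) (le32 bs (i + 4)) (le32 bs (i + 8))
```

The size check is what keeps a signature-like byte sequence inside the compressed data from being taken for the next record. It is still a heuristic. Reading the sizes from the central directory would avoid scanning altogether.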
sorry for the inconvenience, thanks @jgm for fixing this! (i was hoping to look into it this weekend)
Explain the problem.
pandoc cannot read this epub file. It doesn't give any more information than that one line. I can extract the same epub without issues as a zip file, as well as with ebook-convert from Calibre.
Pandoc version?
pandoc 3.1, Arch Linux
Input epub file: Software Design X-Rays Fix Technical Debt with Behavioral Code Analysis by Adam Tornhill.zip