jgm / pandoc

Universal markup converter
https://pandoc.org

base64-urls don't work in epubs when the html file is in a subdirectory #3150

Closed: lep closed this issue 7 years ago

lep commented 7 years ago
$ pandoc --version
pandoc 1.17.1
Compiled with texmath 0.8.6.4, highlighting-kate 0.6.2.1.
Syntax highlighting is supported for the following languages:
    abc, actionscript, ada, agda, apache, asn1, asp, awk, bash, bibtex, boo, c,
    changelog, clojure, cmake, coffee, coldfusion, commonlisp, cpp, cs, css,
    curry, d, diff, djangotemplate, dockerfile, dot, doxygen, doxygenlua, dtd,
    eiffel, elixir, email, erlang, fasm, fortran, fsharp, gcc, glsl,
    gnuassembler, go, hamlet, haskell, haxe, html, idris, ini, isocpp, java,
    javadoc, javascript, json, jsp, julia, kotlin, latex, lex, lilypond,
    literatecurry, literatehaskell, llvm, lua, m4, makefile, mandoc, markdown,
    mathematica, matlab, maxima, mediawiki, metafont, mips, modelines, modula2,
    modula3, monobasic, nasm, noweb, objectivec, objectivecpp, ocaml, octave,
    opencl, pascal, perl, php, pike, postscript, prolog, pure, python, r,
    relaxng, relaxngcompact, rest, rhtml, roff, ruby, rust, scala, scheme, sci,
    sed, sgml, sql, sqlmysql, sqlpostgresql, tcl, tcsh, texinfo, verilog, vhdl,
    xml, xorg, xslt, xul, yacc, yaml, zsh
Default user data directory: /home/user/.pandoc
Copyright (C) 2006-2016 John MacFarlane
Web:  http://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose.

I am trying to convert some EPUBs to PDFs, and some of them fail with the following error:

$ pandoc -o failing.pdf failing.epub
pandoc: Could not find image `Text/data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAC0lEQVQIW2NkAAIAAAoAAggA9GkAAAAASUVORK5CYII=', skipping...
pandoc: Unable to convert `Text/data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAC0lEQVQIW2NkAAIAAAoAAggA9GkAAAAASUVORK5CYII=' for use with pdflatex.
! LaTeX Error: File `Text/data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAA
BCAYAAAAfFcSJAAAAC0lEQVQIW2NkAAIAAAoAAggA9GkAAAAASUVORK5CYII=' not found.

See the LaTeX manual or LaTeX Companion for explanation.
Type  H <return>  for immediate help.
 ...                                              

l.73 ...EQVQIW2NkAAIAAAoAAggA9GkAAAAASUVORK5CYII=}

pandoc: Error producing PDF

The issue is that if the EPUB's content is stored inside a subdirectory, such as Text/ch0001.xhtml, pandoc prefixes the URI of every base64-inlined image it encounters with Text/, producing a path that obviously doesn't exist.
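A minimal sketch of the failure mode, using the `</>` operator from the `filepath` package that `renameImages` applies in the patch further down this thread. `resolveNaive` is a hypothetical helper that mirrors only the `root </> url` step (pandoc additionally calls its own `collapseFilePath`, which is not reproduced here), and the output shown assumes POSIX separators:

```haskell
import System.FilePath ((</>))

-- Directory holding the chapter, as in Text/ch0001.xhtml.
chapterDir :: FilePath
chapterDir = "Text"

-- The inlined 1x1 PNG from the error message above.
dataUri :: String
dataUri = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAC0lEQVQIW2NkAAIAAAoAAggA9GkAAAAASUVORK5CYII="

-- Hypothetical helper: join every image reference onto the chapter
-- directory unconditionally, like the `root </> url` step in renameImages.
resolveNaive :: FilePath -> String -> FilePath
resolveNaive root url = root </> url

main :: IO ()
main = do
  -- A relative reference is resolved as intended: Text/../Images/cover.png
  putStrLn (resolveNaive chapterDir "../Images/cover.png")
  -- ...but the data: URI is treated as a file name too, giving the bogus
  -- "Text/data:image/png;base64,..." path from the LaTeX error.
  putStrLn (resolveNaive chapterDir dataUri)
```

Nothing in `</>` knows about URIs: on POSIX it only checks whether the second argument starts with `/`, so a `data:` URI is simply appended to the directory.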

I have attached two files as a zip archive, one working and one failing EPUB: epubs.zip

lep commented 7 years ago

OK, I have managed to compile pandoc from source:

$ git show-ref HEAD --abbrev=6
d8600d
$ ./dist/build/pandoc/pandoc --version
1.17.3
<snip>
$ ./dist/build/pandoc/pandoc -o failing.pdf failing.epub
pandoc: Could not find image `Text/data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAC0lEQVQIW2NkAAIAAAoAAggA9GkAAAAASUVORK5CYII=', skipping...
pandoc: Could not find image `Text/data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAIAAAACDbGyAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAAAARSURBVBhXY3gro4KMGCjkAwCr9R1miWOjyQAAAABJRU5ErkJggg==', skipping...

As you can see, the PDF now builds, but the images are skipped. I guess that's okay-ish, but if someone wants to investigate, the same test case as above still reproduces it.

lep commented 7 years ago

This patch seems to do the trick for me, but I have not compiled pandoc with tests enabled.

diff --git a/src/Text/Pandoc/Readers/EPUB.hs b/src/Text/Pandoc/Readers/EPUB.hs
index e547b84..ecbfa0b 100644
--- a/src/Text/Pandoc/Readers/EPUB.hs
+++ b/src/Text/Pandoc/Readers/EPUB.hs
@@ -109,7 +109,9 @@ iq _ = []

 -- Remove relative paths
 renameImages :: FilePath -> Inline -> Inline
-renameImages root (Image attr a (url, b)) = Image attr a (collapseFilePath (root </> url), b)
+renameImages root img@(Image attr a (url, b))
+  | "data:image" `isPrefixOf` url = img
+  | otherwise                     = Image attr a (collapseFilePath (root </> url), b)
 renameImages _ x = x

 imageToPandoc :: FilePath -> Pandoc
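The patch only exempts URLs that start with `data:image`. A data URI can carry other media types, and URI schemes are case-insensitive, so a broader check may be safer. Below is a minimal sketch of one alternative, assuming it is acceptable to leave every reference that has a URI scheme (including remote `http(s):` images) untouched. It works on plain `String`s instead of pandoc's `Inline` type so it runs without `pandoc-types`; `hasUriScheme` and `resolveImageUrl` are hypothetical names, not pandoc functions, and this is not necessarily how upstream resolved the issue:

```haskell
import Data.Char (isAlpha, isAlphaNum, isAscii)
import System.FilePath ((</>))

-- Hypothetical helper: does the reference start with an RFC 3986 scheme,
-- i.e. ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) followed by ':'?
-- One-letter "schemes" are rejected so a Windows drive such as "C:" is not
-- mistaken for a URI.
hasUriScheme :: String -> Bool
hasUriScheme url =
  case break (== ':') url of
    (scheme@(c : _ : _), ':' : _) -> isAsciiAlpha c && all isSchemeChar scheme
    _                             -> False
  where
    isAsciiAlpha ch = isAscii ch && isAlpha ch
    isSchemeChar ch = (isAscii ch && isAlphaNum ch) || ch `elem` "+-."

-- Hypothetical String-level analogue of renameImages: only references
-- without a scheme are treated as paths relative to the chapter directory.
resolveImageUrl :: FilePath -> String -> String
resolveImageUrl root url
  | hasUriScheme url = url
  | otherwise        = root </> url

-- The base64 payloads below are truncated, for illustration only.
main :: IO ()
main = mapM_ (putStrLn . resolveImageUrl "Text")
  [ "../Images/cover.png"                -- resolved against Text/
  , "data:image/png;base64,iVBORw0KGgo=" -- left untouched
  , "DATA:image/gif;base64,R0lGODlh"     -- scheme matched case-insensitively
  , "https://example.com/logo.png"       -- remote image left untouched
  ]
```

One caveat of this design: a relative file name with a colon in its first segment, such as `fig:1.png`, is read as having a scheme. RFC 3986 requires such references to be written with a leading dot-segment (`./fig:1.png`), which this check resolves as a path.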