jgm / pandoc

Universal markup converter
https://pandoc.org

when converting from markdown to EPUB, `svgz` images get re-named to `svg`, but they are not (decompressed to) `svg`s #5163

Open thomasWeise opened 5 years ago

thomasWeise commented 5 years ago

First of all, many thanks for this wonderful tool! I am really loving pandoc!

I am trying to convert markdown to EPUB. I am not an expert in EPUB, so I am not sure whether this is really an error or expected behavior, but: when I convert markdown that includes an `svgz` image to EPUB, the image is copied into the `media` folder and its extension is changed from `svgz` to `svg`. `svgz` is simply a gzip-compressed version of `svg`; however, during EPUB generation the `svgz` is not decompressed. This throws off the EPUB readers that I have tested as well as the IDPF validator, which simply says something like `Unable to read file 'media/file0.svg'.` The suggested fix would be to check whether an image has the `svgz` extension and, if it does, decompress it first instead of just copying it.
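To make the suggested fix concrete, here is a minimal sketch in Python. It is not pandoc's Haskell code, only an illustration of the idea. The function name `normalize_svg` is hypothetical, and it assumes the image is available as a file name plus raw bytes.

```python
import gzip
from typing import Tuple

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def normalize_svg(name: str, data: bytes) -> Tuple[str, bytes]:
    """Return (name, data) with an .svgz image turned into a plain .svg.

    Hypothetical helper: files without the .svgz extension are returned
    unchanged.
    """
    if name.lower().endswith(".svgz"):
        # Only decompress if the payload really is gzip data.
        if data[:2] == GZIP_MAGIC:
            data = gzip.decompress(data)
        # Drop the trailing "z": "file0.svgz" -> "file0.svg".
        name = name[:-1]
    return name, data
```

The magic-byte check is a design choice of this sketch: it keeps the rename and the decompression tied together, so a file named `.svg` in the output always contains uncompressed SVG.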

In the archive example.zip attached to this issue, you can find:

Again, many thanks for your great tool!

example.zip

mb21 commented 5 years ago

I'm not saying this is good behaviour by pandoc. But probably, you shouldn't be using svgz in EPUBs anyway. Not even all readers support svg, and svgz isn't even listed as a core media type...

thomasWeise commented 5 years ago

This is true, of course.

Just a small edit for clarification: I do not have an EPUB document with svgz inside; I have markdown+svgz, which I want to convert to EPUB.

But you can seemingly use svgz with markdown and I use pandoc for document conversion. You can also use svgz with markdown and convert to pdf via pdflatex, which is what I am doing as well - so this works (and I do not think svgz is supported by pdflatex).

I think that maybe the EPUB conversion could simply decompress the svgz to svg. Right now, it seems to copy the svgz files directly. So it does recognize the svgz file extension, I think? If it does, then all that is necessary is to unpack the svgz, because then you have svg. Unfortunately, I don't know how to write Haskell code, nor do I understand the project code, but I would hope that this could be a fix of low complexity, maybe one more "IF" and one function call to some gzip function?

Of course, this is just a hope. As said, I am not sure how complex this is to implement or whether this even makes sense.

link2xt commented 5 years ago

I just changed pandoc to decompress ".svgz" just like it does with ".gz". Previously it changed the MIME type but did not decompress. Now it works with standalone HTML.

With ".epub", both ".gz" and ".svgz" are still renamed, but not decompressed.
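One way to check whether a generated EPUB is affected is to look inside it: an EPUB is a zip archive, and gzip data always starts with the bytes `1f 8b`. The following Python sketch lists entries that are named `.svg` but still hold gzip-compressed data. The function name `find_gzipped_svgs` is hypothetical, and this is a diagnostic aid, not part of pandoc.

```python
import zipfile


def find_gzipped_svgs(epub):
    """Return names of .svg entries in an EPUB that are still gzip data.

    `epub` may be a path or a binary file-like object (hypothetical helper).
    """
    suspicious = []
    with zipfile.ZipFile(epub) as archive:
        for info in archive.infolist():
            if info.filename.lower().endswith(".svg"):
                with archive.open(info) as entry:
                    # A real SVG starts with text, never with the gzip magic.
                    if entry.read(2) == b"\x1f\x8b":
                        suspicious.append(info.filename)
    return suspicious
```

For an EPUB showing the behaviour described in this issue, the result would be expected to include entries such as `media/file0.svg`.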

link2xt commented 5 years ago

Also, svgz with wrong MIME type works in Firefox, but I don't think it is standard.

There is a W3C issue about this, but I don't think it was resolved: https://www.w3.org/Graphics/SVG/WG/track/issues/2313

At least I can't find any references to gzip in the SVG specification.

thomasWeise commented 5 years ago

OK, great! ...but I am also a bit confused.

If I understand correctly, this means that:

  • for markdown+svgz -> HTML, this issue also existed and has been fixed.

  • for markdown+svgz -> EPUB, it has not (yet) been fixed but might be possible to fix?

Or - if I misunderstood - is the general use of svgz files in conjunction with markdown wrong/not supported?

link2xt commented 5 years ago

@thomasWeise

  • for markdown+svgz -> HTML, this issue also existed and has been fixed.

  • for markdown+svgz -> EPUB, it has not (yet) been fixed but might be possible to fix?

This is correct.

thomasWeise commented 5 years ago

Excellent. Thank you ^_^

jgm commented 5 years ago

Related issue: #2183. See the note for the (commented out) svgz in Text.Pandoc.MIME.

jgm commented 5 years ago

The most robust solution, as noted in #2183, would be "to modify openURL and fetchItem so they return a content encoding, as well as a mime type." We may be able to get by with something more minimal for now, but we should fix this somehow.
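To illustrate the idea of returning a content encoding alongside the MIME type, here is a rough sketch in Python. It does not reflect the actual signatures of `openURL` or `fetchItem` in pandoc. The names `FetchedItem`, `describe`, and `decoded`, and the small extension table, are all hypothetical.

```python
import gzip
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class FetchedItem:
    data: bytes                      # raw bytes as fetched
    mime_type: Optional[str]         # what the content is once decoded
    content_encoding: Optional[str]  # how the bytes are wrapped, if at all


# Hypothetical table: extension -> (MIME type, content encoding).
_KNOWN = {
    ".svg": ("image/svg+xml", None),
    ".svgz": ("image/svg+xml", "gzip"),
}


def describe(name: str, data: bytes) -> FetchedItem:
    """Attach a MIME type and a content encoding based on the extension."""
    ext = os.path.splitext(name)[1].lower()
    mime, encoding = _KNOWN.get(ext, (None, None))
    return FetchedItem(data, mime, encoding)


def decoded(item: FetchedItem) -> bytes:
    """Return the payload with any gzip content encoding removed."""
    if item.content_encoding == "gzip":
        return gzip.decompress(item.data)
    return item.data
```

With the encoding kept separate from the MIME type, a writer that embeds or copies the image can decide for itself whether to call something like `decoded` before writing the file.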