jgm / pandoc

Universal markup converter
https://pandoc.org

EPUB image formatting: Pixel format (px) is omitted in target XHTML #6862

Closed abrey66 closed 3 years ago

abrey66 commented 3 years ago

BUGREPORT:

pandoc.exe 2.11.1.1 Compiled with pandoc-types 1.22, texmath 0.12.0.3, skylighting 0.10.0.3, citeproc 0.1.1.1, ipynb 0.1

input:

    [question]: img/ad-question.svg {#xxid .xxclass width=32px}

xhtml-output:

    <dt><img src="../media/file2.svg" id="xxid" class="xxclass" width="32" /> Was ist der Sinn des Lebens?</dt>

PROBLEM:

Since the width 32px is translated into 32 (i.e., without px), most EPUB readers will apply a dominating "width: auto", which even overrides any custom CSS. Effect: depending on your CSS, the image will either 'disappear' or be scaled to a huge size. Affected readers include the official Tolino web reader (the German commercial reader).

SUGGESTED SOLUTION:

Pass the 'px' unit through, so that it still appears in the generated XHTML.

CURRENT WORKAROUND:

Instead of using the width attribute, use the height attribute with mm. 3px is about 1mm:

    [question]: img/ad-question.svg {#xxid .xxclass height=10mm}
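The rule of thumb above can be checked against the fixed CSS ratios (1in = 96px = 25.4mm). The following is only a sketch: the helper names `px_to_mm` and `mm_to_px` are made up for illustration, and it assumes the reader resolves absolute units the way CSS defines them.

```python
# CSS absolute units are tied together by fixed ratios: 1in = 96px = 25.4mm.
PX_PER_INCH = 96.0
MM_PER_INCH = 25.4


def px_to_mm(px: float) -> float:
    """Convert CSS pixels to millimetres (hypothetical helper)."""
    return px * MM_PER_INCH / PX_PER_INCH


def mm_to_px(mm: float) -> float:
    """Convert millimetres to CSS pixels (hypothetical helper)."""
    return mm * PX_PER_INCH / MM_PER_INCH


if __name__ == "__main__":
    print(f"{px_to_mm(32):.2f}mm")  # width=32px from the report -> 8.47mm
    print(f"{mm_to_px(10):.1f}px")  # height=10mm from the workaround -> 37.8px
```

Under these ratios 1mm is closer to 3.8px, so height=10mm is somewhat larger than 32px; a value near 8.5mm would match the original size more closely.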

Thanks for consideration. Have a good day!

Andreas (German IT-guy)

mb21 commented 3 years ago

The HTML img's width attribute is always in pixels, so changing the output to width="32px" would be invalid.

abrey66 commented 3 years ago

Thank you mb21. I tried the current pandoc version and included a suggested (clumsy) workaround. The problem remains. Since px really is omitted in the pandoc-generated EPUB file, I consider it a real bug.

P.S.: I disagree with your "invalid" statement. The W3C spec, Chrome Inspector and the actual readers consider 'px' a valid dimension property - here: mandatory, if you want the IMG to be displayed correctly.

tarleb commented 3 years ago

Could you link to the W3C spec? At least the mdn doc seems to agree with @mb21:

width: The intrinsic width of the image in pixels. Must be an integer without a unit.
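The quoted rule can be expressed as a small check. This is a sketch only: the function name `is_valid_img_dimension` is hypothetical, and it assumes the HTML definition of a "valid non-negative integer" (one or more ASCII digits, nothing else).

```python
import re

# HTML requires the img width/height attributes to be "valid non-negative
# integers": one or more ASCII digits, with no sign, unit, or whitespace.
_NON_NEGATIVE_INTEGER = re.compile(r"[0-9]+")


def is_valid_img_dimension(value: str) -> bool:
    """Return True if value is acceptable as an img width/height attribute."""
    return _NON_NEGATIVE_INTEGER.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("32", "32px", "10mm"):
        print(candidate, is_valid_img_dimension(candidate))
```

By this rule `width="32"` passes and `width="32px"` does not, which is the distinction being discussed here.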

abrey66 commented 3 years ago

I am impressed by the quick replies - thank you for that! Indeed, the W3C syntax definitions are curious (and always need to be compared to the real implementations). I always do the following:

  1. Look into the syntax definitions (with examples) at the W3C. Example: https://dev.w3.org/html5/pf-summary/the-xhtml-syntax.html in section 12.3.2, IMAGES.

  2. Use the W3C validator (here: https://validator.w3.org/#validate_by_input). You can easily paste example snippets into it and will see that the tool reports only one error: the missing alt attribute. Should pandoc include a dummy alt attribute in each IMG?

Well, anyhow. The real-life problem remains and could easily be debugged. Do you agree? Or: just do as the user specifies in the md file: if there is a width=32}, pass just the number; if there is a width=32px}, pass 32px.

tarleb commented 3 years ago

I don't have the necessary devices to reproduce the issue, but I'll take your word that this is a real problem. Pandoc tries hard to generate valid HTML only, and users might rely on the current behavior. Breaking anyone's workflows should be avoided. To be honest, I'm not sure I fully appreciate the problem yet. It appears that there's a bug in the eBook reader; my expectation would be that it will be fixed there. But we can try to develop (and possibly include) a work-around, preferably in CSS.

mb21 commented 3 years ago

That's why I asked:

what ebook viewers have you tried and can you post screenshots of your problem?

It might very well be that your ebook viewer ignores the invalid width attribute. Have you tried without any width specified?

To reproduce this, we'd at least need a minimal input file and the relevant image.

abrey66 commented 3 years ago

Hi there, since I am a self-publisher in Germany, I would like to stress the scope of the problem and the market:

Tolino is the brand name used by ALL German bookstores (i.e., those that are not Amazon) that offer ebook services and hardware readers. Hence we are discussing the only serious competitor to Kindle in the entire European market!

Minimal input: I believe that is the line I gave above. (You might include a ![][question] above it in some dummy text.)

Relevant image: just any dummy image, either small or page-wide. I strongly recommend trying an SVG later, since you might want to improve the SVG handling for the most important target platforms (there are more bugs concerning this 'modern' format, like omission / distortion / viewport-viewBox issues!). My workaround: I use only GIF, JPG or PNG.

PLEASE NOTE - for reproducing the issue: you do not actually need to reproduce the faulty behaviour "on display" if you believe my test findings above. The 'translation bug' in pandoc can be verified simply by looking into the generated EPUB file (the corresponding XHTML file inside the EPUB/text folder). There you will find that the 'px' unit is missing.
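Looking into the generated EPUB can also be scripted, since an EPUB is a ZIP archive. The following is a minimal sketch using only the Python standard library; the function name `img_attributes_in_epub` is made up for illustration, and it assumes the content documents are UTF-8 files ending in .xhtml or .html.

```python
import sys
import zipfile
from html.parser import HTMLParser


class _ImgCollector(HTMLParser):
    """Collects the attributes of every <img> element fed to the parser."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img":
            self.images.append(dict(attrs))


def img_attributes_in_epub(epub):
    """Return (file name, attribute dict) pairs for all images in an EPUB.

    `epub` may be a path or a binary file-like object.
    """
    found = []
    with zipfile.ZipFile(epub) as archive:
        for name in archive.namelist():
            if not name.endswith((".xhtml", ".html")):
                continue
            parser = _ImgCollector()
            parser.feed(archive.read(name).decode("utf-8"))
            parser.close()
            found.extend((name, attrs) for attrs in parser.images)
    return found


if __name__ == "__main__":
    # Usage: python inspect_epub.py book.epub  (script name is hypothetical)
    for name, attrs in img_attributes_in_epub(sys.argv[1]):
        print(name, attrs.get("src"), "width =", attrs.get("width"))
```

Run against an EPUB built from the input above, this should list the image with width "32", which matches the XHTML shown in the report.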

However it might be a very good idea to include 'official webreaders' of the most important book-shops. Then ...

Target platform / reader / web reader: I use the Tolino Shine (2 or 3) as a hardware reader.

You can reproduce the rendering inside the web reader in Firefox or Chrome:

  1. The tolino-webreader help-page: https://mytolino.de/tolino-webreader-ebooks-online-lesen/

  2. Register with one of the three mentioned stores, for example Thalia.

  3. With the free dummy account: drag the EPUB file into the open web reader; it will appear after a short upload time.

  4. Click the book-cover in order to open it.

  5. Browse to any page, then invoke the inspector (CTRL-SHIFT-I).

Hope it helps.

tarleb commented 3 years ago

I think the problem here is that we have different views on what makes a "bug". We are not going to produce broken (i.e., invalid) output to accommodate a buggy reader implementation.

Best would be if you would help us by finding a CSS-based method to resolve the issue.

Maybe it's enough to add a style attribute setting the width there? You can do that with a small Lua filter:

    function Image (img)
      if img.attributes.width then
        -- overwrites the element's "style" and sets the width using the original
        -- "width" attribute, i.e. including any units.
        img.attributes.style = string.format('width: %s;', img.attributes.width)
        return img
      end
    end

tarleb commented 3 years ago

Closing this now. If there is a fix which doesn't require the creation of invalid HTML, please let us know.
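For reference, the style-attribute workaround could also be written as a JSON filter for pandoc's `--filter` option. This is a sketch, not part of the original suggestion: the function name `add_width_style` and the script name are hypothetical, and unlike the Lua snippet it appends to an existing style instead of overwriting it. It assumes the usual JSON AST layout, where an Image is `{"t": "Image", "c": [[id, classes, key_values], alt, target]}`.

```python
import json
import sys


def add_width_style(node):
    """Walk a pandoc JSON AST in place and copy each image's width into style."""
    if isinstance(node, list):
        for child in node:
            add_width_style(child)
    elif isinstance(node, dict):
        if node.get("t") == "Image":
            key_values = node["c"][0][2]  # list of [key, value] pairs
            width = next((v for k, v in key_values if k == "width"), None)
            if width:
                declaration = f"width: {width};"  # keeps the unit, e.g. 32px
                for pair in key_values:
                    if pair[0] == "style":
                        existing = pair[1].strip().rstrip(";")
                        pair[1] = f"{existing}; {declaration}" if existing else declaration
                        break
                else:
                    key_values.append(["style", declaration])
        for child in node.values():
            add_width_style(child)


if __name__ == "__main__":
    # pandoc sends the document as JSON on stdin and reads it back from stdout.
    document = json.load(sys.stdin)
    add_width_style(document)
    json.dump(document, sys.stdout)
```

It could be invoked as `pandoc --filter width_style.py input.md -o book.epub`. The width attribute itself is left in place, so the writer still emits the unitless pixel value alongside the style.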

abrey66 commented 3 years ago

Hello tarleb, thank you for trying to solve the issue. My technical conclusion is: the overall handling of SVG in the EPUB container is not implemented well enough by pandoc, which tries to follow a "clean", i.e. standards-conforming, strategy.

Thx for the lua-snippet - looking into this language and pandoc-use seems fun - maybe I will learn that soon.

A philosophical remark on what "we both consider a bug":

The argument of your first sentence cannot hold - in no sense of the words "valid" and "buggy". I assume that you are defending the purest form of RFC-standard compatibility and despise everything that "is not inside the technical grammars". To explain my strong doubts, let me refer to real-life examples of my IT jobs since 1984:

mb21 commented 3 years ago

Not sure this is a philosophic disagreement (for which surely jgm would be the expert ;-)). The pragmatic argument in favour of tarleb's position is that your proposed change is very likely to break other ebook viewers, including future ones.

jgm commented 3 years ago

I agree: we need to produce a valid EPUB, and that means valid XHTML inside the EPUB container. I'm sorry your reader has a bug. Please report it to the maker of that reader! Meanwhile, you can use the workaround suggested. As @mb21 said, many other readers do rely on the EPUB being valid and might break with the change you suggest.