koreader / crengine

This is the KOReader CREngine fork. It cross-pollinates with the official CoolReader repository at https://github.com/buggins/coolreader, in case you were looking for that one.
70 stars 45 forks source link

html5.css & epub.css: better conformance to the HTML Standard suggested rendering #555

Closed poire-z closed 6 months ago

poire-z commented 6 months ago

Except for the few first commits, the following commits aimed at getting nearer to what https://html.spec.whatwg.org/multipage/rendering.html describes (except for the fancy HTML widgets we don't support like <dialog>, <meter>, <progress>...) This brings a lot of little changes, so possibly new bugs - to be bumped after next KOReader stable release, and tested for many weeks before the next one.

In-page footnotes: better handle duplicated ids

Follow-up to last commit of #554 dd5cf73 to better support books with id= values duplicated, which can happen in Wikipedia EPUBs. Ie.

<li id="link1">txt1, see <span id="foo">foo</span></li>
<li id="link2">txt2, see <span id="foo">foo</span></li>

would cause txt2 to be shown as in-page footnote for links targetting #link1. This is still a bit clumsy, our way to handle multiple ids is probably not the best, but it does not require changes elsewhere. This may be revisited in the future.

lvrend: handle in-page footnotes in table <caption>

In Wikipedia EPUBs, <caption> are used as container of the legend below an image, and may be quite verbose and contain link to footnotes.

LVString: ignore CJK chars in lStr_findWordBounds()

Otherwise, some words composed of both CJK chars and latin chars could be returned and fed to hyphenate() (this lStr_findWordBounds() is only used for this), which would properly states some hyphenation after a CJK char is allowed, resulting in spurious hyphens drawn after a CJK char. See https://github.com/koreader/koreader/issues/11478#issuecomment-1951424809. Should allow closing https://github.com/koreader/koreader/issues/11478.

lvtext: AddLine(): handle some CJK + space edge case

Avoid flexible CJK chars not getting flexed at end of line if followed by some ASCII space(s). See https://github.com/koreader/koreader/issues/11478#issuecomment-1951424809.

EPUB: look for EPUB3 cover even when EPUB2 cover advertized

With EPUB3 documents, prefer finding an EPUB3 cover, and only when none found fallback to using any specified EPUB2 cover. See around https://github.com/koreader/koreader/pull/11491#issuecomment-1963021880, after a long investigation and time wasted. (Another patch to exctract an image from the first DocFragment, that feels not urgently needed, is available at https://github.com/koreader/koreader/pull/11491#issuecomment-1962947041 if we ever need it.)

List items: proper per-specs positionning and sizing

The sweet historical crengine way of rendering list items was to have each list-item carry its marker in its own border box (so that they do not overlap with anything on the left), but it is not per-CSS-specs: markers should be drawn outside the list item border box (in its margin, or in its parent padding, and over any element on the left or in the page margin, possibly truncated, if these margin and padding do not provide sufficient room for the marker).

We can still get something similar to that historical sweet behaviour by using, on the list-item parent container (normally an OL or UL): padding-left: -cr-special, which will compute to the widest list items marker width. We can then use in our user-agent stylsheet:

  ol, ul { padding-left: -cr-special; }

instead of the absurd per-specs:

  ol, ul { padding-left: 40px; }

(an absolute length, which will look different depending on the font size or the DPI), that any publisher caring about list item rendering would be obliged to override; if they haven't, this -cr-special will do better than 40px. See https://github.com/koreader/crengine/issues/521#issuecomment-1952341201. Closes #521.

epub.css: add/use "@media (-cr-max-cre-dom-version: 20180527)"

Add this private @media keyword to replace the more verbose -cr-ignore-if-dom-version-greater-or-equal: 20180528.

fb2def.h: add more HTML element and attributes names

As they could be mentionned in stylesheet, having them known with a fixed id could avoid some stylesheet hash mismatches and spurious renderings.

CSS: generic support for handling presentational hints

via -cr-hint: presentational-hint Initially discussed in https://github.com/koreader/crengine/issues/551#issuecomment-1890975524. Closes #551.

CSS: add support for private -cr-apply-func:

Will help ensuring bits from the HTML specs that can't fully be expressed with CSS. To be used in html5.css, updated in a later commit. Many rendering issues with our lack of support for these noticed in https://github.com/koreader/koreader/issues/10770.

ldomDocumentWriterFilter: remove attribute to CSS conversion

If was only done with HTML documents (not with EPUBs) and, by translating them to a style= attribute, was giving them a too high specificity, while they should have a very low one so they are overridden by any author CSS. Referencing old issues where I thought about doing that: https://github.com/koreader/crengine/issues/480#issuecomment-1134357893 https://github.com/buggins/coolreader/pull/214#issuecomment-755952995

lvrend: more proper rendering of block images

When such images have borders, the border was taking the full page width... This commit makes the border sticks to the image (+ its padding if any). Also make sure there is no strut, causing some blank below the image running until the border. Make the existing addition of an image title/suptitle/subtitle attributes values as text only with FB2/FB3 documents, as nowhere in the HTML/EPUB specs it is said this should be done. See https://github.com/koreader/koreader/issues/11173#issuecomment-1950277100.

lvrend: keep margin_left/right updated when auto

When handling margin: auto, or avoiding any width overflow, we need to keep margin_left and _right updated with their used values, as they are still used later, in particular when rendering erm_final to compute the floats footprint. (Not doing that and keeping them at 0 could have the floats footprint "eat" more into the width available for rendering text, making it taller than needed.)

CSS: add support for handling HTML's align= attribute

Will allow us to use:

  center { text-align: -cr-html-align-center }
  div[align=right] { text-align: -cr-html-align-right }

to properly handle this (deprecated) attribute that should still be supported by browsers according to the current HTML specs: https://html.spec.whatwg.org/multipage/rendering.html#align-descendants We implement this the exact same way as Firefox and Webkit (with text-align: private keywords), even if it feels like not the way to implement how it is described in the HTML specs...

lvrend: fix HR and images positionning when floats involved

Rework 524e502d (where we did it for <HR>, possibly not the right way). Very tedious to understand how this works in browsers, tried different things until I luckily got the same rendering with various combinations of floats, margin fixed or auto, and HTML align=... but it's quite possible this will fail in other contexts or combinations... Note that Firefox and Edge do handle some cases differently, so this is quite gray territory...

epub.css, html5.css: minor updates for easier stylesheet switch

Make epub.css and html5.css be similar with all display: and white-space:, to avoid the "Styles have changed in such a way that fully reloading the document may be needed for a correct rendering" popup when switching between them. Some values are updated to be what they will be in the major update to html5.css coming in next commit.

epub.css, html5.css: major updates for better conformance

Thanks to all the previous commits, we can now get html5.css to ensure most of the HTML Standard suggested rendering. Have epub.css extend html5.css (so, it benefits from all the fixes), and update only what needs to be different for a more book like look. See https://github.com/koreader/koreader/issues/11494#issuecomment-1963045594 Should allow closing https://github.com/koreader/koreader/issues/11494. Should allow closing https://github.com/koreader/koreader/issues/10770.


This change is Reviewable

poire-z commented 6 months ago

AFTER :)

This brings a lot of little changes, so possibly new bugs - to be bumped after next KOReader stable release, and tested for many weeks before the next one.

I just found some typo in my new html5.css :/ If you have some tools to check a stylesheet for errors (that would not fail to bad on my -cr-xyz added stuff, it would be nice to feed them to it to see if there are others.

poire-z commented 6 months ago

I realized I have <a> not styled at all html5.css (old and new). I had and have:

/* crengine-: not supported
:link { color: #0000EE; }
:visited { color: #551A8B; }
:link:active, :visited:active { color: #FF0000; }
:link, :visited { text-decoration: underline; cursor: pointer; }
:focus-visible { outline: auto; }
*/

But I guess I should just replace this with: a { color: #0000EE; text-decoration: underline; cursor: pointer; } Shouldn't I ?

In epub.css, we had and have:

/* Make links noticable */
a {
  text-decoration: underline;
  color: gray;
}

I don't really like having this gray color for links (see screenshots in https://github.com/koreader/koreader/issues/11503, where links themselves and the footnote number (a back link) are in light gray, hardly graspable to me. Although it's probably more distinguishable than blue on our eInk. But do we need this distinction? Should we use a darker gray ?

Frenzie commented 6 months ago

If you have some tools to check a stylesheet for errors (that would not fail to bad on my -cr-xyz added stuff, it would be nice to feed them to it to see if there are others.

I'll check if stylelint can be tamed sufficiently to not be overly in the way.

poire-z commented 6 months ago

Of course, it wouldn't detect the typo I had (missing comma after h1) as it is still some valid selector:

:is(article, aside, nav, section) :is(article, aside, nav, section) :is(article, aside, nav, section) :is(article, aside, nav, section) h1
h5 {
Frenzie commented 6 months ago

Of course, it wouldn't detect the typo I had (missing comma after h1) as it is still some valid selector:

Heh, I forgot to ask which typo you were talking about.

On the plus side :laughing:, thanks to all that effort I did fix the redundant double margin-top and margin-bottom in fb2.css.

I'm not sure if no-descending-specificity should remain enabled as a warning; most of it comes across as noise. I would however prefer to fix the length-zero-no-unit warnings and turn them into errors.

poire-z commented 6 months ago

Feel (you or @hius07) welcome to cleanup that ugly looking fb2.css - no style change needed, but it could be made more readable, with proper formatting, some reordering maybe, and additional comments if needed.

Frenzie commented 6 months ago

If you run stylelint with --fix it'll automatically apply fixes btw. At the very least for the length-zero-no-unit rule that'd work very well.

poire-z commented 6 months ago

Self-ish note: squash last commit into previous before merging.

Frenzie commented 6 months ago

The reorder is a performance optimization?

poire-z commented 6 months ago

No, no performance change, in any way. It's just for when we use "Show matched rules": image

Showing the CSS snippets matching the rare listing will show (we just parse and cut into the CSS stylesheet strings the bits that do match) the listing, and the selectors that follow, which are a bit of noise, in yellow - but it is ok to get them on rare elements: image

While for a p, it will show the same snippet, but will cut it just before the p {: image

Without the reorder, with the previous: blockquote, figure, listing, p, plaintext, pre, xmp { on any P we look at, we would see the noisier: image

Frenzie commented 6 months ago

Not showing the full selector seems potentially a bit confusing to me; I wasn't aware of that behavior. I mean in case I wanted to edit it or something.

poire-z commented 6 months ago

We show the full selector (singular) that matches, plus any others (that don't match) on its right (if I could easily hide them, I would). We don't show the other selectors on the left (that did not match). What you'll want to edit/tweak/override after using this feature is this single full selector. Combining these selectors with commas is just to avoid duplicating the declaration, but you're usually interested only in one. (Unless you really want to see all the selectors that got color:yellow to fix them before you actually see their effect on the book you're reading, but this must be super rare :)) One can see the full stylesheet with all selectors with the View xyz.css at the bottom of the View HTML widget.

Frenzie commented 6 months ago

Combining these selectors with commas is just to avoid duplicating the declaration, but you're usually interested only in one. (Unless you really want to see all the selectors that got color:yellow to fix them before you actually see their effect on the book you're reading, but this must be super rare :))

In the context of EPUBs specifically (as opposed to HTML+CSS in general) it is super rare for me in the sense that I seldom override anything (give or take style tweaks like classic classname footnotes), but when I do I normally do indeed want to override all of them. In your example too, if I don't like yellow here then why would I like it there.

But there are small screen restrictions at play too… :-)

poire-z commented 6 months ago

I think I'll do some reorg in html5.css. I wanted to stay as faithful and litteral with https://html.spec.whatwg.org/multipage/rendering.html and keep the snippets as they are written, but it itches me to see the following. It shouldnt cost that much to check 3 or 5 selectors, but still... and it takes room in "View matched rules". I wanted to stay litteral for easier pickups of future changes, but that page is a mess and it will be hard to follow changes - and I guess it's not something that should ever change (except for added new fancy html elements).

3 plain selector+declarations for ol and ul: https://github.com/koreader/crengine/blob/54ec2811366ed8d5f79a050db281f7a5d72d2d5f/cr3gui/data/html5.css#L430-L482 5 plain selector+declarations for th ! https://github.com/koreader/crengine/blob/54ec2811366ed8d5f79a050db281f7a5d72d2d5f/cr3gui/data/html5.css#L586-L636

Frenzie commented 6 months ago

5 plain selector+declarations for th !

Right, that's why I said the Stylelint output mostly seemed like noise, and that's the kind of stuff it's actually trying to complain about. I guess in large teams you might need such overly strict slightly misguided rules.

poire-z commented 6 months ago

No more no-descending-specificity with using :where (0 added specificity) instead of :is (specificity of the most specific inner item). It looks to me this is alright, and that it's now just the ordering that will still make them match before others. What do you think? (we don't really have to fight these warnings, but if it's painless without side effects, why not.)

I realize I have to add these ugly

:where(article, aside, nav, section) :where(article, aside, nav, section) :where(article, aside, nav, section) h1,
h4 {

also in epub.css, whether I use :is() or :where(), otherwise:

Frenzie commented 6 months ago

If you prefer it that way, why not. :-)

poire-z commented 6 months ago

Regarding, from your suggestion at https://github.com/koreader/crengine/pull/300#issuecomment-511154956:

/* crengine+: make a new page on H1 and section, but not in EPUBs (publishers may not
 * expect it; most EPUB renderers only break page on a new DocFragment). */
section, h1 {
  -cr-only-if: -epub-document; /* only if NOT EPUB document */
    page-break-before: always;
}
section > h1:first-of-type, section section {
  -cr-only-if: -epub-document; /* only if NOT EPUB document */
    page-break-before: auto;
}

Should we really keep it? In html5.css (the standard rendering, which doesn't specify anything about page breaks), or only in epub.css (our book-alike stylesheet). If wouldn't have any effect on EPUBs as stated, but may on HTML or other documents translated to HTML like DocX or OTF Also, it feels bogus, it would break on the 2nd++ H1 inside a <section><section><section> where they would act like a H4. I'd rather remove them than scratch my head to find out something better, that may still do unwelcome things. (We have styletweaks to break on headings, but those to prevent them are clumsy, and actually restores the ones from epub.css...)

edit: I guess that if we want to keep it, it could just be:

section, h1 {
  -cr-only-if: -epub-document; /* only if NOT EPUB document */
    page-break-before: always;
}
section h1, section section {
  -cr-only-if: -epub-document; /* only if NOT EPUB document */
    page-break-before: auto;
}
Frenzie commented 6 months ago

I'm not opposed to removing it altogether, but I think your comment really just applies to the :first-of-type — as you edited in while I was away for a few minutes.

poire-z commented 6 months ago

Not edited, it's the first snippet above - I just added another snippet at the bottom. (And what "your comment really just applies" :/ I put more than one comment there :))

Frenzie commented 6 months ago

You already linked to my opinion — I think it would generally improve display to treat top-level section as a page-break and not for anything lower. That's what's shown in your second edited snippet, which I believe corresponds to what I originally suggested. The first strikes me as needlessly complex with the first-of-type.

But as stated I'm not opposed to removing them if you'd rather have them gone.

Frenzie commented 6 months ago

If you've fixed the no unit after 0 thing, you can remove this line:

https://github.com/koreader/crengine/blob/f0ba598e1773bd41bdc7450fa7f0357b212cdb23/.stylelintrc.json#L7

poire-z commented 6 months ago

(Added 2 commits, at the top: In-page footnotes: better handle duplicated ids lvrend: handle in-page footnotes in table <caption> )