koreader / crengine

This is the KOReader CREngine fork. It cross-pollinates with the official CoolReader repository at https://github.com/buggins/coolreader, in case you were looking for that one.
70 stars 45 forks source link

Fix CSS so the large images in FB2 files don't overflow to second page #562

Closed dmalinovsky closed 5 months ago

dmalinovsky commented 5 months ago

This PR fixes 2 issues.

  1. First one when the image is downsized correctly but keeps attached to the header due to page-break-after: avoid. I'm not sure why this attribute was set originally — I traveled through Git history 16 years back and it was still there.
Rendering with page-break-after: avoid Rendering with page-break-after: auto
  1. The first fix doesn't help when there's no header before the image (or maybe the image was too large, I'm not sure). In this case, setting max-height: 99vh did the trick. When I tried setting max-height: 100vh, it had no effect.
Before After

Fixes https://github.com/koreader/koreader/issues/9292.


This change is Reviewable

poire-z commented 5 months ago
  1. First one when the image is downsized correctly but keeps attached to the header due to page-break-after: avoid. I'm not sure why this attribute was set originally

For me, it makes sense when there is no image: you don't want a page break before a title and its first paragraph. I understand you may want to allow a page break there it there is an image after the title, but you should not kill the most common first case. We also have it with HTML/EPUB: https://github.com/koreader/crengine/blob/55429c37868322408df15cfb0d26bc839b90cccf/cr3gui/data/html5.css#L358-L395 So, it feels this may need more thinking.

The first fix doesn't help when there's no header before the image (or maybe the image was too large, I'm not sure). In this case, setting max-height: 99vh did the trick. When I tried setting max-height: 100vh, it had no effect.

So, that first fix is probably not the best :) if you need to accumulate the tricks. I'll give it some thoughts. (The 99vh vs 100vh might be because, per HTML rendering rules, an image sits on the baseline, and there is some little space below the baseline (for legs of characters like p g q f). May be it needs some tricks in the code.)

Btw, you can also notice the image is not centered, cf https://github.com/koreader/koreader/issues/11623. I'm investigating, so no need to spend time on it. May be we'll find out some CSS bits that you can stick in this PR. Follow us there.

dmalinovsky commented 5 months ago

For me, it makes sense when there is no image: you don't want a page break before a title and its first paragraph.

But we have a page break before the title, isn't it enough? I tried adding page break back, and the image overflow is back as well.

I'm investigating, so no need to spend time on it. May be we'll find out some CSS bits that you can stick in this PR. Follow us there.

Thanks, it's perfect timing! Should I close this PR then?

Also, if you need a test file, here's my 2 books: Gogol_Myortvye-dushi.JoTalQ.756625.fb2.zip and Dante_BVL-Seriya-pervaya_28_Novaya-zhizn-Bozhestvennaya-komediya.RA4baQ.265160.fb2.zip. The files are pretty large, ~5 MB each. The large images are at the beginning of the book.

poire-z commented 5 months ago

But we have a page break before the title, isn't it enough?

It might be enough, but "logically/semantically", you'd want to express it as I wrote it: we don't want a page break between a title and its following first paragraph. Moreover, we only have this break before: always with H1 and H2, not with H3, H4...: https://github.com/koreader/crengine/blob/55429c37868322408df15cfb0d26bc839b90cccf/cr3gui/data/fb2.css#L72-L75 and you'd want to have it works the same with H2 and H3.

Logically, we could think about another way than "don't break after a title/heading": "don't break before a paragraph (or non-image element) following a title/heading". That is: removing the break-after: avoid for title,h1,h2, and using stuff like:

is(title, h1, h2, h3, h4, h5, h6) + p,
is(title, h1, h2, h3, h4, h5, h6) + li {
    break-before: avoid
}

Or a more expensive:

is(title, h1, h2, h3, h4, h5, h6) + not(image) {
    break-before: avoid
}

(I haven't checked these snippets.)

Not sure what kind of elements can follow a heading, it feels to me FB2 can have a mess of various containers like <section> that could contain paragraph. @Frenzie may have more experience and ideas than me with such sibling selectors

I'm investigating, so no need to spend time on it. May be we'll find out some CSS bits that you can stick in this PR. Follow us there.

Thanks, it's perfect timing! Should I close this PR then?

I was just talking about the image centering :) Experimentations with this page-break:avoid & image issue are still in your hands :)

Frenzie commented 5 months ago

For me, it makes sense when there is no image: you don't want a page break before a title and its first paragraph. I understand you may want to allow a page break there it there is an image after the title, but you should not kill the most common first case.

Same, I would set avoid there too.

@Frenzie may have more experience and ideas than me with such sibling selectors

I disagree with the premise that you'd want a page break before an image in the first place, sorry. :-)

(Of course I can see why you'd want it in some edge cases like very tall images but we're talking generically. But also avoid isn't the opposite of always so the edge case is already taken care of probably?)

poire-z commented 5 months ago

The issue is that HEADING height +(break avoid)+ IMAGE height may be larger than page, and crengine can't really know is would need to resize the image to some max height of (page_height - heading_height). It only has some code to ensure the image height is page height - margin/border/padding of the image itself. So, we need to have a fallback with CSS and the break-before/after:auto/always/avoid so the page splitting code can, when it sees a image with height=page_height, to be sure it's set on a new page all by itself, which is better than truncating the image.

But wait before thinking more, I want to see if getting rid of the strut: image is enough for the page splitting code to ignore the break:avoid and push a now-not-larger-than-page-height image to a new page.

dmalinovsky commented 5 months ago

I was just talking about the image centering :)

Oh, I didn't even notice that the images weren't centered. It wasn't a problem I was trying to solve. :)

But also avoid isn't the opposite of always so the edge case is already taken care of probably?)

@Frenzie, what do you mean? My edge case doesn't work with avoid.

But wait before thinking more

Sure, I have my own CSS changed, so I can wait as long as needed. :)

poire-z commented 5 months ago

OK, so if I tweak for FB2 (as I do for HTML) the strut to be 0:

--- a/crengine/src/lvrend.cpp
+++ b/crengine/src/lvrend.cpp
@@ -3932,6 +3932,13 @@ void renderFinalBlock( ldomNode * enode, LFormattedText * txform, RenderRectAcce
                     for ( int i=0; i<lines.length(); i++ )
                         txform->AddSourceLine( lines[i].c_str(), lines[i].length(), cl, bgcl, font.get(), lang_cfg, flags|LTEXT_FLAG_OWNTEXT, line_h, valign_dy, 0, enode );
                 }
+    if ( rm == erm_final ) { // (should probably always be the case)
+        // If standalone block image, do as done above for floating images
+        // (Firefox does not ensure the strut and neither text-indent.)
+        txform->setStrut(0, 0);
+        line_h = 0;
+        indent = 0;
+    }
                 txform->AddSourceObject(flags, LTEXT_OBJECT_IS_IMAGE, line_h, valign_dy, indent, enode, lang_cfg );
                 title = enode->getAttributeValue(attr_subtitle);
                 if ( !title.empty() ) {

image

https://github.com/koreader/crengine/blob/55429c37868322408df15cfb0d26bc839b90cccf/crengine/src/lvrend.cpp#L3916-L3964 The "problem" is that crappy FB2 can have text injected above and below an image from its title/subtitle/suptitle attributes... So, I guess it would need some checks before doing that, only if no title/sub/suptitle... and let images with title/sub/suptitle be crappy and truncated (truncrapped ? :)) If going with that (with a bit more thinking that my test attempt), no need for any of the CSS tweaks of this PR.

Frenzie commented 5 months ago

@dmalinovsky

@Frenzie, what do you mean? My edge case doesn't work with avoid.

If the problem is that avoid doesn't behave entirely as desired that's a bit different than avoid being the wrong thing to use. Avoiding avoid (grin) is certainly a possible solution but it doesn't feel like breaking off if whatever follows is actually too long (i.e., presumably the desired behavior) should be overly difficult. With the caveat that I haven't actually looked at the code yet. ;-)

dmalinovsky commented 5 months ago

So, I guess it would need some checks before doing that, only if no title/sub/suptitle... and let images with title/sub/suptitle be crappy and truncated (truncrapped ? :))

I think we can ignore it and let subtitle move to the next page.

This looks good. Can you please test with another file I've uploaded? It has different FB2 structure, so your fix may not work there.

That is: removing the break-after: avoid for title,h1,h2, and using stuff like:

I can try that if it's not too expensive to handle CSS like this one. What do think?

poire-z commented 5 months ago

Looks ok I think (with some possible spurious blank page caused probably by some empty line or a <br/> or some empty paragraph following an image:

image

image

image

That is: removing the break-after: avoid for title,h1,h2, and using stuff like:

I can try that if it's not too expensive to handle CSS like this one. What do think?

We no longer need to think about CSS if the code can now properly size an image so it does not overflow a page, and so can be brought naturally (by some other bit of code involved in page splitting) to a new page, with a higher priority than any existing break:avoid.

dmalinovsky commented 5 months ago

That's perfect, @poire-z! Let me know when I can close this PR then.

poire-z commented 5 months ago

Oh, shit, I was testing with still your title, h1, h2 { page-break-after: auto instead of avoid }. Your 2nd book is still ok, but the Dante picture is still cut... I'll need to look at the page splitting code :/

poire-z commented 5 months ago

Well, I'm not sure I want to do anything about the Dante picture. We're in the situation of that first comment - and we go to the else // slice this line: https://github.com/koreader/crengine/blob/55429c37868322408df15cfb0d26bc839b90cccf/crengine/src/lvpagesplitter.cpp#L1010-L1051 The "line" being the image in our case - but it could be a table row. I think it's fine to ensure the break:avoid, with the side effect of cutting the image, in this rare case - rather than pushing lots of stuff to new page and having tons of blank space. This can also happens with EPUB and other break:avoid situations, and the solution in that case to read the "line" untruncated (ie. tall table rows) is often to switch to "scroll/continuous" mode - or for an image to just long-press on it and view it fit to screen in ImageViewer... Or you could may be curate a user style tweak that solves this issue (or may be shipped if it's generic and doesn't look too ugly or side-effects-full).

I would still put my setStrut(0, 0); stuff so all the things in that second book (where there is no break:avoid festival I think) would look ok.

dmalinovsky commented 5 months ago

What about replacing my change with the following CSS?

title + image,  title + section.image{
  page-break-after: allow;
}

Will it be acceptable?

dmalinovsky commented 5 months ago

Because the corresponding FB2 code looks like this:

<section>
   <title>
    <p>Gogol'</p>
    <p>Dead souls</p>
   </title>
   <section>
    <empty-line/>
    <image l:href="#pict_001.png"/>
    <empty-line/>
   </section>

Or like this:

<title>
    <p>Dante</p>
   </title>
   <image l:href="#i_001.jpg"/>
poire-z commented 5 months ago

What about replace my change with the following CSS?

title + image,  title + section.image{
  page-break-after: allow;
}

Will it be acceptable?

Feels like air-css :) allow is not a supported keywoard (only auto, always, avoid are). And this would affect break after the image, not between a title and and image. And section.image matches <section class=image>, which is not what you want.

Because the corresponding FB2 code looks like this:

Yes, in these cases you met. But our fb2.css should be generic and logical for the FB2 specs, and not something ad-hoc for some books you met (style tweaks are for that). The FB2 specs are messy and allow various combinations of elements, I'm not really familiar with how they are used in the wild, so I'd rather not hack fb2.css everytime you meet a book where you need something different :/ and revert it back when you met another book where the change will cause issues.

dmalinovsky commented 5 months ago

Hmm, title + image selector didn't work.

dmalinovsky commented 5 months ago

But our fb2.css should be generic and logical for the FB2 specs, and not something ad-hoc for some books you met (style tweaks are for that).

Fair enough. I'm glad that you fixed the 2nd problem, thanks for debugging! I'll add my custom CSS.

Frenzie commented 5 months ago

What about replacing my change with the following CSS?

That doesn't look right for the XML in question. Something like title + section > image might work (provided you use auto).

dmalinovsky commented 5 months ago

That doesn't look right for the XML in question. Something like title + section > image might work (provided you use auto).

Thanks, my CSS became too rusty.

dmalinovsky commented 5 months ago

How didn't I see this wonderful feature, Style Tweaks! Now I can hide the notes section in FB2 file:

body[name="notes"] {
  display: none;
}

Perfect! Thanks for https://github.com/koreader/koreader/pull/3944, @poire-z!

dmalinovsky commented 5 months ago

Oops, it also hid the notes themselves, huh. Oh well.

Frenzie commented 5 months ago

I think -cr-hint: non-linear might be for that.

dmalinovsky commented 5 months ago

Thanks, @Frenzie, it worked together with "Hide non-linear fragments" setting.

poire-z commented 5 months ago

It seems it's already there, no? https://github.com/koreader/crengine/blob/55429c37868322408df15cfb0d26bc839b90cccf/cr3gui/data/fb2.css#L115-L121 So you should always have that menu [ ] Hide non-linear fragments when there are 2 body, ready to be enabled to hide the notes & comments :)

dmalinovsky commented 5 months ago

Wow, you're absolutely right, and I didn't try that setting. :/

Frenzie commented 5 months ago

Oops, haha. :+1: