readium / readium-sdk

A C++ ePub renderer SDK
BSD 3-Clause "New" or "Revised" License
385 stars 164 forks source link

Fix page breaks in reflowable Epub books #229

Open olivierkorner opened 8 years ago

olivierkorner commented 8 years ago

CSS properties page-break-inside/after/before with values always/auto/avoid are sometimes used in Epub books to force or prevent text to the next page. This is a print-only CSS property and as such it's rendered in the webview.

By adding a CSS preprocessor content filter in the SDK, all occurences of page-break-* in CSS stylesheets and style elements and attributes in HTML content are replaced by matching column-break-*, enforcing or preventing the page breaks accordingly in the reader.

readium/readium-shared-js#127

page-breaks-demo

danielweck commented 8 years ago

That's a great use of ContentFilterChain, thanks @olivierkorner!

Do you think that the replacement of page-break-* with column-break-* will work in every conceivable native webviews? (right now: Android UIWebView and ChromiumView, iOS + OSX UIWebView and WKWebView, no Windows-specific MSEdgeIEView so far)

https://github.com/readium/readium-sdk/pull/229/files#diff-639376f74408a17718d6668606828a45R29

static REGEX_NS::regex CSSMatcher("page\\-break\\-(after|before|inside) *\\: *(always|avoid|left|right)", regexFlags);
...
static const std::string PageBreakReplacement = "-webkit-column-break-$1: $2; column-break-$&: $2";
olivierkorner commented 8 years ago

@danielweck I haven't been able to find up-to-date information on the support of column breaks across webviews. It'll work with webkit-based webviews, but I can't say for sure for the rest.

danielweck commented 7 years ago

Another comment: this renames page-break CSS statements to column-break in a indiscriminate manner, that is to say regardless of the actual rendering context within which the HTML document is rendered (e.g. scroll vs. column-paginated view modes).

Also, this only works with standalone stylesheet files, not inline CSS styles.

olivierkorner commented 7 years ago

@danielweck It works with both standalone stylesheet files and inline CSS styles: it replaces page-break statements in CSS files and in <style> element and style attributes in HTML files.

You're right though, it does replace regardless of the context. As far as I know, it's not possible to pass a context from the reader to the filter, is it?

I agree that having column breaks in a non-columnized view is not the most rigorous, but it's worth checking if it has any consequences on the rendering. I just tested on iOS: in scroll view mode, the column-break statements are just ignored.

I will do with more testing.

danielweck commented 7 years ago

Thank you for the correction (stylesheet files AND inline CSS). And yes, Content Filters handle raw data, they hhave no information about the document rendering context.

olivierkorner commented 7 years ago

I tested on several platforms: works on iOS, OS X, Android and it would work with MSEdgeIEView as it supports webkit prefixed statements for -webkit-column-break-*. I also checked if the substitution had any effect in scroll mode: on all these platforms, the column break statements are ignored.

rkwright commented 7 years ago

Thanks for creating and testing this @olivierkorner

@danielweck I am OK with it, If you are OK with this, please go ahead and merge it.

danielweck commented 7 years ago

Note: this PR incorrectly targets the master branch (should be develop).

danielweck commented 7 years ago

I added an inline code comment.

Also, for ePub3/ePub/initialization.cpp to compile with the new CSSPreprocessor C++ class in Android (or any target platform other than iOS/OSX readium-sdk/plaform/Apple/) we need to ensure that CSSPreprocessor.cpp is included in the build. If I am not mistaken, the Gradle build utilizes "wildcard" includes: https://github.com/readium/readium-sdk/blob/develop/Platform/Android/epub3/Stable.mk#L231 $(wildcard $(EPUB3_PATH)/ePub/*.cpp) and https://github.com/readium/readium-sdk/blob/develop/Platform/Android/epub3/build_experimental.gradle#L8

android.sources {
        main {
            jni {
                source {
                    srcDirs = [
                            '../../../ePub3/ePub'

...which means Android should compile without any modifications in the build configuration. @olivierkorner is that your experience as well?

danielweck commented 7 years ago

when merged, this will have to be documented here: https://github.com/readium/readium-shared-js/wiki/ContentModifications

olivierkorner commented 7 years ago

@danielweck I compiled a clean SDKLauncher-Android: no need to modify the build configuration. However, when I merged with the develop branch, I had a compile error in iOS and OS X. Maybe it should be fixed before I push my modifications.

danielweck commented 7 years ago

I think the iOS / OSX compile error is because of one of my previous Android fixes. It's a simple code patch (misplaced preprocessor directive) in ObfuscatedFonts ContentFilter, related to OpenSSL.

danielweck commented 7 years ago

Hello, can this PR be "updated" to target the develop branch? Also, sync https://github.com/ArtBookMagazine/readium-sdk/tree/feature/page_breaks with the current tip of the develop branch, to see if everything compiles / runs fine. Thanks!

olivierkorner commented 7 years ago

@danielweck So much easier now that GitHub allows to change the base branch!

danielweck commented 7 years ago

Yes, that's great! :)

JayPanoz commented 6 years ago

Sorry for being so late to the party, but I can give further details about that.

Intro

To put it simply, this has evolved significantly and is now part of the CSS Fragmentation Module. UA are even required to alias page-break-* properties to the new ones (break-*) for compatibility.

There’s an intent to implement from Chromium back to 2016.

I thought Webkit aliased as well but I’m apparently wrong.

Anyway, what’s important is that the spec doesn’t alias the super old -webkit-column-break-* properties, which is something I reported to the CSS multicol spec editors.

Which brings me to the iOS UIWebView setPagination API. Apparently, this has been an issue to Apple as well, because there is a paginationBreakingMode available, and it allows app implementers to choose between page-break-* and -webkit-column-break-*.

And, to quote Quirksmode:

This is an area of severe incompatibilities and immense amounts of keywords.

Which is consistent with my personal experience.

Bugs

Now to the fun parts…

Issue 1: at some point in time on iOS/MacOS, if a stylesheet had both, none would be taken into account – that’s a bug I reported myself ± 18 months ago.

Issue 2. fragmentation is not necessarily updated in some contexts (e.g. columns on html), and, on a reflow, for instance, Apple has to add <hr style="page-break-before: always"/> to make sure an element with page-break-inside: avoid works as expected (so they insert it before the element when the user changes the font-size, or change the size of the window on desktop for instance). I can confirm I had to force recalcs in ReadiumCSS because of nasty fragmentation update issues.

Issue 3. break-inside has super nasty bugs if the element is too long to fit into one column, e.g. vertical alignment can suddenly become completely off, and the element is laid out 40% from the top of the column. That might screw pagination as well, because you’ve got this extra 40% which is not visible in JS (offset, clientRect, etc.). Or lines of text being cut off.

Support

You won’t find those properties in CanIUse break feature, but you’ll find the mention to webkit-column-break-* support in CSS Multicol feature.

The biggest pain point is avoid is not supported for before and after, obviously.

But to sum things up, this probably should be fixed in browsers because bugs can arise pretty quickly, and from personal experience, actually supporting it can do more harm than good. I’ve been very very cautious when using those properties when authoring EPUB files because of the amount of rendering issues some can trigger.

[edit] There’s also a weird mention in the iBooks guidelines:

if you include page breaks to mark a chapter break, use page-break-after to create a break at the end of a chapter, not page-break-before to insert the break at the beginning of the chapter. This modification improves performance with the table of contents.

It won’t be unreasonable to think Webkit has issues with that, considering the CSS multicol implementation is quite terrible in all browsers.