johnfactotum / foliate-js

Render e-books in the browser
https://johnfactotum.github.io/foliate-js/reader.html
MIT License

foliate-js

Library for rendering e-books in the browser.

Features:

Demo

The repo includes a demo viewer that can be used to open local files. To use it, serve the repo's files from a local web server and navigate to reader.html, or visit the online demo hosted on GitHub (https://johnfactotum.github.io/foliate-js/reader.html). Note that it is very incomplete at the moment, and lacks many basic features such as keyboard shortcuts.

Also note that deobfuscating fonts with the IDPF algorithm requires a SHA-1 function. By default it uses Web Crypto, which is only available in secure contexts. Without HTTPS, you will need to modify reader.js and pass your own SHA-1 implementation.

Current Status

It's far from complete or stable yet, though it should have near feature parity with Epub.js. There's no support for continuous scrolling, however.

Among other things, the fixed-layout renderer is notably unfinished at the moment.

Documentation

Overview

This project uses native ES modules. There's no build step, and you can import them directly.

There are mainly three kinds of modules:

The modules are designed to be independent of one another. In general, they don't directly depend on each other. Instead they depend on certain interfaces, detailed below. The exception is view.js. It is the higher level renderer that strings most of the things together, and you can think of it as the main entry point of the library. See "Basic Usage" below.

The repo also includes a still higher level reader, though strictly speaking, reader.html (along with reader.js and its associated files in ui/) is not considered part of the library itself. It's akin to Epub.js Reader. You are expected to modify it or replace it with your own code.

Basic Usage

import './view.js'

const view = document.createElement('foliate-view')
document.body.append(view)

view.addEventListener('relocate', e => {
    console.log('location changed')
    console.log(e.detail)
})

// can open a File/Blob object or a URL
// or any object that implements the "book" interface
await view.open('example.epub')
await view.goTo(/* path, section index, or CFI */)

Security

Scripting is not supported, as it is currently impossible to support it securely: the content is served from the same origin (using blob: URLs).

Furthermore, while the renderers do use the sandbox attribute on iframes, it is useless, as it requires allow-scripts due to a WebKit bug: https://bugs.webkit.org/show_bug.cgi?id=218086.

It is therefore imperative that you use Content Security Policy (CSP) to block all scripts except 'self'. An EPUB file for testing can be found at https://github.com/johnfactotum/epub-test.

[!CAUTION] Do NOT use this library without CSP unless you completely trust the content you're rendering or can block scripts by other means.
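
As a rough illustration of the CSP requirement above, the sketch below builds a policy string that only allows same-origin scripts. The buildCsp name and every directive other than script-src 'self' are assumptions made for this example; adjust them to whatever your page actually needs.

```javascript
// Build a Content-Security-Policy value that blocks every script except 'self'.
// Only "script-src 'self'" comes from the text above; the other directives
// are assumptions for a page that renders book content from blob: URLs.
const buildCsp = () => [
    "default-src 'self' blob:",
    "script-src 'self'",
    "style-src 'self' blob: 'unsafe-inline'",
    "img-src 'self' blob: data:",
].join('; ')

// Example use in a Node.js request handler (hypothetical server code):
// res.setHeader('Content-Security-Policy', buildCsp())
```

Because 'self' does not match blob: URLs, scripts embedded in book content are refused while the reader's own modules still load. The same string can also be placed in a meta http-equiv tag if you cannot set response headers.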

The Main Interface for Books

Processors for each book format return an object that implements the following interface:

The following methods are consumed by progress.js, for getting the correct TOC and page list item when navigating:

Almost all of the properties and methods are optional. At minimum it needs .sections and the .load() method for the sections, as otherwise there won't be anything to render.
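
To make the minimum concrete, here is a sketch of a book object that has nothing but .sections with .load(). It assumes that .load() returns (or resolves to) a URL to render and that an optional .unload() may be used for cleanup; neither detail is spelled out in this section, and makeBook is a hypothetical helper, not part of the library.

```javascript
// Wrap an array of HTML strings as a minimal "book" object.
const makeBook = htmlStrings => ({
    sections: htmlStrings.map(html => {
        let url = null
        return {
            // Assumption: load() returns a URL that the renderer will display
            load: () => url ??= URL.createObjectURL(
                new Blob([html], { type: 'text/html' })),
            // Assumption: optional hook to release the blob: URL afterwards
            unload: () => {
                if (url) URL.revokeObjectURL(url)
                url = null
            },
        }
    }),
})
```

According to the comment in "Basic Usage", an object like this can be passed to view.open(), though a real book will likely want more of the optional properties (metadata, TOC, and so on).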

Archived Files

Reading Zip-based formats will require adapting an external library. Both epub.js and comic-book.js expect a loader object that implements the following interface:

In the demo, this is implemented using zip.js, which is highly recommended because it seems to be the only library that supports random access for File objects (as well as HTTP range requests).

One advantage of having such an interface is that one can easily use it for reading unarchived files as well. For example, the demo has a loader that allows you to open unpacked EPUBs as directories.
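
The idea of a loader that is not backed by an archive can be sketched with an in-memory Map. The member names used here (entries, loadText, loadBlob, getSize) are assumptions, since the interface is not listed in this section; check them against the library's source before relying on them. makeMemoryLoader is a hypothetical helper.

```javascript
// A loader backed by a Map of filename -> text content, instead of a Zip file.
// All member names below are ASSUMED; verify them against epub.js.
const makeMemoryLoader = files => {
    const encoder = new TextEncoder()
    return {
        // one entry per file, identified by its full path
        entries: Array.from(files.keys(), filename => ({ filename })),
        // file contents as a string, or null if the file is missing
        loadText: filename => files.get(filename) ?? null,
        // file contents as a Blob, or null if the file is missing
        loadBlob: filename => files.has(filename)
            ? new Blob([files.get(filename)]) : null,
        // size in bytes of the UTF-8 encoded content
        getSize: filename => files.has(filename)
            ? encoder.encode(files.get(filename)).length : 0,
    }
}
```

A directory-based loader like the one in the demo would follow the same shape, with file system lookups in place of the Map.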

Mobipocket and Kindle Files

It can read both MOBI and KF8 (.azw3, and combo .mobi files) from a File (or Blob) object. For MOBI files, it decompresses all text at once and splits the raw markup into sections at every <mbp:pagebreak>, instead of outputting one long page for the whole book, which drastically improves rendering performance. For KF8 files, it tries to decompress as little text as possible when loading a section, but it can still be quite slow due to the slowness of the current HUFF/CDIC decompressor implementation. In all cases, images and other resources are not loaded until they are needed.
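
The splitting strategy for MOBI can be illustrated on a plain string. This is only a sketch of the idea, not the library's implementation, and splitAtPageBreaks is a hypothetical name.

```javascript
// Split raw MOBI markup into sections at every <mbp:pagebreak> tag.
// Empty or whitespace-only pieces are dropped.
const splitAtPageBreaks = markup => markup
    .split(/<mbp:pagebreak\b[^>]*>/i)
    .filter(section => section.trim() !== '')
```

Each resulting piece can then be rendered as its own section, so the renderer never has to lay out the whole book at once.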

Note that KF8 files can contain fonts that are zlib-compressed. They need to be decompressed with an external library. The demo uses fflate to decompress them.

PDF and Other Fixed-Layout Formats

There is a proof-of-concept, highly experimental adapter for PDF.js, with which you can show PDFs using the same fixed-layout renderer that is used for EPUBs.

CBZs are likewise handled like fixed-layout EPUBs.

The Renderers

It has two renderers, one for paginating reflowable books, and one for fixed-layout. They are custom elements (web components).

A renderer's interface is currently mainly:

It has the following custom events:

The paginator uses the same pagination strategy as Epub.js: it uses CSS multi-column. As such it shares many of the same limitations (it's slow, some CSS styles do not work as expected, and other bugs). There are a few differences:

To simplify things, it has a totally separate renderer for fixed layout books. As such there's no support for mixed layout books.

Both renderers expose a part named filter, to which you can apply CSS filters, e.g. to invert colors or adjust brightness:

foliate-view::part(filter) {
    filter: invert(1) hue-rotate(180deg);
}

The filter only applies to the book itself, leaving overlaid elements such as highlights unaffected.

The Paginator

The layout can be configured by setting the following attributes:

(Note: there's no JS property API. You must use .setAttribute().)
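
Since everything goes through .setAttribute(), a small helper can apply several settings at once. This is a sketch: applyLayout is a hypothetical helper, and the attribute names in the usage comment (flow, gap, max-column-count) as well as view.renderer are assumptions that should be checked against the paginator's source.

```javascript
// Apply a set of layout attributes to a renderer element.
// Works with any object that has a setAttribute(name, value) method.
const applyLayout = (renderer, layout) => {
    for (const [name, value] of Object.entries(layout))
        renderer.setAttribute(name, String(value))
}

// Hypothetical usage; attribute names and view.renderer are assumed:
// applyLayout(view.renderer, { flow: 'scrolled', gap: '5%', 'max-column-count': 2 })
```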

It has built-in header and footer regions accessible via the .heads and .feet properties of the paginator instance. These can be used to display running heads and reading progress. They are only available in paginated mode, and there will be one element for each column. They are styleable with ::part(head) and ::part(foot). E.g., to add a border under the running heads,

foliate-view::part(head) {
    padding-bottom: 4px;
    border-bottom: 1px solid graytext;
}

EPUB CFI

Parsed CFIs are represented as a plain array or object. The basic type is called a "part", which is an object with the following structure: { index, id, offset, temporal, spatial, text, side }, corresponding to a step + offset in the CFI.

A collapsed, non-range CFI is represented as an array whose elements are arrays of parts, each corresponding to a full path. That is, /6/4!/4 is turned into

[
    [
        { "index": 6 },
        { "index": 4 }
    ],
    [
        { "index": 4 }
    ]
]

A range CFI is an object { parent, start, end }, each property being the same type as a collapsed CFI. For example, /6/4!/2,/2,/4 is represented as

{
    "parent": [
        [
            { "index": 6 },
            { "index": 4 }
        ],
        [
            { "index": 2 }
        ]
    ],
    "start": [
        [
            { "index": 2 }
        ]
    ],
    "end": [
        [
            { "index": 4 }
        ]
    ]
}
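
To show how the two shapes relate to the string form, here is a minimal sketch that turns a parsed CFI back into a string. It handles only index and offset, ignores the other part properties (id, temporal, spatial, text, side), and is not the library's own serializer; stringifyCfi is a hypothetical name.

```javascript
// One part becomes one step, optionally followed by a character offset.
const partToString = ({ index, offset }) =>
    `/${index}` + (offset != null ? `:${offset}` : '')

// A collapsed CFI is an array of paths; paths are joined with "!".
const collapsedToString = paths =>
    paths.map(parts => parts.map(partToString).join('')).join('!')

// A range CFI is { parent, start, end }, joined with ",".
const stringifyCfi = cfi => Array.isArray(cfi)
    ? collapsedToString(cfi)
    : [cfi.parent, cfi.start, cfi.end].map(collapsedToString).join(',')
```

Applied to the two structures above, this yields /6/4!/4 and /6/4!/2,/2,/4 respectively.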

The parser uses a state machine rather than regex, and should handle assertions that contain escaped characters correctly (see tests for examples of this).

It has the ability to ignore nodes, which is needed if you want to inject your own nodes into the document without affecting CFIs. To do this, you need to pass the optional filter function that works similarly to the filter function of TreeWalkers:

const filter = node => node.nodeType !== 1 ? NodeFilter.FILTER_ACCEPT
    : node.matches('.reject') ? NodeFilter.FILTER_REJECT
    : node.matches('.skip') ? NodeFilter.FILTER_SKIP
    : NodeFilter.FILTER_ACCEPT

CFI.toRange(doc, 'epubcfi(...)', filter)
CFI.fromRange(range, filter)

It can parse and stringify spatial and temporal offsets, as well as text location assertions and side bias, but there's no support for employing them when rendering yet.

Highlighting Text

There is a generic module for overlaying arbitrary SVG elements, overlayer.js. It can be used to implement highlighting text for annotations. It's the same technique as marks-pane, which Epub.js uses, but it's designed to be easily extensible. You can return any SVG element in the draw function, making it possible to add custom styles such as squiggly lines or even free hand drawings.

The overlay has no event listeners by default. It only provides a .hitTest(event) method that can be used to do hit tests. Currently it does this with the client rects of Ranges, not the element returned by draw().
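
Hit testing against client rects boils down to a point-in-rectangle check. The sketch below shows the general idea only and is not the code in overlayer.js; hitTestRects is a hypothetical helper that accepts anything shaped like the result of range.getClientRects().

```javascript
// Return true if the pointer position of the event falls inside any rect.
const hitTestRects = (rects, { clientX, clientY }) =>
    Array.from(rects).some(({ left, right, top, bottom }) =>
        clientX >= left && clientX <= right
        && clientY >= top && clientY <= bottom)
```

One consequence of testing the Range's rects rather than the drawn element is that a custom shape extending beyond the text (such as a free hand drawing) would not register hits outside those rects.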

An overlayer object implements the following interface for the consumption of renderers:

The Text Walker

Not a particularly descriptive name, but essentially, text-walker.js is a small DOM utility that allows you to

  1. Gather all text nodes in a Range, Document or DocumentFragment into an array of strings.
  2. Perform splitting or matching on the strings.
  3. Get back the results of these string operations as Ranges.

E.g. you can join all the text nodes together, use Intl.Segmenter to segment the string into words, and get the results in DOM Ranges, so you can mark up those words in the original document.
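
The word segmentation example can be sketched without any DOM at all. The function below works on an array of strings (standing in for the text nodes) and reports each word's start and end as a position within a particular string; segmentWords and the shape of its result are made up for this illustration and are not the text-walker.js API.

```javascript
// Join the strings, segment into words, and map each word back to
// { index: which string, offset: position within that string }.
const segmentWords = (strs, locale = 'en') => {
    // starting position of each string within the joined text
    const starts = []
    let total = 0
    for (const str of strs) {
        starts.push(total)
        total += str.length
    }
    const locate = pos => {
        let i = starts.length - 1
        while (i > 0 && starts[i] > pos) i--
        return { index: i, offset: pos - starts[i] }
    }
    const segmenter = new Intl.Segmenter(locale, { granularity: 'word' })
    const words = []
    for (const { segment, index, isWordLike } of segmenter.segment(strs.join(''))) {
        if (!isWordLike) continue // skip spaces and punctuation
        words.push({
            word: segment,
            start: locate(index),
            end: locate(index + segment.length),
        })
    }
    return words
}
```

With real text nodes, each start and end pair maps directly onto range.setStart(node, offset) and range.setEnd(node, offset), which is how a word that straddles two nodes can still be marked up.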

In foliate-js, this is used for searching and TTS.

Searching

It provides a search module, which can in fact be used as a standalone module for searching across any array of strings. There's no limit on the number of strings a match is allowed to span. It's based on Intl.Collator and Intl.Segmenter, to support ignoring diacritics and matching whole words only. It's extremely slow, and you'd probably want to load results incrementally.
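
The following sketch shows how Intl.Collator and Intl.Segmenter can be combined for diacritic-insensitive matching, and why a generator suits incremental loading. It is a naive illustration on a single string, not the algorithm used by the search module, and the search function and its result shape are hypothetical.

```javascript
// sensitivity: 'base' makes "e", "é" and "E" compare as equal
const collator = new Intl.Collator('en', { sensitivity: 'base' })
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' })

// Yield { start, end } for every match, one at a time.
function* search(text, query) {
    const hay = Array.from(segmenter.segment(text))
    const needle = Array.from(segmenter.segment(query), s => s.segment)
    if (!needle.length) return
    for (let i = 0; i + needle.length <= hay.length; i++) {
        // compare grapheme by grapheme at this starting position
        const isMatch = needle.every((grapheme, j) =>
            collator.compare(grapheme, hay[i + j].segment) === 0)
        if (!isMatch) continue
        const last = hay[i + needle.length - 1]
        yield { start: hay[i].index, end: last.index + last.segment.length }
    }
}
```

Because the function is a generator, a caller can pull a handful of results, update the UI, and continue later instead of waiting for the whole book to be scanned.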

Text-to-Speech (TTS)

The TTS module doesn't directly handle speech output. Rather, its methods return SSML documents (as strings), which you can then feed to your speech synthesizer.

The SSML attributes ssml:ph and ssml:alphabet are supported. There's no support for PLS and CSS Speech.

Offline Dictionaries

The dict.js module can be used to load dictd and StarDict dictionaries. Usage:

import { StarDict } from './dict.js'
import { inflate } from 'your inflate implementation'

const { ifo, dz, idx, syn } = { /* `File` (or `Blob`) objects */ }
const dict = new StarDict()
await dict.loadIfo(ifo)
await dict.loadDict(dz, inflate)
await dict.loadIdx(idx)
await dict.loadSyn(syn)

// look up words
const query = '...'
await dict.lookup(query)
await dict.synonyms(query)

Note that you must supply your own inflate function. Here is an example using fflate:

const inflate = data => new Promise(resolve => {
    const inflate = new fflate.Inflate()
    inflate.ondata = data => resolve(data)
    inflate.push(data)
})

OPDS

The opds.js module can be used to implement OPDS clients. It can convert OPDS 1.x documents to OPDS 2.0:

It exports the following symbols for properties unsupported by OPDS 2.0:

There are also two functions that can be used to implement search forms:

These two functions return an object that implements the following interface:

Generating Images for Quotes

With quote-image.js, one can generate shareable images for quotes:

document.querySelector('foliate-quoteimage').getBlob({
    title: 'The Time Machine',
    author: 'H. G. Wells',
    text: 'Can an instantaneous cube exist?',
})

Supported Browsers

The library is mainly used in Foliate, which uses WebKitGTK. As such, WebKitGTK is the only engine that has been tested extensively. But the library should also work in Chromium and Firefox.

Apart from the renderers, using the modules outside browsers is also possible. Most features depend on having the global objects Blob, TextDecoder, TextEncoder, DOMParser, XMLSerializer, and URL, and should work if you polyfill them. Note that epubcfi.js can be used as is in any environment if you only need to parse or sort CFIs.

License

MIT.

Vendored libraries for the demo: