hauke96 / wiki2book

A simple CLI tool to create good-looking eBooks from Wikipedia articles.
GNU General Public License v3.0
ebook ebooks epub wikipedia

wiki2book

wiki2book is a tool to create good-looking eBooks from one or more Wikipedia articles.

The goal is to create eBooks (EPUB files) from a couple of Wikipedia articles that are as beautiful as real books. To achieve this, wiki2book is built specifically for this task and is aware of Wikipedia- and website-specific features (more on that below). This should make reading Wikipedia articles even more fun and may create a whole new readership for this awesome and immeasurably large database of knowledge.

eBook of the German article about astronomy on a Tolino eBook-reader.

Why not simply use pandoc?

Good question.

Pandoc and other tools (like wb2pdf or percollate) are great, and yes, they can convert MediaWiki to EPUB. In fact, wiki2book itself relies on pandoc to turn HTML into EPUB when the pandoc output driver is used, because pandoc is well known and calling it is simple.

However, these tools always lack something, for example math rendering, image downloading, template evaluation, or proper table handling. They also make no eBook-specific assumptions, such as ignoring styles unsuitable for eBooks or skipping Wikipedia-oriented templates.

Furthermore, most existing tools are general-purpose, which is a disadvantage for the very specific task of converting Wikipedia articles into beautiful offline eBooks.

Another feature missing from all of these tools is turning multiple articles into one ready-to-read eBook. wiki2book provides exactly this functionality through "projects", described below.

Installation

Usage

Currently, wiki2book is only available as a CLI (command line interface); there is no GUI. wiki2book needs a configuration file (see the configs folder); currently, only a German config file exists.

Preliminaries

You need the following tools and fonts:

  1. ImageMagick (for the convert command)
  2. Optional:
    • Pandoc (when using the pandoc output driver). See the notes on pandoc versions 2 and 3 below.
    • DejaVu fonts in /usr/share/fonts/TTF/DejaVuSans*.ttf (used by the default style in this repo, but can be replaced by any other font).

CLI

The CLI contains three sub-commands that generate an EPUB file from different sources (see below for examples and details on each sub-command):

  1. Project: wiki2book project ./path/to/project.json
  2. Article: wiki2book article "article name"
  3. Standalone: wiki2book standalone ./path/to/file.mediawiki

Use wiki2book -h for more information and wiki2book <command> -h for information on a specific command.

Configuration

See the config documentation.

Pandoc version 2 and 3

Only relevant when using the pandoc output driver.

Pandoc version 2 may use CSS3 properties such as gap by default, which can cause problems on certain eBook readers (e.g. Tolino devices). To avoid this, pass the argument --pandoc-data-dir ./pandoc/data to wiki2book; this makes pandoc use a template from this repo that omits the problematic gap property.

Alternatively, install pandoc 3, which avoids these CSS3 properties.
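Which of the two options applies depends on the installed pandoc version. The following Go sketch, which is not part of wiki2book, reads the first line of `pandoc --version` (for example `pandoc 3.1.2`), extracts the major version and returns the extra argument recommended above when pandoc 2 is found. Restricting the workaround to exactly version 2 is an assumption based on this section, which only discusses versions 2 and 3.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// pandocMajor extracts the major version from `pandoc --version` output,
// whose first line looks like "pandoc 3.1.2".
func pandocMajor(versionOutput string) (int, error) {
	firstLine := strings.SplitN(versionOutput, "\n", 2)[0]
	fields := strings.Fields(firstLine)
	if len(fields) < 2 {
		return 0, fmt.Errorf("unexpected version line %q", firstLine)
	}
	major, err := strconv.Atoi(strings.SplitN(fields[1], ".", 2)[0])
	if err != nil {
		return 0, fmt.Errorf("cannot parse version %q: %w", fields[1], err)
	}
	return major, nil
}

// extraArgs returns the additional wiki2book arguments suggested for pandoc 2,
// or nil when no workaround is needed.
func extraArgs(major int) []string {
	if major == 2 {
		return []string{"--pandoc-data-dir", "./pandoc/data"}
	}
	return nil
}

func main() {
	out, err := exec.Command("pandoc", "--version").Output()
	if err != nil {
		fmt.Println("pandoc not found; it is only needed for the pandoc output driver")
		return
	}
	major, err := pandocMajor(string(out))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("pandoc %d detected, extra wiki2book args: %v\n", major, extraArgs(major))
}
```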

Examples

The following are working example calls to wiki2book.

For details on the parameters used below, see ./wiki2book -h.

Project

Use the following command to build the German project about astronomy:

./wiki2book project -c configs/de.json ./projects/de/astronomie/astronomie.json

Single article

Render a single article by using the article sub-command:

./wiki2book article -c configs/de.json -s projects/style.css "Erde"

Standalone

Use the following command to render the local MediaWiki file test-real-article-Erde.mediawiki:

./wiki2book standalone -c configs/de.json -s projects/style.css ./integration-tests/test-real-article-Erde.mediawiki

Contribute

Issues, bugs, ideas

Feel free to open a new issue, but keep in mind that this is a hobby project and my time is limited. Things of little or no use to me personally will get a lower priority.

Development

For building, running, testing, etc., take a look at src/README.md.

Long-term goals