Welcome to the Bookmaker toolchain! Bookmaker comprises a series of scripts that turn a Word document into an HTML document, and then into a PDF and/or EPUB file.
Each script in the Bookmaker sequence performs a distinct set of actions that builds on the scripts that came before, and depends on any number of other scripts or tools. While most of these scripts were originally written for internal use at Macmillan, we've done our best to hone them down to a cross-platform, generic core that can be used out of the box (though there are still a number of dependencies, discussed further down). The scripts all live here, in the core directory.
It's important to note that correct transformation depends on correct application of the Macmillan Word template, a set of styles and rules for Microsoft Word manuscripts that create the initial structure each manuscript needs in order to cleanly transform into valid HTMLBook HTML. You can learn more about styling and the Word template here.
The scripts are as follows:
config.rb: This is where you configure your system set-up, for example, the location of your cloned core scripts, location of the external dependencies, etc.
header: This is the core Bookmaker library, which contains paths and references common to all the Bookmaker scripts.
tmparchive: Creates the temporary working directory for the file to be converted, and opens an alert to the user telling them the tool is in use.
Dependencies: Pre-determined folder structure
htmlmaker: Converts the .xml file to HTML using wordtohtml.xsl.
Dependencies: tmparchive, Python 2.7.x, correct application of the Macmillan Word template, Java JDK, Saxon, wordtohtml.xsl
filearchive: Creates the directory structure for the converted files.
Dependencies: tmparchive, htmlmaker
imagechecker: Checks to see if any images are referenced in the HTML file, and if those image files exist in the submission folder. If images are present, copies them to the final archive; if missing, creates an error file noting which image files are missing.
Dependencies: tmparchive, htmlmaker, filearchive
coverchecker: Checks to see if a front cover image file exists in the submission folder. If the cover image is present, copies it to the final archive; if missing, creates an error file noting that the cover is missing.
Dependencies: tmparchive, htmlmaker, filearchive
stylesheets: Copies the EPUB and PDF CSS into the final archive, counts how many chapters are in the book, and adjusts the CSS to suppress chapter numbers if only one chapter is found.
Dependencies: tmparchive, htmlmaker, filearchive
pdfmaker: Preps the HTML file and sends it to the DocRaptor service for conversion to PDF.
Dependencies: tmparchive, htmlmaker, filearchive, imagechecker, coverchecker, stylesheets, SSL cert file, DocRaptor cloud service, doc_raptor Ruby gem
epubmaker: Preps the HTML file and converts it to EPUB using the HTMLBook scripts.
Dependencies: tmparchive, htmlmaker, filearchive, imagechecker, coverchecker, stylesheets, Saxon, HTMLBook, Python
cleanup: Removes all temporary working files and working dirs.
Dependencies: tmparchive, htmlmaker, filearchive, imagechecker, coverchecker, stylesheets
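To make the imagechecker step in the list above more concrete, here is a minimal Ruby sketch of the check it describes: find the images referenced in the HTML, then sort them into present and missing. The method names and the regular expression are assumptions made for illustration; the actual script may parse the HTML and report errors differently.

```ruby
# Sketch only: the real imagechecker script may work differently.

# Collect the file name of every <img src="..."> reference in the HTML.
def referenced_images(html)
  html.scan(/<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/i)
      .flatten
      .map { |src| File.basename(src) }
      .uniq
end

# Split the referenced images into those present in the submission folder
# and those that are missing. Returns [present, missing].
def check_images(html, submission_dir)
  referenced_images(html).partition do |name|
    File.file?(File.join(submission_dir, name))
  end
end
```

The missing list is what would feed the error file; the present list is what would be copied to the final archive.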
Bookmaker requires a few pieces of metadata to accompany each project, which you can provide in a JSON file. Here's a sample:
config.json
{
"title":"Alice in Wonderland",
"author":"Lewis Carroll",
"productid":"99237561",
"printid":"9781234567890",
"ebookid":"9781234567899",
"imprint":"Project Gutenberg",
"publisher":"Project Gutenberg",
"printcss":"/Users/nellie/Documents/css/pdf.css",
"printjs":"/Users/nellie/Documents/js/pdf.js",
"ebookcss":"/Users/nellie/Documents/css/epub.css",
"frontcover":"cover.jpg"
}
Each of the following fields is used for various purposes throughout the Bookmaker toolchain:
By default, Bookmaker will look for all files (images, config.json) in the same folder as the input file, and create the output folders there as well. However, you can specify a custom submission folder and done folder in config.rb.
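As an illustration of the default lookup described above, the following Ruby sketch locates config.json next to the input file and reports which fields from the sample are absent. The helper names are hypothetical, and treating every field in the sample as required is an assumption, not documented Bookmaker behavior.

```ruby
require "json"

# Field names taken from the sample config.json. Treating all of them as
# required is an assumption made for this sketch.
EXPECTED_FIELDS = %w[title author productid printid ebookid imprint publisher
                     printcss printjs ebookcss frontcover].freeze

# Hypothetical helper: by default, config.json sits next to the input file.
def config_path_for(input_file)
  File.join(File.dirname(File.expand_path(input_file)), "config.json")
end

# Parse the metadata and list the expected fields that are absent or empty.
# Returns [metadata_hash, missing_field_names].
def read_metadata(json_text)
  metadata = JSON.parse(json_text)
  missing = EXPECTED_FIELDS.select { |field| metadata[field].to_s.strip.empty? }
  [metadata, missing]
end
```

A caller could combine the two with `read_metadata(File.read(config_path_for(input_file)))` and stop early if the missing list is not empty.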
Additionally, the following directory structures are required:
Paths for all of the above four folders must be configured in config.rb. See the installation instructions below for details.
The Bookmaker scripts depend on various other utilities, as follows:
Install Bookmaker by following these steps, in order.
On your server, create the following folders and subfolders.
If you haven't yet set up a GitHub account, do that now (you can just set up a basic, free account).
Now install git on your server, following the standard instructions.
The source code for the Bookmaker scripts is hosted in the Macmillan GitHub account, broken down into several repositories. The production-ready versions of each script live in the master branch in each repository. The repositories are as follows:
If you plan to make changes to the source code, you will want to fork those repositories and then clone them, so that you can maintain your version of the code.
Install the utilities listed in the previous section, as needed. For reference, you need to install the following in order to create these outputs:
Bookmaker requires Ruby 1.9.x. Follow standard installation instructions for your operating system.
Once Ruby is installed, you'll need to install a few gems:
gem install open-uri
gem install json
gem install fileutils
gem install doc_raptor
gem install net-sftp
gem install htmlentities
gem install unidecoder
gem install to_xml
gem install ruby-oci8
gem install bundler
Bookmaker requires Python version 2.7. Windows users must install Python in the specified Resources directory (see "Create the Folder Structure" above).
For Mac, download and install Python from here.
For Windows, follow the directions here.
Saxon is an XSLT processor that runs the script to convert the Word document to HTML, and also transforms the HTML to create the EPUB file. Right now Bookmaker can only run with Saxon, but we'd love to add support for other XSLT 2.0 processors.
Download Prince (http://www.princexml.com/download/) and follow the instructions to install.
To configure Bookmaker to use Prince, open config.rb and edit the following field:
$pdf_processor = "prince"
If you choose to use DocRaptor to create PDFs, you'll need to set up a DocRaptor account and give Bookmaker your authentication credentials.
To set up a DocRaptor account, go to docraptor.com, and follow the instructions to create an account. You'll need to know your API key to use Bookmaker; you can find your API key at the top right of your Dashboard.
You also need to install the DocRaptor ruby gem. In terminal or command prompt, type:
$ gem install doc_raptor
To configure Bookmaker to use DocRaptor, open config.rb and edit the following fields:
$pdf_processor = "docraptor"
...
$docraptor_key = "YOUR_API_KEY_HERE"
Note that DocRaptor requires all images that you want to include in the text to be hosted somewhere online, so make sure the image src attributes in your Word or HTML file point to that online location. You can store these images behind basic HTTP authentication; you'll just need to provide the credentials in config.rb by editing the following fields:
$http_username = "YOUR_USERNAME_HERE"
$http_password = "YOUR_PASSWORD_HERE"
The EPUB generation script relies on a collection of open source scripts called HTMLBook (created by O'Reilly), which are hosted on GitHub here: https://github.com/oreillymedia/HTMLBook.
The entire contents of the repository should be cloned or copied to your server, at the same level as the other Bookmaker scripts.
If you are using Saxon PE as your EPUB XSL processor, you'll need to edit the following files:
htmlbook.xsl
Line 22: Comment out this line. It refers to the exsl package, which is not supported by our conversion software.
<!--<xsl:include href="https://github.com/macmillanpublishers/bookmaker/blob/master/functions-exsl.xsl"/>--> <!-- Functions that are compatible with exsl package -->
Line 24: Uncomment this line--our conversion software (Saxon) uses xslt2, so we need to activate these functions.
<xsl:include href="https://github.com/macmillanpublishers/bookmaker/blob/master/functions-xslt2.xsl"/> <!-- Functions that are compatible with XSLT 2.0 processors -->
Within the primary Bookmaker repository (which is to say, this repository), you can configure your system paths to point to the correct folder locations for the folders you created in the steps above. Open config.rb and edit the following values:
The full path of the Temp folder:
$tmp_dir = "YOUR_PATH_HERE"
The full path of the Log folder:
$log_dir = "YOUR_PATH_HERE"
The full path of the main parent folder where all your scripts (including this repository) live:
$scripts_dir = "YOUR_PATH_HERE"
The full path of the Resource folder:
$resource_dir = "YOUR_PATH_HERE"
If you didn't already do this earlier, choose either prince or docraptor to create your PDFs:
$pdf_processor = "docraptor" #(or "prince")
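Because every script relies on these paths, it can help to confirm that they point at real directories before a first run. The Ruby sketch below is one way to do that; the method name is hypothetical and it is not part of Bookmaker itself.

```ruby
# Sketch only: a sanity check you could run after editing config.rb.
# `settings` maps a setting name to its configured path, for example:
#   { "$tmp_dir" => $tmp_dir, "$log_dir" => $log_dir,
#     "$scripts_dir" => $scripts_dir, "$resource_dir" => $resource_dir }
# Returns the names of settings whose path is not an existing directory.
def missing_config_dirs(settings)
  settings.reject { |_name, path| File.directory?(path.to_s) }.keys
end
```

An empty result means every configured folder exists; any names returned still show the placeholder value or point to a folder that has not been created.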
You can run Bookmaker by running the scripts one by one on the command line, or by combining them into a bash or batch file that runs them all in sequence. You can see examples of Macmillan's .bat files here: https://github.com/macmillanpublishers/bookmaker_deploy/. A simple deployment script for Mac might look like this (the script takes the input filename as its command-line argument):
#!/bin/sh
# "$1" is quoted so that input paths containing spaces are passed intact.
ruby /Users/nellie.mckesson/bookmaker/bookmaker/core/tmparchive/tmparchive.rb "$1"
ruby /Users/nellie.mckesson/bookmaker/bookmaker/core/htmlmaker/htmlmaker.rb "$1"
ruby /Users/nellie.mckesson/bookmaker/bookmaker/core/filearchive/filearchive.rb "$1"
ruby /Users/nellie.mckesson/bookmaker/bookmaker/core/imagechecker/imagechecker.rb "$1"
ruby /Users/nellie.mckesson/bookmaker/bookmaker/core/coverchecker/coverchecker.rb "$1"
ruby /Users/nellie.mckesson/bookmaker/bookmaker/core/stylesheets/stylesheets.rb "$1"
ruby /Users/nellie.mckesson/bookmaker/bookmaker/core/pdfmaker/pdfmaker.rb "$1"
ruby /Users/nellie.mckesson/bookmaker/bookmaker/core/epubmaker/epubmaker.rb "$1"
ruby /Users/nellie.mckesson/bookmaker/bookmaker/core/cleanup/cleanup.rb "$1"
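If you would rather drive the sequence from Ruby, which works the same way on Mac and Windows, a runner could look roughly like the sketch below. It assumes the core/<script>/<script>.rb layout seen in the example paths. The method names are hypothetical, and stopping at the first failing script is a design choice of this sketch rather than something the shell example does.

```ruby
# Script order taken from the deployment example above.
BOOKMAKER_STEPS = %w[tmparchive htmlmaker filearchive imagechecker coverchecker
                     stylesheets pdfmaker epubmaker cleanup].freeze

# Build one command per step, assuming the core/<step>/<step>.rb layout
# used in the example paths.
def bookmaker_commands(core_dir, input_file)
  BOOKMAKER_STEPS.map do |step|
    ["ruby", File.join(core_dir, step, "#{step}.rb"), input_file]
  end
end

# Run each step in order and stop at the first failure.
# Returns the name of the failing step, or nil if every step succeeded.
def run_bookmaker(core_dir, input_file)
  bookmaker_commands(core_dir, input_file).each_with_index do |command, i|
    return BOOKMAKER_STEPS[i] unless system(*command)
  end
  nil
end
```

Passing the command to `system` as separate arguments bypasses the shell, so input filenames with spaces need no extra quoting. Note that stopping early also skips cleanup, which leaves the temporary files in place for troubleshooting.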
To convert a project, drop the input text file along with any assets (interior images, etc.) into your conversion folder. Project metadata is read from a config.json file that should be submitted along with your book assets.
Bookmaker uses CSS to lay out both the print (PDF) and ebook files. The CSS should be supplied with the project and included in the project metadata (see Project Metadata above).
Bookmaker supports local CSS imports. Imported CSS files will be placed at the top of the compiled CSS file. Imports must be local files (i.e., Bookmaker cannot yet support web resources), and must be structured as follows:
@import "path/to/file.css";
or
@import 'path/to/file.css';
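The effect of this rule can be sketched in Ruby as follows. This illustrates the behavior described above and is not Bookmaker's actual implementation; the method name, the pattern, and resolving paths relative to a base directory are all assumptions.

```ruby
# Matches @import "path"; or @import 'path'; plus the rest of that line.
IMPORT_PATTERN = /@import\s+(["'])(.+?)\1\s*;[ \t]*\n?/

# Replace local @import statements with the contents of the imported files,
# placed at the top of the compiled CSS. Paths are resolved against base_dir.
def inline_local_imports(css, base_dir)
  imported = []
  remainder = css.gsub(IMPORT_PATTERN) do
    path = File.expand_path(Regexp.last_match(2), base_dir)
    imported << File.read(path)
    "" # drop the @import statement itself
  end
  (imported + [remainder]).join("\n")
end
```

Imported files keep the order in which their @import statements appear, and a missing file raises an error rather than being skipped silently.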
You may also include oneoff CSS files, for example if you're working with templated CSS but need to change just a few design elements for a specific book. To use oneoff CSS, include a file called either "oneoff_pdf.css" or "oneoff_epub.css" (as appropriate) in your assets folder alongside any other assets (e.g., book cover, interior images, project metadata, etc.). Bookmaker will apply this CSS to the appropriate format, and archive the oneoff CSS file in your final archive folder. Additionally, if you already created a oneoff CSS file for a previous conversion of the same book, Bookmaker will pick up that CSS file automatically from the final archive folder (no need to resubmit it).
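The lookup described above might be sketched in Ruby like this. The method name is hypothetical, and preferring a newly submitted file over an archived one when both exist is an assumption; the text above does not say which takes priority.

```ruby
# Return the path of the oneoff CSS file for a format (:pdf or :epub),
# or nil if there is none. The submitted assets folder is checked before
# the final archive folder (an assumed priority).
def find_oneoff_css(format, assets_dir, archive_dir)
  name = "oneoff_#{format}.css"
  [assets_dir, archive_dir]
    .map { |dir| File.join(dir, name) }
    .find { |path| File.file?(path) }
end
```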
Print layout is based on the new CSS 3 Paged Media spec. To learn how to write CSS for paged media, check out these resources:
For each run of the Bookmaker toolchain, every method in every Ruby script writes to a JSON log file. This provides a clean, formatted log for troubleshooting, and allows scripts later in the Bookmaker toolchain to read (and act upon) output from scripts that have already run.
The JSON log file is created in your $log_dir (as set in your config.rb) and named input_filename.json. It is overwritten on subsequent runs of the same input file.
All methods are set to log a value of ‘true’ on success, or log the exception string in case of error. Additional key values and useful output are logged as well. All methods and logged output from a given script are nested under that script’s name, and logged in order of runtime. The start and completion times for each script are also included.
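A minimal Ruby sketch of this logging pattern is shown below. The helper names and the "begun" and "completed" keys are hypothetical, as is deriving the log name by stripping the input file's extension; Bookmaker's own log keys and naming may differ.

```ruby
require "json"
require "time"

# Run a block on behalf of a script's method and record the outcome in `log`:
# true on success, the exception message on failure. Entries are nested under
# the script name in the order they run.
def log_method(log, script, method_name)
  entry = (log[script] ||= {})
  entry["begun"] ||= Time.now.iso8601   # hypothetical key name
  begin
    yield
    entry[method_name] = true
  rescue StandardError => e
    entry[method_name] = e.message
  end
  entry.delete("completed")             # keep the end time as the last key
  entry["completed"] = Time.now.iso8601 # hypothetical key name
  log
end

# Write the log as formatted JSON, overwriting any log from an earlier run.
# Stripping the input file's extension is an assumption of this sketch.
def write_log(log, log_dir, input_filename)
  path = File.join(log_dir, "#{File.basename(input_filename, '.*')}.json")
  File.write(path, JSON.pretty_generate(log))
  path
end
```

Because Ruby hashes preserve insertion order, the resulting JSON lists each script's methods in the order they ran, which is what lets a later script read the results of an earlier one.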
Because of its modular architecture, Bookmaker lets users insert extensions into the toolchain to customize their content conversions. For example, Macmillan has a number of custom content conversions that it inserts before and after various pieces of the Bookmaker toolchain. You can peruse these extensions here. Extensions are added as intermediary steps during deployment; see Macmillan's deployment scripts for examples.