Converts the JSON representation of a Google Docs document into an HTML abstract syntax tree (HAST) which can be serialized to HTML or converted to Markdown.
Note: This library does not intend to match the rendering by Google Docs.
Install using NPM or similar.
npm i @googleworkspace/google-docs-hast
import { toHast } from "@googleworkspace/google-docs-hast";
// Retrieve document from API, https://developers.google.com/docs/api
const doc = ...;
// Convert the document to an HTML AST.
const tree = toHast(doc);
To get the serialized representation of the HTML AST, use the rehype-stringify package.
import { unified } from "unified";
import rehypeStringify from "rehype-stringify";
// Convert the document to an HTML string.
const html = unified()
.use(rehypeStringify, { collapseEmptyAttributes: true })
.stringify(tree);
All <img>
elements should be post-processed as the src
attribute is only valid for a short time and is of the pattern https://lh6.googleusercontent.com/...
.
import { visit } from "unist-util-visit";
visit(tree, (node) => {
if (node.type === "element" && node.tagName === "img") {
const { src } = node.properties;
// download, store, and replace the src attribute
node.properties.src = newSrc;
}
});
Named styles are converted to an HTML element matching the following table.
Named Style | HTML |
---|---|
Title | <h1 class="title"></h1> |
Subtitle | <p class="subtitle"></p> |
Heading 1 | <h1 class="heading-1"></h1> |
Heading 2 | <h2 class="heading-2"></h2> |
Heading 3 | <h3 class="heading-3"></h3> |
Heading 4 | <h4 class="heading-4"></h4> |
Heading 5 | <h5 class="heading-5"></h5> |
Heading 6 | <h6 class="heading-6"></h6> |
Normal Text | <p class="normal-text"></p> |
Text styles are converted to an HTML element: <i>
, <strong>
, <s>
, <sub>
, <sup>
, and <u>
.
If there is no direct mapping, a <span>
with CSS is used to support features such as text color and font. This can be disabled with { styles: false }
.
Header IDs are in the form id="h.wn8l66qm9m7y"
when exported from the Google Docs API. By default, header tag IDs are updated to match their text content. See github-slugger for more information on how this is done.
For example, the following html:
<h1 class="heading-1" id="h.wn8l66qm9m7y">A heading</h1>
becomes:
<h1 class="heading-1" id="a-heading">A heading</h1>
This can be disabled with { prettyHeaderIds: false}
.
const tree = toHast(doc, { prettyHeaderIds: false });
Some features of Google Docs are not currently supported by this library. This list is not exhaustive.
Type | Supported | Bug |
---|---|---|
Styles applied to embedded objects including borders, rotations, transparency | ❌ | |
documentStyle including pageSize, margins, etc |
❌ | |
namedStyles ( only added as class name on the appropriate tag ) |
❌ | |
Page numbers | ❌ | |
Page breaks | ❌ | |
Equations | ❌ | |
Columns | ❌ | |
Suggestions | ❌ | |
Bookmarks | ❌ |
Note: This library does not intend to match the rendering by Google Docs.
This is not an official Google product.