simonhaenisch / md-to-pdf

Hackable CLI tool for converting Markdown files to PDF using Node.js and headless Chrome.
https://www.npmjs.com/md-to-pdf
MIT License
1.16k stars, 110 forks

feature: HTML Transformation #122

Open maciek-ibm opened 2 years ago

maciek-ibm commented 2 years ago

Problem:

Sometimes the HTML needs to be transformed just before md-to-pdf generates the output, e.g. to convert images to Base64, sanitize links, add PagedJS, or inject custom JavaScript.

Solution:

We achieved this by modifying part of the md-to-pdf code like this:

let html = getHtml(md, config);
if (config.transform_html) {
    html = await config.transform_html(html);
}
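To illustrate the idea in isolation, here is a minimal sketch of such a hook. It is not md-to-pdf's actual code: `renderHtml` is a hypothetical stand-in for `getHtml`, `applyTransformHtml` is a hypothetical helper name, and `custom.js` is a placeholder file name. Awaiting the result means `transform_html` can be either synchronous or asynchronous.

```javascript
// Hypothetical stand-in for md-to-pdf's getHtml(md, config); it only wraps the input.
function renderHtml(md) {
  return `<html><body><p>${md}</p></body></html>`;
}

// Apply the optional hook. `await` accepts plain values as well as promises,
// so transform_html may be sync or async. Without a hook, the HTML passes through.
async function applyTransformHtml(html, config) {
  if (typeof config.transform_html === 'function') {
    return await config.transform_html(html);
  }
  return html;
}

async function demo() {
  const config = {
    // Example transform: add a script tag just before </body>.
    // 'custom.js' is a placeholder, not a real file.
    transform_html: async (html) =>
      html.replace('</body>', () => '<script src="custom.js"></script></body>'),
  };
  return applyTransformHtml(renderHtml('Hello'), config);
}
```

Checking for a function (rather than any truthy value) is a small design choice in this sketch, so that a misconfigured option fails quietly by leaving the HTML untouched.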

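Here transform_html is a custom function, e.g.:

const { JSDOM } = require('jsdom');

// EmbedHTMLImages and replaceTextContentWithHref are our own helpers;
// `config` is the md-to-pdf config object, assumed to be in scope.
async function transform_html(html) {
    const { window } = new JSDOM('<!DOCTYPE html>');
    if (!config.req) throw new TypeError('config.req not defined!');
    // Sanitize links first, then embed images.
    const sanitizedLinksHtml = replaceTextContentWithHref(
        html,
        window.DOMParser,
        true
    );
    // NOTE: embeds images as base64
    const embedImages = new EmbedHTMLImages({ req: config.req });
    return await embedImages.run(sanitizedLinksHtml);
}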
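Because the hook receives and returns a plain HTML string, independent steps can be chained into a single `transform_html`. The following is a rough sketch under stated assumptions, not the helpers used above: `embedImages`, `injectScript`, and `composeTransforms` are hypothetical names, `loadImage` is a caller-supplied function, and the regex-based image handling is a simplification that only covers double-quoted `src` attributes (a DOM parser such as jsdom would be more robust).

```javascript
// Replace each <img src="..."> with a Base64 data URI.
// `loadImage(src)` is supplied by the caller (hypothetical) and returns { mime, bytes }.
function embedImages(html, loadImage) {
  return html.replace(/(<img\b[^>]*?\bsrc=")([^"]+)(")/g, (match, pre, src, post) => {
    if (src.startsWith('data:')) return match; // already embedded, leave as-is
    const { mime, bytes } = loadImage(src);
    const base64 = Buffer.from(bytes).toString('base64');
    return `${pre}data:${mime};base64,${base64}${post}`;
  });
}

// Insert a <script> tag just before the first </body>,
// e.g. to load PagedJS or custom code.
function injectScript(html, scriptUrl) {
  return html.replace('</body>', () => `<script src="${scriptUrl}"></script></body>`);
}

// Chain sync or async steps into one function usable as transform_html.
function composeTransforms(...steps) {
  return async (html) => {
    let result = html;
    for (const step of steps) {
      result = await step(result);
    }
    return result;
  };
}
```

With these pieces, a config could set `transform_html: composeTransforms((h) => embedImages(h, loadImage), (h) => injectScript(h, scriptUrl))`, where `loadImage` and `scriptUrl` are whatever the project provides.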