mixmark-io / turndown

🛏 An HTML to Markdown converter written in JavaScript
https://mixmark-io.github.io/turndown
MIT License

Support Deno? #390

Open NfNitLoop opened 3 years ago

NfNitLoop commented 3 years ago

Turndown looks like a great library. I'd love to use it from Deno, but I'm running into some issues at the moment.

I'm using Skypack to grab a module-based build of Turndown:

import tds from "https://cdn.skypack.dev/turndown@v7.1.1/lib/turndown.umd.js"

but due to https://github.com/fgnass/domino/issues/153, the domino dependency can't be built by Skypack.

I'm not sure what the fix is for that. But if there's a way to make turndown work with Deno, that'd be awesome. 😄

NfNitLoop commented 3 years ago

Skimming info about domino, I see:

In contrast to the original dom.js project, domino was not designed to run untrusted code.

Hmm, I definitely want to pass untrusted HTML to turndown (and in turn domino). Is that a security issue in turndown?

martincizek commented 3 years ago

Don't know anything about Deno, but this would be solved by "Bring Your Own Parser" for Node.js environments (as suggested in #97 and #265).

For the current version, you might want to suppress the domino dependency and parse the HTML into a DOM on your own (if your environment does not provide a native DOM parser like browsers do). So you might want to use the browser build:

import tds from "https://cdn.skypack.dev/turndown@v7.1.1/lib/turndown-browser.umd.js"

Then convert your HTML string to a DOM with any parser that works for you and pass that DOM to TurndownService#turndown() (instead of the string).
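
A minimal sketch of that "parse it yourself, then hand over the DOM" flow, written so it does not depend on any particular parser. The interfaces HtmlParser and MarkdownConverter and the helper htmlToMarkdown are hypothetical names introduced here for illustration; they only describe the two calls the flow relies on (a DOMParser-style parseFromString() and TurndownService#turndown() receiving a DOM instead of a string).

```typescript
// Hypothetical shape of "any parser that works for you":
// only the DOMParser-style method this sketch needs.
interface HtmlParser<Doc> {
  parseFromString(html: string, mimeType: "text/html"): Doc | null;
}

// Hypothetical shape of the converter: the single method used here,
// matching how TurndownService#turndown() is called with a DOM.
interface MarkdownConverter<Doc> {
  turndown(input: Doc): string;
}

// Parse the HTML with the caller-supplied parser, then pass the
// resulting DOM (not the string) to the converter.
function htmlToMarkdown<Doc>(
  html: string,
  parser: HtmlParser<Doc>,
  converter: MarkdownConverter<Doc>,
): string {
  const doc = parser.parseFromString(html, "text/html");
  if (doc === null) {
    throw new Error("failed to parse HTML");
  }
  return converter.turndown(doc);
}
```

With a real setup you would pass in a DOMParser instance from whichever DOM library works in your runtime and a TurndownService instance from the browser build; whether a given parser implements everything Turndown needs has to be checked case by case.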

In contrast to the original dom.js project, domino was not designed to run untrusted code.

Hmm, I definitely want to pass untrusted HTML to turndown (and in turn domino). Is that a security issue in turndown?

This is unrelated to the issue.

martincizek commented 3 years ago

@NfNitLoop Please let me know if that works. :)

NfNitLoop commented 3 years ago

Whew. That works with some very bleeding edge code. 😆


import tds from "https://cdn.skypack.dev/turndown@7.1.1"

// JSDOM imports .json files, which doesn't seem to work in deno/skypack:
// import * as jsdom from "https://cdn.skypack.dev/jsdom@16.6.0"

// This one doesn't have a cloneNode() implementation yet:
// import { DOMParser, Element } from "https://deno.land/x/deno_dom/deno-dom-wasm.ts";

// So import directly from GitHub: 😬
import { DOMParser } from "https://github.com/b-fuze/deno-dom/raw/188d7240e5371caf1b4add8bb7183933d142337e/deno-dom-wasm.ts"

const parser = new DOMParser()
const service = new tds({})
function example(html: string) {
    const doc = parser.parseFromString(html, "text/html")
    // Throw a real Error (not a bare string) so callers get a stack trace
    if (!doc) { throw new Error("failed to parse doc") }
    const result = service.turndown(doc)

    console.log("input:", html)
    console.log("output:", result)
}

example(`
    <a href="https://www.google.com">Link One</a> <a href="https://www.google.com">Link Two</a>
`)

rbeesley commented 3 years ago

This is what's working for me:

import TurndownService from 'https://cdn.skypack.dev/turndown@7.1.1';

import * as turndownPluginGfm from 'https://cdn.skypack.dev/@guyplusplus/turndown-plugin-gfm@1.0.7';
// deno-lint-ignore no-explicit-any
const { gfm } = turndownPluginGfm as any;

import jsdom from "https://jspm.dev/jsdom@16.6.0";
// deno-lint-ignore no-explicit-any
const { JSDOM } = jsdom as any;

// Create a global window.document so that there is a virtual DOM for Turndown
declare global {
  // deno-lint-ignore no-explicit-any
  var document: any;
  interface Window {
    // deno-lint-ignore no-explicit-any
    document: any;
  }
}
window.document = new JSDOM().window.document;

const turndownService = new TurndownService({
  hr: '---',
  codeBlockStyle: 'fenced'
});
turndownService.use(gfm);

...

const result = turndownService.turndown(html)

When I try import jsdom from 'https://cdn.skypack.dev/jsdom@16.6.0'; I get the error "Expected ';', '}' or <eof> at https://cdn.skypack.dev/-/domexception@v2.0.1-kxjDS6kD0nfMOQ7rhR8X/dist=es2020,mode=imports/unoptimized/lib/legacy-error-codes.json:2:18", which is actually how I stumbled upon this issue. I was trying to consolidate so that I wasn't using two different CDNs, but I'm seeing the same JSDOM problem with Skypack that you are.

I'm not really getting any type definitions this way (everything I import seems to be typed as "any"), but aside from that this works without raw GitHub imports.

Edit: Okay, I think I've cleaned everything up to convey this correctly. By declaring the global I've created a place for Turndown to work with. This might be easier than the other recommendation in this discussion.

@martincizek, it might also be a good idea to bring some of this into Turndown. If window.document is undefined, provide the boilerplate for someone to inject a WHATWG DOM and hook up those dependencies on behalf of the user... something to make it more closely match the Node.js experience and documentation.
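
To make the suggestion concrete, here is a rough sketch of what such boilerplate could look like. It is not part of Turndown: ensureGlobalDocument and DocumentFactory are hypothetical names, and the sketch assumes (as in the workaround above) that a global document is what the library ends up looking for.

```typescript
// Hypothetical: a caller-supplied function that builds a WHATWG-style
// document, e.g. () => new JSDOM().window.document
type DocumentFactory = () => unknown;

// Install a global `document` only when the runtime does not already
// provide one. Returns true if the factory's document was installed.
function ensureGlobalDocument(createDocument: DocumentFactory): boolean {
  const globalScope = globalThis as unknown as { document?: unknown };
  if (globalScope.document !== undefined) {
    return false; // a DOM already exists (e.g. in a browser); leave it alone
  }
  globalScope.document = createDocument();
  return true;
}
```

Calling ensureGlobalDocument(() => new JSDOM().window.document) before constructing the service would then replace the manual window.document assignment, while staying a no-op in browsers.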

martincizek commented 3 years ago

I guess the default setup still just works for most Node.js users. There was a good reason for abandoning jsdom in favour of domino.

But feel free to contribute. :)