mixmark-io / turndown

🛏 An HTML to Markdown converter written in JavaScript
https://mixmark-io.github.io/turndown
MIT License

Allow overriding default convertors #164

Closed johnboxall closed 8 years ago

johnboxall commented 8 years ago

I'm converting Google Documents via the Drive API's export to HTML function to Markdown.

Saying the HTML produced by this process leaves something to be desired is generous.

This export process redirects all links through Google's URL redirector, rewriting each link target as https://www.google.com/url?q=$URL.

I'd love to use to-markdown to rewrite these links so they point where they were originally intended to go.
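
For illustration, unwrapping such a redirect could look roughly like the sketch below. The helper name `unwrapGoogleRedirect` is hypothetical, and the sketch assumes the original target always sits in the `q` query parameter of a `www.google.com/url` link, as described above. Anything that does not match that shape is returned unchanged.

```javascript
// Hypothetical helper (not part of to-markdown): recover the original
// target from a Google redirect link.
// Assumption: redirects look like https://www.google.com/url?q=<target>&...
function unwrapGoogleRedirect (href) {
  let url
  try {
    url = new URL(href)
  } catch (err) {
    // Relative or malformed hrefs cannot be parsed; leave them untouched.
    return href
  }
  const isRedirect = url.hostname === 'www.google.com' && url.pathname === '/url'
  // searchParams.get() percent-decodes the value, or returns null if absent.
  const target = isRedirect ? url.searchParams.get('q') : null
  return target || href
}
```

A replacement function for anchor nodes could call a helper like this on the node's `href` before building the Markdown link.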

Right now, to do this, I'm doing something awesome like:

```javascript
const toMarkdown = require('to-markdown')
let toMarkdownMarkdownConvertors = require('to-markdown/lib/md-converters')

const GOOGLE_LINK_CONVERTOR = {
    filter: "A",
    replacement: CRAZY_FUNCTION_THAT_SHOOTS_UGLY_LINKS
}

// Replace default behaviour: https://github.com/domchristie/to-markdown/blob/55d8f2a5c3b98d0e16f31b7ae4b02f344fbf7e2e/lib/md-converters.js#L64-L72
toMarkdownMarkdownConvertors[7] = GOOGLE_LINK_CONVERTOR
```

With the current behaviour there is really no way to override the defaults.

I'm not sure what the best way to add this behaviour would be ... maybe just some way to punch out the defaults!

PS. Thanks for the awesome lib! ♥️

domchristie commented 8 years ago

> Thanks for the awesome lib!

Thanks!

I think the converters option should do what you need. Something like:

```javascript
const toMarkdown = require('to-markdown')

toMarkdown(STRING_OF_HTML, {
  converters: [{
    filter: 'a',
    replacement: CRAZY_FUNCTION_THAT_SHOOTS_UGLY_LINKS
  }]
})
```

The converters API isn't great, and it is definitely something I want to improve upon, but it does what you need. Any converters you specify will get prepended to the default converters. When the HTML is processed, the library will stop at the first converter with a matching filter, so in this case, the default converter for an anchor node will never get called.
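
The prepend-then-first-match behaviour described above can be modelled with a small sketch. This is a simplified illustration, not to-markdown's actual source: `defaultConverters`, `findConverter`, the plain `{ nodeName }` objects standing in for DOM nodes, and the string-only filters are all assumptions made for the example.

```javascript
// Simplified model of the lookup, NOT the library's real implementation.
// Assumption: filters are lowercase tag-name strings only.
const defaultConverters = [
  { filter: 'a', replacement: () => 'default link' },
  { filter: 'p', replacement: () => 'default paragraph' }
]

function findConverter (node, userConverters = []) {
  // User-supplied converters are prepended, so they are checked first.
  const all = userConverters.concat(defaultConverters)
  // find() stops at the first converter whose filter matches the node.
  return all.find(c => c.filter === node.nodeName.toLowerCase())
}

// Stand-ins for DOM nodes; only nodeName is needed for this model.
const link = { nodeName: 'A' }
const para = { nodeName: 'P' }
const custom = [{ filter: 'a', replacement: () => 'custom link' }]

const chosenForLink = findConverter(link, custom) // the custom 'a' converter
const chosenForPara = findConverter(para, custom) // falls through to the default 'p'
```

Because the custom converter sits ahead of the default one in the combined list, the default anchor converter is never reached for links, while nodes without a custom converter still fall through to the defaults.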

Hope that helps :)

johnboxall commented 8 years ago

> Any converters you specify will get prepended to the default converters.

Oh! I totally misread the code – I thought the converters supplied in the option were run after the default convertors, but they are actually run before.

That makes total sense and means I can remove this super-hack.

Thank you!

domchristie commented 8 years ago

> Oh! I totally misread the code – I thought the converters supplied in the option were run after the default convertors, but they are actually run before.

Yeh, sorry, it's not the clearest!

I think it'd be nice if the converters were public, e.g. toMarkdown.converters, and/or if there were some way to reference them directly, e.g. toMarkdown.converterFor('a').