laurent22 / joplin-turndown

MIT License

[BUG] Turndown throws an error attempting to parse HTML generated by Safari and Chrome when copying plain text files #11

Closed: nathanlesage closed this issue 4 years ago

nathanlesage commented 4 years ago

I'm referring to the following issue that has been opened on the Zettlr issue tracker: https://github.com/Zettlr/Zettlr/issues/693


Apparently, Turndown throws an error (visible in the console) when you copy plain text from within Safari or Google Chrome and feed the HTML that the browser puts on the clipboard to Turndown:

Uncaught TypeError: Cannot read property 'value' of undefined
    at isCodeBlockSpecialCase2 (/node_modules/joplin-turndown/lib/turndown.cjs.js:191)
    at Object.filter (/node_modules/joplin-turndown/lib/turndown.cjs.js:198)
    at filterValue (/node_modules/joplin-turndown/lib/turndown.cjs.js:734)
    at findRule (/node_modules/joplin-turndown/lib/turndown.cjs.js:722)
    at Rules.forNode (/node_modules/joplin-turndown/lib/turndown.cjs.js:707)
    at TurndownService.replacementForNode (/node_modules/joplin-turndown/lib/turndown.cjs.js:1242)
    at /node_modules/joplin-turndown/lib/turndown.cjs.js:1207
    at NodeList.reduce (<anonymous>)
    at TurndownService.process (/node_modules/joplin-turndown/lib/turndown.cjs.js:1192)
    at TurndownService.turndown (/node_modules/joplin-turndown/lib/turndown.cjs.js:1065)

This error is exactly the same for both Google Chrome (current version) and Safari (current version).

Reproducing

Open any code file (that is, not an HTML file) in Safari or Chrome, copy everything into the clipboard, and feed that generated HTML into Turndown. Tested with this one: https://necolas.github.io/normalize.css/latest/normalize.css

Firefox does not exhibit that behaviour. However, the pre tag is not parsed correctly there either: the text is treated as literal text (that is, the asterisks in the normalize.css example are escaped and the whole thing is not wrapped in backticks).

Firefox clipboard

<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body><pre>

/* Here the file contents, the spacing is added for legibility */

</pre></body></html>

Chrome clipboard

<meta charset='utf-8'><pre style="color: rgb(0, 0, 0); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial; overflow-wrap: break-word; white-space: pre-wrap;">

/* Here the file contents, the spacing is added for legibility */

</pre>

Safari clipboard

<pre style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; white-space: pre-wrap;">

/* Here the file contents, the spacing is added for legibility */

</pre>

I believe this happens because the earlier checks in isCodeBlockSpecialCase2 pass, but the find call returns undefined, so reading .value on the following line throws:

// To handle PRE tags that have a monospace font family. In that case
// we assume it is a code block.
function isCodeBlockSpecialCase2(node) {
  if (node.nodeName !== 'PRE') return false;

  const style = node.getAttribute('style');
  if (!style) return false;
  const o = css.parse('pre {' + style + '}');
  if (!o.stylesheet.rules.length) return;
  const fontFamily = o.stylesheet.rules[0].declarations.find(d => d.property.toLowerCase() === 'font-family'); // This "find" seems to return undefined, because there's a lot of "font" properties, but no "font-family" in said example code
  const isMonospace = fontFamily.value.split(',').map(e => e.trim().toLowerCase()).indexOf('monospace') >= 0;
  return isMonospace;
}
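A minimal sketch of the kind of guard that would avoid the exception, offered as an illustration and not as the library's actual fix. To keep it self-contained, it replaces css.parse with a hypothetical, simplified helper (parseInlineStyle) that naively splits the style attribute on semicolons, and it uses a hypothetical fakeNode stand-in for a DOM node. The names isMonospacePre, parseInlineStyle and fakeNode are assumptions and do not exist in joplin-turndown.

```javascript
// Hypothetical stand-in for css.parse: splits an inline style string into
// { property, value } declarations. It splits naively on ';', so values that
// themselves contain semicolons (e.g. some data: URLs) are not handled.
function parseInlineStyle(style) {
  return style
    .split(';')
    .map(part => part.trim())
    .filter(part => part.includes(':'))
    .map(part => {
      const idx = part.indexOf(':');
      return {
        property: part.slice(0, idx).trim().toLowerCase(),
        value: part.slice(idx + 1).trim(),
      };
    });
}

// Same idea as isCodeBlockSpecialCase2, but it returns false when no
// font-family declaration is present instead of dereferencing undefined.
function isMonospacePre(node) {
  if (node.nodeName !== 'PRE') return false;

  const style = node.getAttribute('style');
  if (!style) return false;

  const declarations = parseInlineStyle(style);
  const fontFamily = declarations.find(d => d.property === 'font-family');
  if (!fontFamily) return false; // the guard that the failing code lacks

  return fontFamily.value
    .split(',')
    .map(e => e.trim().toLowerCase())
    .includes('monospace');
}

// Hypothetical minimal node object, just enough for the function above.
function fakeNode(nodeName, style) {
  return {
    nodeName,
    getAttribute: name => (name === 'style' ? style : null),
  };
}
```

With a style string like the Chrome and Safari ones above (plenty of font-* properties, but no font-family), isMonospacePre returns false instead of throwing, so the pre element would simply not be treated as a code block by this particular check.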

Thank you very much for this awesome library, it made working with Zettlr a thousand times better :) If you need any additional information from me, please don't hesitate to ask, and I'll provide it as quickly as I can!

All the best!

laurent22 commented 4 years ago

@nathanlesage, this issue should be fixed in commit 6a8299c2d787d9e5fe22882b845143e59685e1aa. Indeed, the problem was that the code incorrectly assumed that a font family would always be defined. Upgrading to joplin-turndown@4.0.29 should fix the issue.

nathanlesage commented 4 years ago

@laurent22 Thank you very much for your work on this! Will do! :)