lonekorean / wordpress-export-to-markdown

Converts a WordPress export XML file into Markdown files.
MIT License
1.07k stars, 216 forks

Add support for Enlighter code blocks #77

Closed: drikusroor closed this 6 months ago

drikusroor commented 2 years ago

To add support for Enlighter code blocks, I have added a Turndown rule.

WordPress typically exports an Enlighter code block in the following format:

```html
<!-- wp:enlighter/codeblock {"language":"typescript"} -->
<pre class="EnlighterJSRAW" data-enlighter-language="typescript" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">// foo equals "bar"
const foo = "bar";</pre>
<!-- /wp:enlighter/codeblock -->
```

The selector is therefore a combination of the node name `PRE` and the class `EnlighterJSRAW`. The language comes from the `data-enlighter-language` attribute. Finally, instead of reading `node.innerHTML`, I use the `content` parameter: with `node.innerHTML`, characters such as `<` come back escaped as `&lt;`, and so on.
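To see why `innerHTML` is the wrong source for the code text: when text inside an element like `<pre>` is serialized to HTML, a few characters are replaced with entity references. The sketch below mimics that text-escaping step as described in the HTML fragment serialization algorithm (`&`, U+00A0, `<`, `>`). `escapeLikeInnerHTML` is a hypothetical helper for illustration only; it is not part of Turndown or this project.

```js
// Hypothetical helper: mimics how innerHTML serializes an element's text.
// Per the HTML fragment serialization algorithm, text outside attributes
// has "&", U+00A0, "<" and ">" replaced with entity references.
function escapeLikeInnerHTML(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so the entities added below aren't re-escaped
    .replace(/\u00a0/g, "&nbsp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const source = "if (a < b && b > 0) { /* ... */ }";

// Roughly what node.innerHTML would return for a <pre> holding this text
console.log(escapeLikeInnerHTML(source));
// → if (a &lt; b &amp;&amp; b &gt; 0) { /* ... */ }
```

`textContent`, by contrast, returns the characters as written, so there is nothing to decode afterwards.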

Output then becomes:

````markdown
```typescript
// foo equals "bar"
const foo = "bar";
```
````
drikusroor commented 2 years ago

@lonekorean Would you have time to look at this perhaps? 😅

lancegoyke commented 1 year ago

This was super helpful to discover, @drikusroor. Thanks for sharing it.

I ran into a problem where my underscores (`_`) were being escaped as `\_`. By default, Turndown escapes characters that could be mistaken for Markdown syntax unless they are inside code, and the text of this `<pre>` wasn't being treated as code.
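A rough illustration of that escaping: Turndown backslash-escapes Markdown-significant characters in text it doesn't treat as code. The function below is a deliberately simplified, hypothetical subset (only `\`, `*` and `_`), not Turndown's full escape list, but it shows how a `snake_case` identifier ends up as `snake\_case`.

```js
// Hypothetical, simplified stand-in for Turndown's Markdown escaping.
// The real escape list covers more syntax (headings, lists, brackets,
// backticks, ...); this subset is only for illustration.
const ESCAPES = [
  [/\\/g, "\\\\"], // backslashes first, so the escapes added below aren't doubled
  [/\*/g, "\\*"],
  [/_/g, "\\_"],
];

function escapeMarkdownSubset(text) {
  return ESCAPES.reduce((out, [pattern, repl]) => out.replace(pattern, repl), text);
}

const line = "const my_value = a * b;";
console.log(escapeMarkdownSubset(line)); // → const my\_value = a \* b;
```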

I worked around this by adjusting your code to read the text directly from `node.textContent` instead of using the `content` parameter that Turndown passes to `replacement` (which has already been escaped), and then returning `code` instead of `content`.

Now my underscores are no longer escaped!

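```js
const TurndownService = require("turndown");

// Setup shown so the snippet runs on its own; codeBlockStyle must be
// "fenced" for the filter below to match
const turndownService = new TurndownService({ codeBlockStyle: "fenced" });

// preserve enlighter code blocks
turndownService.addRule("enlighter", {
  // match non-empty <pre class="EnlighterJSRAW"> elements
  filter: (node, options) => {
    return (
      options.codeBlockStyle === "fenced" &&
      node.nodeName === "PRE" &&
      node.firstChild &&
      node.classList.contains("EnlighterJSRAW")
    );
  },
  replacement: (content, node) => {
    // language from the data attribute; empty string if the attribute is missing
    const language = node.getAttribute("data-enlighter-language") ?? "";
    // raw text instead of `content`, which Turndown has already escaped
    const code = node.textContent;
    return "\n" + "```" + language + "\n" + code + "\n" + "```" + "\n";
  },
});
```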
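One edge case the rule above doesn't cover: if a snippet itself contains a line of three backticks (for example, a Markdown example), the hard-coded three-backtick fence closes early. CommonMark only closes a fence with a run of at least the same length, so a sketch like the hypothetical `fencedBlock` helper below picks a fence longer than any backtick run in the code. Calling it from `replacement` with `language` and `code` is an assumption on my part, not something from this thread.

```js
// Hypothetical helper: wrap code in a fenced block whose fence is longer
// than any run of backticks inside the code, so the code can't close it early.
function fencedBlock(language, code) {
  const runs = code.match(/`+/g) || [];
  const longest = runs.reduce((max, run) => Math.max(max, run.length), 0);
  const fence = "`".repeat(Math.max(3, longest + 1));
  return "\n" + fence + language + "\n" + code + "\n" + fence + "\n";
}

// A snippet with no backticks keeps the usual three-backtick fence
console.log(fencedBlock("typescript", 'const foo = "bar";'));

// A snippet containing a three-backtick line gets a four-backtick fence
console.log(fencedBlock("markdown", "```js\nlet x = 1;\n```"));
```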
lonekorean commented 6 months ago

@drikusroor thank you for the helpful example and PR! Sorry this took... a while.

Fixed in v2.2.6.

I ended up with a broader solution that also covers other cases with `<pre>`. I didn't merge this PR directly, but I did cherry-pick your commits to make sure you got contributor credit.