lonekorean / wordpress-export-to-markdown

Converts a WordPress export XML file into Markdown files.
MIT License

Code blocks not recognised when using Wordpress Blocks #86

Closed: mxro closed this issue 6 months ago

mxro commented 1 year ago

For some posts, code blocks do not seem to be recognised. For instance, take the following input from the XML export:

```html
<!-- wp:paragraph -->
<p>CSS Modules are simply plain CSS files we can develop alongside our React components:</p>
<!-- /wp:paragraph -->

<!-- wp:syntaxhighlighter/code {"language":"css"} -->
<pre class="wp-block-syntaxhighlighter-code">.myclass {
  padding: 10px;
}</pre>
<!-- /wp:syntaxhighlighter/code -->
```

Results in the following Markdown output:

```markdown
CSS Modules are simply plain CSS files we can develop alongside our React components:

.myclass {
  padding: 10px;
}
```

The `<pre>` element is not converted into a Markdown code block.
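For illustration, here is a minimal sketch of what converting such a block directly into a fenced Markdown code block could look like. `syntaxHighlighterToFence` and `decodeEntities` are hypothetical helpers, not part of wordpress-export-to-markdown, and the fix that eventually shipped may work quite differently. The sketch assumes the block markup has exactly the shape shown in the example (opening comment, `<pre class="wp-block-syntaxhighlighter-code">`, closing comment) and that WordPress HTML-escapes a few common characters inside `<pre>`; it does not handle code that itself contains triple backticks.

```typescript
// Hypothetical helper (not part of wordpress-export-to-markdown): turns a
// Gutenberg SyntaxHighlighter block into a fenced Markdown code block.
// Assumes the markup shape shown in the issue; attributes are optional JSON.
const BLOCK_PATTERN =
  /<!-- wp:syntaxhighlighter\/code(?: (\{.*?\}))? -->\s*<pre class="wp-block-syntaxhighlighter-code">([\s\S]*?)<\/pre>\s*<!-- \/wp:syntaxhighlighter\/code -->/g;

// Undo a few common HTML entities inside <pre>.
// &amp; is decoded last so "&amp;lt;" becomes "&lt;" rather than "<".
function decodeEntities(text: string): string {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#039;/g, "'")
    .replace(/&amp;/g, '&');
}

function syntaxHighlighterToFence(html: string): string {
  return html.replace(
    BLOCK_PATTERN,
    (_match: string, attrs: string | undefined, body: string) => {
      let language = '';
      if (attrs) {
        try {
          // Block attributes are serialized as JSON, e.g. {"language":"css"}.
          const parsed = JSON.parse(attrs) as { language?: unknown };
          if (typeof parsed.language === 'string') language = parsed.language;
        } catch {
          // Malformed attributes: fall back to a fence without a language.
        }
      }
      return '```' + language + '\n' + decodeEntities(body) + '\n```';
    }
  );
}
```

Run on the CSS example from this issue, it would yield a ```` ```css ```` fence around the `.myclass` rule instead of plain paragraph text.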

Note that conversion into code blocks does seem to work when WordPress posts are provided as plain HTML (without the additional comments injected by the new blocks feature). For example, the following

```html
<pre><code class="language-bash">yarn add next-optimized-images file-loader img-loader url-loader ignore-loader extracted-loader next-compose-plugins
</code></pre>
```

is correctly translated into:

````markdown
```bash
yarn add next-optimized-images file-loader img-loader url-loader ignore-loader extracted-loader next-compose-plugins
```
````
mxro commented 1 year ago

Workaround

Normalize all of the wonky ways in which WordPress allows source code to be defined in posts, directly in the XML file, before processing it with wordpress-export-to-markdown:

wordpressPreprocess.ts

Sorry for the ugly code: I try to avoid being too clever with regular expressions, so there is a fair amount of repetition.

bahree commented 1 year ago

Sorry if this is a stupid question - where do I call this file from?

mxro commented 1 year ago

Not a stupid question at all: that file was not written to be run on its own, just as a code reference, possibly to be integrated into this library.

However, it should be relatively easy to run. Here is a modified file for this purpose:

```typescript
import { readFileSync, writeFileSync } from 'fs';

// Rewrites the various ways WordPress marks up source code into plain
// <pre><code> HTML, which wordpress-export-to-markdown converts correctly.
export function wordpressPreprocess(input: string): string {
  // Gutenberg SyntaxHighlighter block: opening with a language attribute...
  let pattern = /<!-- wp:syntaxhighlighter\/code {"language":"([^"]*)"} -->\s*<pre class="wp-block-syntaxhighlighter-code">/g;
  let res = input.replace(pattern, '<pre><code class="language-$1">');

  // ...opening without attributes...
  pattern = /<!-- wp:syntaxhighlighter\/code -->\s*<pre class="wp-block-syntaxhighlighter-code">/g;
  res = res.replace(pattern, '<pre><code>');

  // ...and the closing part of the block.
  pattern = /<\/pre>\s*<!-- \/wp:syntaxhighlighter\/code -->/g;
  res = res.replace(pattern, '</code></pre>');

  // [sourcecode] shortcodes: double-quoted, single-quoted and unquoted language.
  pattern = /\[sourcecode language="([^"]*)"\]/g;
  res = res.replace(pattern, '<pre><code class="language-$1">');

  pattern = /\[sourcecode language='([^']*)'\]/g;
  res = res.replace(pattern, '<pre><code class="language-$1">');

  pattern = /\[sourcecode language=([^\]]*)\]/g;
  res = res.replace(pattern, '<pre><code class="language-$1">');

  // [sourcecode] without a language, and its closing tag.
  pattern = /\[sourcecode\]/g;
  res = res.replace(pattern, '<pre><code>');

  pattern = /\[\/sourcecode\]/g;
  res = res.replace(pattern, '</code></pre>');

  // [code] shortcodes with a lang attribute.
  pattern = /\[code lang="([^"]*)"\]/g;
  res = res.replace(pattern, '<pre><code class="language-$1">');

  pattern = /\[code lang='([^']*)'\]/g;
  res = res.replace(pattern, '<pre><code class="language-$1">');

  pattern = /\[code lang=([^\]]*)\]/g;
  res = res.replace(pattern, '<pre><code class="language-$1">');

  // [code] shortcodes with a language attribute.
  pattern = /\[code language="([^"]*)"\]/g;
  res = res.replace(pattern, '<pre><code class="language-$1">');

  pattern = /\[code language='([^']*)'\]/g;
  res = res.replace(pattern, '<pre><code class="language-$1">');

  pattern = /\[code language=([^\]]*)\]/g;
  res = res.replace(pattern, '<pre><code class="language-$1">');

  // [code] without attributes, and its closing tag.
  pattern = /\[code\]/g;
  res = res.replace(pattern, '<pre><code>');

  pattern = /\[\/code\]/g;
  res = res.replace(pattern, '</code></pre>');

  // Map the "jscript" language name to "typescript" (adjust to taste).
  pattern = /<code class="language-jscript"/g;
  res = res.replace(pattern, '<code class="language-typescript"');

  return res;
}

// Reads the export synchronously, preprocesses it and writes the result.
export function wordpressPreprocessFile(
  xmlFilePath: string,
  destFilePath: string
): void {
  const input = readFileSync(xmlFilePath, 'utf8');
  const res = wordpressPreprocess(input);
  writeFileSync(destFilePath, res, 'utf8');
}

// Adjust these paths: the WordPress export and the file to write.
wordpressPreprocessFile('input.xml', 'output.xml');
```

Adjust the file paths in the last line, save this file as wordpressPreprocess.ts, and then run it with:

```shell
npx ts-node wordpressPreprocess.ts
```
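Since the repetition comes from deliberately avoiding clever regular expressions, here is a hypothetical, more compact sketch of the shortcode rules only, for comparison. `SHORTCODE_TAGS` and `normalizeShortcodes` are illustrative names; the sketch is intended to cover the same `[sourcecode]` and `[code]` variants as the script, using one regex per tag/attribute pair with alternation for the three quoting styles. The SyntaxHighlighter block rules and the `jscript` mapping are left out.

```typescript
// Hypothetical table-driven variant of the shortcode rules.
// Each entry is a shortcode tag and the attribute that carries the language.
const SHORTCODE_TAGS: Array<{ tag: string; attr: string }> = [
  { tag: 'sourcecode', attr: 'language' },
  { tag: 'code', attr: 'lang' },
  { tag: 'code', attr: 'language' },
];

function normalizeShortcodes(input: string): string {
  let res = input;
  for (const { tag, attr } of SHORTCODE_TAGS) {
    // Matches [tag attr="x"], [tag attr='x'] or [tag attr=x].
    const opening = new RegExp(
      `\\[${tag} ${attr}=(?:"([^"]*)"|'([^']*)'|([^\\]]*))\\]`,
      'g'
    );
    // Exactly one of the three capture groups participates in each match;
    // the others are undefined.
    res = res.replace(
      opening,
      (_m: string, dq?: string, sq?: string, bare?: string) =>
        `<pre><code class="language-${dq ?? sq ?? bare}">`
    );
  }
  // Opening tags without attributes, then all closing tags.
  res = res.replace(/\[(?:sourcecode|code)\]/g, '<pre><code>');
  res = res.replace(/\[\/(?:sourcecode|code)\]/g, '</code></pre>');
  return res;
}
```

The trade-off is the one hinted at in the thread: fewer lines, but regexes assembled from template strings are harder to read at a glance.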
stillnet commented 1 year ago

I am running into the same issue: code blocks are not coming through in the Markdown. I tried the above script (I had to make some changes to get it to run and am not sure I made them correctly), but my code content is still not coming through.

I also tried editing the WordPress XML export manually, e.g. changing to [code], but that did not help. If anyone has suggestions on what else to try, that would be great. Thanks.
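One way to narrow this down is to check which code-related markup is still left in the XML after preprocessing. The sketch below is a hypothetical diagnostic (`LEFTOVER_MARKERS` and `findLeftoverCodeMarkers` are illustrative names, not part of any library). It assumes, as the examples in this thread suggest, that unconverted SyntaxHighlighter blocks, leftover shortcodes, and `<pre>` elements without an inner `<code>` are the markup that ends up as plain text.

```typescript
// Hypothetical diagnostic: counts code-related markup still present in a
// (preprocessed) WordPress export. Names and patterns are illustrative only.
const LEFTOVER_MARKERS: Record<string, RegExp> = {
  // Gutenberg SyntaxHighlighter block comments that were not rewritten.
  syntaxHighlighterBlock: /<!-- wp:syntaxhighlighter\/code/g,
  // [sourcecode ...] and [/sourcecode] shortcodes.
  sourcecodeShortcode: /\[\/?sourcecode\b[^\]]*\]/g,
  // [code ...] and [/code] shortcodes.
  codeShortcode: /\[\/?code\b[^\]]*\]/g,
  // <pre> tags not immediately followed (after whitespace) by a <code> tag.
  preWithoutCode: /<pre\b(?![^>]*>\s*<code)[^>]*>/g,
};

function findLeftoverCodeMarkers(xml: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const name of Object.keys(LEFTOVER_MARKERS)) {
    // match() with a global regex returns all matches, or null if none.
    counts[name] = (xml.match(LEFTOVER_MARKERS[name]) ?? []).length;
  }
  return counts;
}
```

For example, running it on the preprocessed file and seeing a non-zero `preWithoutCode` count would point at a markup variant the regexes did not cover.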

lonekorean commented 6 months ago

@mxro thank you so much for the helpful description and examples!

Fixed in v2.2.6.