hexojs / hexo-migrator-wordpress

WordPress migrator for Hexo.
http://hexo.io/docs/migration.html
MIT License

fix: retain paragraph #79

Closed curbengh closed 4 years ago

curbengh commented 4 years ago

Fixes https://github.com/hexojs/hexo-migrator-wordpress/issues/35

In the WP classic editor, exported posts don't contain <p>, which turndown requires in order to retain newlines; without <p> (or any other element), turndown removes the newlines: https://github.com/domchristie/turndown/issues/264.

I'm assuming most users use the modern editor, so this workaround is not applied by default. To enable it:

$ hexo migrate wordpress exported.xml --paragraph-fix
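
For illustration, the wrapping step behind this flag can be sketched without the turndown dependency. This is a minimal sketch modelled on the snippet posted later in this thread; `wrapParagraphs` is a hypothetical name, and the assumption is that a blank line in the exported post marks a paragraph boundary.

```javascript
// Sketch of the paragraph fix on its own (turndown is not needed here).
// `wrapParagraphs` is a hypothetical name used only for illustration.
const wrapParagraphs = (str) => {
  // Posts that already contain <p> are left untouched.
  if (/<p>/i.test(str)) return str;
  // Assumption: two consecutive newlines mark a paragraph boundary.
  return '<p>' + str.replace(/(\r?\n){2}/g, '</p>\n\n<p>') + '</p>';
};

const classicPost = 'First paragraph.\n\nSecond paragraph.';
console.log(wrapParagraphs(classicPost));
```

Each paragraph ends up in its own <p> element, which gives turndown a block boundary to preserve.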
coveralls commented 4 years ago

Coverage increased (+0.09%) to 95.862% when pulling c3ffa5b8ff1baed412071b669b39e133567964b1 on curbengh:restore-paragraph into b5b25653aa25e2052c35ebca93ab596fdf086b2d on hexojs:master.

jehy commented 4 years ago

Code blocks are usually inserted via external plugins like WP-Syntax, so there can be different use cases... My code blocks use WP-Syntax, are added with <pre lang="php">, and look like this:

<content:encoded><![CDATA[
<p>Data base of wikimedia-based project in several monthes can gain awful size. Since there are no solutions from wikimedia itself, but you can use wonderful plugin "SpecialDeleteOldRevisions", который эту функциональность обеспечивает. It helps you to delete articles, filtering by<UL></p>
<li>Article Category</li>
<li>Revision creation time</li>
<li>Article name</li>
<p></UL>Also you have an option - if you want to delete deleted articles from database or not. I checked it on <a href="https://jehy.ru/wiki">my wiki</a> - everything works wonderful. But, as always, after some bugfix. I made this work and published fixed version.</p>
<p><a href="http://www.mediawiki.org/wiki/Extension:SpecialDeleteOldRevisions">Original plugin page</a></p>
<p><a href="https://jehy.ru/dload/specialdeleteoldrevisions.zip">My patched version for wiki 13.2</a></p>
<p>To install plugin, copy it's directory "SpecialDeleteOldRevisions" to your "/extensions", and add to LocalSettings.php the following lines:</p>
<pre lang="php">
 $wgGroupPermissions['sysop']['DeleteOldRevisions'] = true;
 include_once('extensions/SpecialDeleteOldRevisions/SpecialDeleteOldRevisions.php');
</pre>
<p>After it in "special pages" you will see new link - "Delete old revisions" - use it. And better make backup firstly ;).</p>
]]></content:encoded>
curbengh commented 4 years ago

The sample you gave has <p> which I believe is from modern editor? Can you try create another post with classic editor?

jehy commented 4 years ago

> The sample you gave has <p> which I believe is from modern editor? Can you try create another post with classic editor?

I've been writing posts in WordPress for 14 years and have used different post formats. I'm not even sure which posts were written in the editor, which I coded in HTML, and where I used post-formatting plugins (you know, it is even possible to write WordPress posts in Markdown...).

So I suppose my posts are not the best source to explore. Maybe we need a clean WordPress installation to see how the default formatting behaves.

curbengh commented 4 years ago

In a sample provided by @adnan360, the post "Post with Image (Classic Editor)" doesn't have <p>, which causes https://github.com/hexojs/hexo-migrator-wordpress/issues/35. Looks like Classic Editor refers to this plugin.

adnan360 commented 4 years ago

> Looks like Classic Editor refers to this plugin.

Yes, you're right. WP has phased out the classic editor in favor of the new Gutenberg editor since v5. But for backwards compatibility they have kept it supported with this plugin in case something breaks, or someone needs it.

I have added some inline code, code blocks and a block quote (to test) to the posts in a new gist here. Unfortunately, the classic editor does not have a "code" button to create a code block, so I manually went into "Text" mode and typed the code inside a <pre> tag. Everything else is the same as in the previous gist I shared.

curbengh commented 4 years ago

> Unfortunately, classic editor does not have a "code" button to create a code block. So I manually went into "Text" mode and typed in code within <pre> tag.

So the workaround !/<pre>/i.test(str) wouldn't work unless <pre> is manually inserted before running this plugin. I removed the workaround; it would be easier for users to fix this (i.e. add ```) after import.
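
To make the limitation concrete, here is a rough sketch of a guard of that shape. How the removed check was combined with the <p> test is an assumption, and `shouldWrap` is a hypothetical name; the point is only that the regex can detect code when a literal <pre> tag is present, and nothing else.

```javascript
// Hypothetical guard: apply the paragraph fix only when the post
// contains neither <p> nor <pre>. Not the plugin's actual code.
const shouldWrap = (str) => !/<p>/i.test(str) && !/<pre>/i.test(str);

// Code typed straight into the classic editor, with no <pre> around it.
const bareCode = 'Intro.\n\nlet x = 1;\n\nOutro.';
// Code that the author wrapped in <pre> by hand in "Text" mode.
const taggedCode = 'Intro.\n\n<pre>let x = 1;</pre>\n\nOutro.';

console.log(shouldWrap(bareCode));   // true: the code is not detected
console.log(shouldWrap(taggedCode)); // false: only the hand-tagged post is skipped
```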

adnan360 commented 4 years ago

Let me know if I'm missing something. But I've tried this branch (npm install curbengh/hexo-migrator-wordpress#restore-paragraph --save) and this is what I got:

[Screenshot: wp-new-line-01]

On the left, the Hexo site shows the lines in one paragraph; on the right is the original post (in Text mode), showing that there is a new line. But Hexo still shows the lines together.

HTML source on Hexo page shows:

<p>This is a post written in classic WP editor. So this is the excerpt before more tag.</p>
<a id="more"></a>
<p>This is the post body after more tag. This is an <code>inline code</code> example on the classic editor. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Justo nec ultrices dui sapien. ...
curbengh commented 4 years ago

The workaround is not enabled by default; you need to pass the flag:

$ hexo migrate wordpress exported.xml --paragraph_fix
adnan360 commented 4 years ago

> The workaround is not enabled by default, need to
>
> $ hexo migrate wordpress exported.xml --paragraph_fix

OK, it's working now with the parameter. I think we should change the parameter syntax from --paragraph_fix to --paragraph-fix. Most CLI programs I use follow this convention.

curbengh commented 4 years ago

> I think we should change parameter syntax from --paragraph_fix to --paragraph-fix

Updated. I will also update --import_image to --import-image.
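
As an aside, a tool that wanted to keep accepting the old spelling during such a rename could normalise option names before reading them. The sketch below is an assumption and not something this PR does; `normalizeFlags` is a hypothetical helper, and the plain object stands in for whatever parsed options the migrator receives.

```javascript
// Hypothetical helper: map underscore-style option names to hyphen style,
// so that { paragraph_fix: true } and { 'paragraph-fix': true } read the same.
const normalizeFlags = (args) => {
  const out = {};
  for (const [key, value] of Object.entries(args)) {
    out[key.replace(/_/g, '-')] = value;
  }
  return out;
};

const legacy = normalizeFlags({ paragraph_fix: true, import_image: true });
console.log(legacy['paragraph-fix'], legacy['import-image']);
```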

curbengh commented 4 years ago

Seems alright with a code block.

const TurndownService = require('turndown');
const tomd = new TurndownService({ headingStyle: 'atx', codeBlockStyle: 'fenced' });

const paragraph_fix = true;

const md = str => {
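  // Workaround for #35: classic-editor posts have no <p>, so wrap each
  // blank-line-separated block in <p> to let turndown keep paragraph breaks.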
  if (paragraph_fix && !/<p>/i.test(str)) {
    str = '<p>' + str.replace(/(\r?\n){2}/g, '</p>\n\n<p>') + '</p>';
  }

  return tomd.turndown(str);
};
const content = `
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

Aenean vel elit scelerisque mauris pellentesque. Dictumst quisque sagittis purus sit amet volutpat. Urna cursus eget nunc scelerisque viverra mauris in aliquam. Non enim praesent elementum facilisis leo vel fringilla est ullamcorper. Ultrices sagittis orci a scelerisque purus semper.

<pre><code>
const TurndownService = require('turndown');
const tomd = new TurndownService({ headingStyle: 'atx', codeBlockStyle: 'fenced' });

const paragraph_fix = true;

console.log(tomd);

</code></pre>

Quis blandit turpis cursus in hac. Massa enim nec dui nunc mattis enim ut tellus. Justo eget magna fermentum iaculis eu non. Facilisis gravida neque convallis a cras semper. Est velit egestas dui id ornare arcu odio ut sem. Justo eget magna fermentum iaculis eu.
`;

console.log(md(content));

Output:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

Aenean vel elit scelerisque mauris pellentesque. Dictumst quisque sagittis purus sit amet volutpat. Urna cursus eget nunc scelerisque viverra mauris in aliquam. Non enim praesent elementum facilisis leo vel fringilla est ullamcorper. Ultrices sagittis orci a scelerisque purus semper.

```

const TurndownService = require('turndown');
const tomd = new TurndownService({ headingStyle: 'atx', codeBlockStyle: 'fenced' });

const paragraph_fix = true;

console.log(tomd);

```

Quis blandit turpis cursus in hac. Massa enim nec dui nunc mattis enim ut tellus. Justo eget magna fermentum iaculis eu non. Facilisis gravida neque convallis a cras semper. Est velit egestas dui id ornare arcu odio ut sem. Justo eget magna fermentum iaculis eu.