hexojs / hexo-migrator-wordpress

WordPress migrator for Hexo.
http://hexo.io/docs/migration.html
MIT License
48 stars, 29 forks

Idea: import html, not source code #87

Closed: jehy closed this issue 4 years ago

jehy commented 4 years ago

I looked through the existing issues and I think they share a common problem: we try to import the post source, not the generated HTML content.

It may seem the same, but most WordPress blogs have plugins that add custom tags, custom ways of formatting and so on.

Maybe we should use the rendered post HTML instead of the source, because we simply can't make an importer that knows about every plugin, but we can make an importer that correctly converts any HTML to Markdown.
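To make the "HTML to Markdown" idea concrete, here is a minimal sketch of such a conversion. It is a toy that handles only a few tags with regular expressions; `htmlToMarkdown` is a hypothetical name, not part of this migrator, and a real importer would rely on a proper HTML parser or an existing conversion library.

```typescript
// Toy HTML-to-Markdown converter: handles only a handful of tags.
// Assumption: input is a rendered post body, not a full page.
function htmlToMarkdown(html: string): string {
  return html
    // Headings: <h1>..<h6> become "#".."######"
    .replace(
      /<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi,
      (_m: string, level: string, text: string) =>
        `${"#".repeat(Number(level))} ${text.trim()}\n\n`
    )
    // Bold and italic
    .replace(/<(strong|b)>([\s\S]*?)<\/\1>/gi, "**$2**")
    .replace(/<(em|i)>([\s\S]*?)<\/\1>/gi, "*$2*")
    // Inline code
    .replace(/<code>([\s\S]*?)<\/code>/gi, "`$1`")
    // Links: keep the text and the href
    .replace(/<a\s+[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, "[$2]($1)")
    // Paragraphs become blocks separated by a blank line
    .replace(/<p[^>]*>([\s\S]*?)<\/p>/gi, "$1\n\n")
    // Drop any remaining tag this sketch does not understand
    .replace(/<[^>]+>/g, "")
    .trim();
}
```

The point is that the converter only needs to understand HTML, whatever plugin produced it; markup it does not recognise is dropped here, which a real importer would probably want to handle more carefully.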

This is possible if we make a WordPress plugin with a custom export feature: it could apply all those changes to the post body before exporting.

What do you think about it?

curbengh commented 4 years ago

I tried it before, but

> most of WordPress blogs have plugins that add custom tags, custom ways of formatting and so on.

the plugins actually make it much harder to import. For example, things like site categories, header/footer links, the author's profile and so on are all inside <body>. Post source is used because it's clean: it contains nothing but the post itself.

jehy commented 4 years ago

I don't mean importing the full HTML page, I mean importing the processed post only. Categories, header/footer links and other stuff aren't rendered in the post body.

What we have right now in the XML export is the post body source. Later, many filters and actions are applied to this source to produce the HTML post body: for example, custom processing of code snippets, custom tags, or even generating HTML from Markdown. So we can try using the post body HTML instead of the post body source.
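The difference between source and rendered body can be sketched as a chain of filters. This is only an illustration of the idea, not WordPress code: the `[code]` shortcode, `expandCodeShortcode`, `wrapParagraphs` and `renderPostBody` are all hypothetical stand-ins for whatever plugins and core filters a given blog runs.

```typescript
// A content filter takes the post body and returns a transformed body.
type ContentFilter = (body: string) => string;

// Hypothetical plugin shortcode: [code lang="js"]...[/code]
const expandCodeShortcode: ContentFilter = (body) =>
  body.replace(
    /\[code lang="([^"]+)"\]([\s\S]*?)\[\/code\]/g,
    (_m: string, lang: string, code: string) =>
      `<pre><code class="language-${lang}">${code.trim()}</code></pre>`
  );

// Simplified stand-in for automatic paragraph wrapping.
// Assumption: code blocks contain no blank lines (enough for a sketch).
const wrapParagraphs: ContentFilter = (body) =>
  body
    .split(/\n{2,}/)
    .map((block) => block.trim())
    .filter((block) => block.length > 0)
    .map((block) => (block.startsWith("<pre>") ? block : `<p>${block}</p>`))
    .join("\n");

// Run the stored source through every filter, in order.
function renderPostBody(source: string, filters: ContentFilter[]): string {
  return filters.reduce((body, filter) => filter(body), source);
}
```

The XML export contains only the input of `renderPostBody`; an export plugin running inside WordPress could emit the output instead, with every installed filter already applied.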

curbengh commented 4 years ago

> Categories, header\footer links and other staff isn't rendered in post body.

I can test it, do you have a sample HTML?


Another approach I thought about is an option to import the whole HTML page, without converting it to Markdown, similar to WP2Static and Sitesauce.

The user would still need to supply the XML so that this plugin knows which posts/pages to download; it would work much like the image attachment download.
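As a rough sketch of how the XML could drive the download, the snippet below pulls the permalink and post type out of each `<item>` in an export file. It assumes the usual `<link>` and `<wp:post_type>` elements and uses regular expressions only to stay dependency-free; `listDownloadTargets` and its helpers are hypothetical names, and the actual fetching is left out.

```typescript
interface ExportedItem {
  link: string;
  postType: string;
}

// Strip an optional CDATA wrapper from an element's text.
function unwrapCdata(text: string): string {
  const m = /^<!\[CDATA\[([\s\S]*?)\]\]>$/.exec(text.trim());
  return m ? m[1] || "" : text.trim();
}

// Return the text of the first <tag>...</tag> inside xml, or "" if absent.
function firstTagText(xml: string, tag: string): string {
  const m = new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`).exec(xml);
  return m ? unwrapCdata(m[1] || "") : "";
}

// Collect the URLs of posts and pages; attachments and other types are skipped.
function listDownloadTargets(wxr: string): ExportedItem[] {
  const targets: ExportedItem[] = [];
  const itemPattern = /<item>([\s\S]*?)<\/item>/g;
  let match: RegExpExecArray | null;
  while ((match = itemPattern.exec(wxr)) !== null) {
    const body = match[1] || "";
    const link = firstTagText(body, "link");
    const postType = firstTagText(body, "wp:post_type");
    if (link !== "" && (postType === "post" || postType === "page")) {
      targets.push({ link, postType });
    }
  }
  return targets;
}
```

Each returned `link` would then be fetched and saved, in the same spirit as the attachment download mentioned above.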

curbengh commented 4 years ago

> It may seem same, but most of WordPress blogs have plugins that add custom tags, custom ways of formatting and so on.

In retrospect, custom tags are not compatible with themes. It's possible to put raw HTML in posts, disable their layout and use custom CSS, but this is way too complicated for the users and doesn't fit well with the Hexo ecosystem.
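For reference, the raw HTML route could look roughly like the sketch below, which builds the text of a post file with `layout: false` in its front matter so the theme layout is not applied. `RenderedPost` and `toHexoRawHtmlPost` are hypothetical names, and where the file is written (and with which extension) is left to the caller.

```typescript
// Hypothetical shape of one exported, already-rendered post.
interface RenderedPost {
  title: string;
  date: string; // e.g. "2020-05-01 10:00:00"
  html: string;
}

// Build the text of a Hexo post whose body is raw HTML.
function toHexoRawHtmlPost(post: RenderedPost): string {
  const frontMatter = [
    "---",
    // JSON.stringify yields a double-quoted string that YAML also accepts.
    `title: ${JSON.stringify(post.title)}`,
    `date: ${post.date}`,
    // Assumption: the post should bypass the theme layout entirely.
    "layout: false",
    "---",
  ].join("\n");
  return `${frontMatter}\n${post.html.trim()}\n`;
}
```

Every such post would still need its own CSS to look right, which is the complication described above.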

jehy commented 4 years ago

> It's possible to put raw HTML in posts, disable their layout and use custom CSS, but this is way too complicated for the users and doesn't fit well with the Hexo ecosystem.

That's not complicated: we need just one custom WordPress export plugin.

curbengh commented 4 years ago

Assuming the WordPress plugin exports rendered HTML and this plugin then imports that HTML as is, will the imported posts end up in HTML format?