A Python script that makes migrating from WordPress to Jekyll as painless as possible.
I wrote this script after going through the painful process of manually migrating my WordPress blog to Jekyll because no existing migration tools met my needs.
This CLI script takes a WordPress export XML file and:
- Converts the WordPress HTML content to Markdown using html2text
wp2jekyll does a few things that the Jekyll Exporter WordPress plugin by Ben Balter does not:
How the Front Data (Jekyll's front matter) is used depends on the Jekyll theme in use. The Front Data generated by wp2jekyll is designed for use with the Chirpy theme, but many other themes use the same variable names, so it should work as-is for other themes too.
Variable | Description |
---|---|
layout | The layout to use (i.e., post or page) |
permalink | Sets the URI of the post or page so it matches the WordPress one |
title | The post title |
seo_title | The SEO title from Yoast SEO (not currently used by any Jekyll theme) |
description | The description from the WordPress or Yoast SEO description fields |
date | The GMT timestamp of the post or page in YYYY-MM-DD HH:MM:SS -0000 format |
last_modified_at | The GMT modification timestamp of the post or page in YYYY-MM-DD HH:MM:SS -0000 format |
image (path, alt) | Path to the featured image and optional alt text |
publish | Whether the post or page is published (True or False) |
pin | Whether the post or page is pinned (True or False) |
categories | A list of categories |
tags | A list of tags |
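
To make the table concrete, here is a minimal sketch of how a Front Data block with these variables could be rendered. It is not wp2jekyll's actual code: `format_timestamp`, `build_front_matter`, and every value in `example_post` are hypothetical, and the timestamp layout assumes the usual Jekyll form of `YYYY-MM-DD HH:MM:SS` followed by the offset.

```python
import json
from datetime import datetime, timezone


def format_timestamp(moment):
    """Format a timezone-aware datetime as a GMT timestamp string."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S -0000")


def _scalar(value):
    """Quote strings (JSON strings are valid YAML); leave other values as-is."""
    if isinstance(value, str):
        return json.dumps(value, ensure_ascii=False)
    return str(value)


def build_front_matter(fields):
    """Render a dict (one level of nesting allowed) as a front matter block."""
    lines = ["---"]
    for key, value in fields.items():
        if isinstance(value, dict):
            # Nested mapping, such as image: path / alt
            lines.append(f"{key}:")
            for sub_key, sub_value in value.items():
                lines.append(f"  {sub_key}: {_scalar(sub_value)}")
        elif isinstance(value, list):
            # Lists, such as categories and tags, use inline YAML syntax
            items = ", ".join(_scalar(item) for item in value)
            lines.append(f"{key}: [{items}]")
        else:
            lines.append(f"{key}: {_scalar(value)}")
    lines.append("---")
    return "\n".join(lines)


# Hypothetical post data, for illustration only
example_post = {
    "layout": "post",
    "permalink": "/2024/08/hello-world/",
    "title": "Hello, World",
    "date": format_timestamp(datetime(2024, 8, 12, 14, 30, tzinfo=timezone.utc)),
    "image": {"path": "/example/hello.png", "alt": "A greeting"},
    "publish": True,
    "pin": False,
    "categories": ["News"],
    "tags": ["wordpress", "jekyll"],
}
print(build_front_matter(example_post))
```

Quoting every string is a deliberately cautious choice: titles that contain a colon would otherwise break the YAML.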
- _data/authors.yaml - A mapping of WordPress login names to display names
- _assets/wp-content/uploads - Files uploaded to WordPress
- _posts/ - Posts converted to Markdown format
- _pages/ - Pages converted to Markdown format
- _wp_html/pages - WordPress content in the original HTML
- _wp_html/posts - WordPress content in the original HTML

Python >= 3.2
Install the Python dependencies before using wp2jekyll.
python3 -m pip install -r requirements.txt
Navigate to Tools > Export in the WordPress admin console to export the WordPress blog content to XML.
Pass the path to this file to wp2jekyll, along with any desired options.
For example:
python3 wp2jekyll.py seanthegeeknet.WordPress.2024-08-12.xml
wp2jekyll uses html2text to convert the WordPress HTML content to Markdown.
It's not perfect. Here are some issues I ran into.
Markdown does not support nested tables, but Markdown does support HTML inside of a Markdown document. If you have a document with a nested table, replace the entire (i.e., outer) table in the Markdown with the HTML from the original post or page.
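
To find which posts or pages need this manual fix, a short script can scan the original HTML for tables nested inside other tables. This is a sketch, not part of wp2jekyll: `NestedTableFinder`, `has_nested_table`, and `find_files_with_nested_tables` are hypothetical names, and it assumes the saved HTML lives in files under a directory such as `_wp_html`.

```python
from html.parser import HTMLParser
from pathlib import Path


class NestedTableFinder(HTMLParser):
    """Track <table> nesting depth while parsing an HTML document."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def has_nested_table(html):
    """Return True if any <table> appears inside another <table>."""
    finder = NestedTableFinder()
    finder.feed(html)
    finder.close()
    return finder.max_depth > 1


def find_files_with_nested_tables(html_dir):
    """List files under html_dir whose HTML contains a nested table."""
    matches = []
    for path in sorted(Path(html_dir).rglob("*")):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            if has_nested_table(text):
                matches.append(path)
    return matches
```

Calling `find_files_with_nested_tables("_wp_html")` from the output directory would then return the paths worth reviewing by hand.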
In HTML, iframe tags are used to embed content from other websites, such as YouTube. You will unfortunately need to copy and paste iframe content from the original HTML into the Markdown document.
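
The copying can be made less tedious by pulling the iframe elements out of the original HTML first. The sketch below is an illustration only: `extract_iframes` is a hypothetical helper, and it uses a regular expression rather than a full HTML parser, which is usually adequate for simple embed snippets but is an assumption about your content.

```python
import re

# Matches a complete <iframe ...>...</iframe> element, across line breaks
IFRAME_PATTERN = re.compile(
    r"<iframe\b[^>]*>.*?</iframe\s*>",
    re.IGNORECASE | re.DOTALL,
)


def extract_iframes(html):
    """Return every iframe element found in the HTML, as raw strings."""
    return IFRAME_PATTERN.findall(html)


sample = (
    "<p>Intro</p>\n"
    '<iframe width="560" height="315"\n'
    ' src="https://www.youtube.com/embed/abc123"></iframe>\n'
    "<p>Outro</p>"
)
for snippet in extract_iframes(sample):
    print(snippet)
```

Each returned string can be pasted into the Markdown file at the spot where the embed belongs.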
WordPress supports oEmbed, which allows a post author to embed social media content just by placing the URL in the post body. oEmbed takes care of generating the HTML to actually embed that content. Unfortunately, Jekyll does not support oEmbed, and plugins that add oEmbed support have not been maintained to be compatible with modern Jekyll.
So, look for social media URLs that sit on a line of their own, and replace them with the proper HTML. The social media networks will provide the embed HTML as a sharing option.
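
A small script can point out the lines that probably relied on oEmbed. This is a rough sketch under stated assumptions: `find_bare_social_urls` is a hypothetical helper, the `SOCIAL_HOSTS` list is only an example and should be adjusted to the networks you actually embedded, and it assumes the converted Markdown keeps such a URL alone on its line, either bare or wrapped in angle brackets.

```python
import re
from urllib.parse import urlparse

# Example hosts only (an assumption); extend to match your own content
SOCIAL_HOSTS = ("youtube.com", "youtu.be", "twitter.com", "x.com", "instagram.com")

# A line consisting solely of one URL, optionally written as <URL>
BARE_URL = re.compile(r"^\s*<?(https?://\S+?)>?\s*$")


def find_bare_social_urls(markdown):
    """Return (line_number, url) pairs for social URLs alone on a line."""
    hits = []
    for number, line in enumerate(markdown.splitlines(), start=1):
        match = BARE_URL.match(line)
        if not match:
            continue
        url = match.group(1)
        host = urlparse(url).hostname or ""
        # Accept the host itself or any subdomain of it (e.g. www.)
        if any(host == known or host.endswith("." + known) for known in SOCIAL_HOSTS):
            hits.append((number, url))
    return hits
```

URLs that appear inside a sentence are skipped on purpose, since oEmbed only applies to URLs on their own line.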
usage: wp2jekyll.py [-h] [--version] [--output OUTPUT] [--include-author] [--no-downloads] [--no-url-rewrites] [--no-permalinks] [--no-cleanup] xml_path

positional arguments:
  xml_path           the path to the WordPress export XML file

options:
  -h, --help         show this help message and exit
  --version          show program's version number and exit
  --output OUTPUT    the directory to output to (default: .)
  --include-author   include the author in the Front Data (default: False)
  --no-downloads     do not attempt to download media files (default: False)
  --no-url-rewrites  do not rewrite media URLs (default: False)
  --no-permalinks    do not retain the original permalinks (default: False)
  --no-cleanup       do not clean up the converted Markdown content (default: False)
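
For readers who want to drive the same options from their own tooling, the interface above can be approximated with argparse. This is a sketch of how such a parser might be declared, not wp2jekyll's actual source: `build_parser` is a hypothetical name and the version string is a placeholder.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options listed in the help text."""
    parser = argparse.ArgumentParser(
        prog="wp2jekyll.py",
        # Appends "(default: ...)" to the help text of each option
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    # Placeholder version number (an assumption, not the real one)
    parser.add_argument("--version", action="version", version="%(prog)s 0.0.0")
    parser.add_argument("--output", default=".",
                        help="the directory to output to")
    parser.add_argument("--include-author", action="store_true",
                        help="include the author in the Front Data")
    parser.add_argument("--no-downloads", action="store_true",
                        help="do not attempt to download media files")
    parser.add_argument("--no-url-rewrites", action="store_true",
                        help="do not rewrite media URLs")
    parser.add_argument("--no-permalinks", action="store_true",
                        help="do not retain the original permalinks")
    parser.add_argument("--no-cleanup", action="store_true",
                        help="do not clean up the converted Markdown content")
    parser.add_argument("xml_path",
                        help="the path to the WordPress export XML file")
    return parser


# Example: skip media downloads and write the output to ./site
args = build_parser().parse_args(
    ["--no-downloads", "--output", "site", "export.xml"]
)
print(args.output, args.no_downloads, args.xml_path)
```

Each `--no-...` flag is a plain on/off switch that defaults to False, which matches the defaults shown in the help text.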