opensourcecatholic / opensourcecatholic.github.io

Jekyll codebase behind Open Source Catholic
http://www.opensourcecatholic.com/
MIT License

Migrate all Drupal 7 content to static markdown files #1

Closed geerlingguy closed 8 years ago

geerlingguy commented 8 years ago

As part of the transition to Jekyll from Drupal 7, we need to export every single node on the site (each blog entry, book page, forum topic, page, and project) to a static markdown file.

A few important parts in this process:

  1. We need the jekyll-redirect-from plugin, and we need to add the URL alias(es) for each node in that node's .md file's front matter.
  2. We need to figure out the simplest way of migrating comments somehow (especially for forum topics). A lot of great content was written in the comments, and I don't want it lost!
  3. We need to figure out the best way of dumping images attached to nodes so they're also accessible.
  4. We'll need to normalize all the fields in all the content types so they flow into one new 'post' type in Jekyll (unless we want to work with experimental 'collections'?).
  5. We should start disabling the more dynamic parts of the site (e.g. forms, search, login).
  6. We need to figure out a way to keep posts attributed to their original authors. We can't map users in Drupal straight into users in Jekyll... so we'll need to figure out a way to configure the authors correctly in Jekyll. Something like this could do it.

There will be some pages that will be hard to fix properly with Jekyll (like pagers for listings), but we'll figure out a way :)
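For items 1 and 6 above, here is a minimal sketch of what the front matter for one migrated node might look like when generated from Ruby. `redirect_from` is the key jekyll-redirect-from reads; the `front_matter` helper, the `author` key, and the sample values are assumptions for illustration, not something the importer is known to produce.

```ruby
require "yaml"

# Build Jekyll front matter for one migrated node.
# `redirect_from` is the key read by jekyll-redirect-from; the other
# keys and the sample values below are illustrative assumptions.
def front_matter(title:, author:, aliases:)
  data = {
    "layout"        => "post",
    "title"         => title,
    "author"        => author,
    "redirect_from" => aliases
  }
  # Hash#to_yaml emits the opening "---"; Jekyll also needs a closing one.
  data.to_yaml + "---\n"
end

puts front_matter(
  title: "Example post",
  author: "geerlingguy",
  aliases: ["/node/86", "/blog/example-post"]
)
```

Keeping the author as a plain front matter value avoids needing any user system in Jekyll; a layout could then look the name up in a data file if more detail is wanted.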

iloveitaly commented 8 years ago

@geerlingguy thanks for kicking this off!

Any chance you could try running the drupal > jekyll migration tool? http://import.jekyllrb.com/docs/drupal7/

Curious to see how much detail the stock tool handles. Once we've run that tool, we can investigate what information will need to be manually migrated.

geerlingguy commented 8 years ago

It looks like we might want to fork that migration tool so we can do the following:

After that, maybe we could migrate comments to Disqus and figure out a way to do the node ID mapping in Jekyll so the comments are attached to the right nodes?

geerlingguy commented 8 years ago

I think, to limit complexity, I'm going to first merge all posts on OSC (in Drupal) into 'blog' posts, then we'll do the import. If everything's a blog post, it'll require far fewer modifications to the default Drupal 7 importer.

geerlingguy commented 8 years ago

I'll hopefully have some time to take an initial stab at the migration during Downton Abbey tonight (wife loves it, I was kind of over it after 2nd season, ha!). We'll see how far I get! I'll likely toss any code I'm working on into a separate branch (unless I'm lazy, then I'll throw it into master) once I get something workable.

aaronkavlie-wf commented 8 years ago

If there's anything that can be split off into a separate task without dependencies at any point, put it in another issue and let me know. I may be able to pick something up.

Warning: no Drupal, Jekyll, or Ruby experience, but I'm at least familiar with similar tools and languages.

iloveitaly commented 8 years ago

Quick thought here: if we are able to get the migration 90% complete it might be worth handing it off to a virtual assistant via oDesk to handle the remainder manually.

Great to have you here @aaronkavlie-wf!

geerlingguy commented 8 years ago

@iloveitaly - Sounds good to me (perfect is the enemy of good), and I'll see how far I can get tonight.

@aaronkavlie-wf - There will surely be plenty of stuff to do (especially cleanup or tweak-related), so just watch the repo, check the site, and see where you want to help out!

geerlingguy commented 8 years ago

Current queries I'm running against the D7 database to prep it for the migration:

UPDATE node SET type = 'blog' WHERE type IN('project','forum','book');
UPDATE node SET sticky = 0 WHERE nid IN(86, 13, 15);
UPDATE node_revision SET sticky = 0 WHERE nid IN(86, 13, 15);
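
To sanity-check what those queries do before touching the database, the same rules can be sketched against in-memory rows. This is only a dry-run illustration: the `prep_node` helper and the sample rows are hypothetical, and the real changes are the SQL statements above.

```ruby
# Content types folded into 'blog', and the nids that get un-stickied,
# copied from the queries above.
MERGED_TYPES  = %w[project forum book].freeze
UNSTICKY_NIDS = [86, 13, 15].freeze

# Apply the same rules to one row (a Hash standing in for a `node` record).
def prep_node(row)
  row = row.dup
  row[:type]   = "blog" if MERGED_TYPES.include?(row[:type])
  row[:sticky] = 0      if UNSTICKY_NIDS.include?(row[:nid])
  row
end

# Hypothetical sample rows, not real site data.
rows = [
  { nid: 86, type: "forum", sticky: 1 },
  { nid: 2,  type: "page",  sticky: 0 }
]
rows.each { |row| p prep_node(row) }
```

Note that 'page' nodes are left as they are, since the first query only lists project, forum, and book.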

geerlingguy commented 8 years ago

Some of the stuff is working... but I need to add the redirect links too; will work on that next. This is a lot simpler than I was originally expecting! (Though Ruby versions and such are causing plenty of consternation.)

geerlingguy commented 8 years ago

The main work is done. The nids are present, so any further cleanup can be automated, and I've opened #10 as a follow-up. It's getting late, so I'm going to call it a night. Hopefully I'll get more time to work on this later, but I've already made quite a bit of progress!
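
As a rough sketch of the kind of automated cleanup the nids make possible, a script can pull the nid back out of a post's front matter and use it as a lookup key. The `nid` key name, the `parse_post` and `nid_of` helpers, and the sample post are assumptions; check what the importer actually writes before relying on this.

```ruby
require "yaml"

# Split a Jekyll post into [front matter Hash, body String].
# Returns an empty Hash when the text has no front matter block.
def parse_post(text)
  match = text.match(/\A---[ \t]*\n(.*?)\n---[ \t]*\n(.*)\z/m)
  return [{}, text] unless match
  [YAML.safe_load(match[1]) || {}, match[2]]
end

# Look up the Drupal node ID, assuming it is stored under a `nid` key.
def nid_of(text)
  parse_post(text).first["nid"]
end

# Hypothetical sample post.
sample = "---\ntitle: Hello\nnid: 86\n---\nBody text\n"
puts nid_of(sample)
```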

iloveitaly commented 8 years ago

:+1: awesome work!

On Jan 3, 2016, at 11:30 PM, Jeff Geerling notifications@github.com wrote:

Closed #1.

geerlingguy commented 8 years ago

...and for posterity, the actual Ruby script I used to do the initial import/migration was:

Gemfile:

source 'https://rubygems.org'
gem "sequel"
gem "mysql"
gem "jekyll"
gem "jekyll-import", :git => "https://github.com/geerlingguy/jekyll-import.git", :branch => 'drupal7'

import.rb:

require "jekyll-import"

JekyllImport::Importers::Drupal7.run({
  "dbname"   => "osc",
  "user"     => "osc",
  "password" => "7yM93VeAXMGY",
  "host"     => "127.0.0.1",
  "prefix"   => ""
})

Directions for use:

  1. Install Ruby and Bundler.
  2. Run bundle install.
  3. Run bundle exec ruby import.rb.

Note: Requires Ruby 2.x, which can be installed on Ubuntu 14.04 with:

sudo apt-add-repository ppa:brightbox/ruby-ng
sudo apt-get update
sudo apt-get install ruby2.2 ruby2.2-dev
ruby2.2 -v

geerlingguy commented 8 years ago

And, in terms of the custom gem for jekyll-import, I decided to make that a PR against the main project: https://github.com/jekyll/jekyll-import/pull/237

It would also be nice to include the original path alias, but I can't assume everyone migrating from Drupal 7 would (a) always have a path alias available for each node, or (b) want to use jekyll-redirect-from.
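
For anyone who does want the aliases, here is a minimal sketch of grouping them by node. It assumes rows shaped like Drupal 7's `url_alias` table, where `source` holds the system path (e.g. `node/86`) and `alias` holds the friendly path. The `aliases_by_nid` helper and the sample rows are hypothetical, and this is not part of the PR.

```ruby
# Group path aliases by node ID. Rows whose source is not a node path
# (taxonomy terms, users, etc.) are skipped, and nodes without any
# alias simply do not appear in the result.
def aliases_by_nid(rows)
  rows.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |row, acc|
    match = row[:source].match(%r{\Anode/(\d+)\z})
    next unless match
    acc[match[1].to_i] << "/#{row[:alias]}"
  end
end

# Hypothetical sample rows, not real site data.
rows = [
  { source: "node/86",         alias: "about" },
  { source: "node/86",         alias: "about-us" },
  { source: "taxonomy/term/3", alias: "tags/example" }
]
p aliases_by_nid(rows)
```

The resulting lists could be written into each post's `redirect_from` front matter only when present, which sidesteps case (a).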