thoni56 / dokuwiki2wikijs

31 stars 8 forks source link

dokuwiki2wikijs

This script does a crude export and conversion of all current pages and media files of a Dokuwiki installation and prepares it for import to Wiki.js.

It converts all current pages to Markdown using pandoc, stores them in the same structure as the dokuwiki installation and then zips that structure, making all converted pages and media ready for transfer to a Wiki.js installation.

Features

Conversion caveats

Markdowku

We had the markdowku plugin activated, which seemed to be a good idea when editing pages. So we had both Dokuwiki pages and Markdown pages.

This script decides how to handle the actual conversion based on the first character in the page. If it's a # the page is not sent for pandoc conversion, but kept as-is, assuming it is already Markdown.

What's more, the markdowku plugin is implemented in such a way that it is possible to mix markdown and dokuwiki formatting in the same page. So pages may contain a mix of Markdown and Dokuwiki syntax and still render correctly in Dokuwiki. But it trashes conversion, e.g. Markdown unordered lists (line starting with a dash) is not "legal" dokuwiki, and are thus considered text by pandoc -f dokuwiki .... This is impossible to handle, unless you re-implement the full dokuwiki content parsing pipeline. (Although piggy-backing directly on the actual dokuwiki source might be the "correct" way to do this conversion...)

This script makes no attempt to handle or signal this situation. Instead we decided to fix most of these on the source side by either reverting to pure Dokuwiki or ensuring the whole page was Markdown only. This might be prohibitive if you have a huge site with mixed pages.

As the editor always inserts Dokuwiki formatted page links, even "well formed" markdown pages will contain dokuwiki links. Thus, the script converts dokuwiki links in markdown-pages to markdown format to mimimize manual conversion work.

Folder page files

Dokuwiki have multiple ways to find the page to show for a folder "node":

We were thinking about automatically fixing this in the script. But there might actually be conflicts here, so we decided on manually adjusting the original structure so that all the occurences used a consistent, and also the (upcoming) Wiki.js, way, namely 'folder.txt' parallel to 'folder'.

Page titles

Page titles with only numbers don't import. We decided to fix this at the source too.

Page authors

Importing pages through the file storage will set author to the person logged in when doing the import. No author information will be transfered when using this script.

Links spanning multiple lines

Although dokuwiki have no problem handling links that are broken over two lines, e.g between spaces in the text, this script cannot handle that.

Limitations

Prerequisites

Usage

Single page conversion

To see the resulting conversion of a single page run the script with the page as the argument, e.g.

> dokuwiki2wikijs /var/www/dokuwiki/data/pages/start.txt

The rendered result will be output to standard output and can be piped to a file.

Installation conversion

To convert a complete installation run the script with the root of the installation as argument, e.g.

> dokuwiki2wikijs /var/www/dokuwiki

You then get a zip-file with the converted pages and media in your current directory.

Setup a Local File System storage in your Wiki.js installation. Watch out for where Wiki.js can actually create the directory, /tmp/wiki is a safe bet. Unpack the zip in a Wiki.js in that directory. If you are running Wiki.js in a Docker container these are the steps to do that:

> docker cp dokuwiki2wikijs.zip <container>:/tmp
> docker exec -it <container> /bin/bash
# cd <file storage folder>
# unzip /tmp/dokuwiki2wikijs.zip

Wanted features

Alternate approaches

Just converting the current version of pages and then import them through the file storage mechanism with all pages created by the one running the import, did not feel sufficient. So here are a number of alternate approaches we explored.

dokuwiki2git

At one point we tried to build upon dokuwiki2git since Wiki.js has an option to use git as a backing store. However it turned out that if you just hook up the repo it would just import the latest version of every page anyway. Neither did it use the user info from the commits, all pages would be created by whoever did the import anyway. We could not find any benefits over a simple file storage, so we went back to just handling the current version of the pages.

Another problem with using dokuwiki2git as the base is that it did not handle page moves. Instead it gets confused about (older) files that are no longer available. To do this correctly you need to unwind the history of a page in the same way as dokuwikis own history function does.

So we just scrapped that idea for our migration attempt and was satisfied with importing the last version. There is a branch use_docuwiki2git that holds the code from that attempt.

The major benefit that we thought that we could get from git approach, was importing history and author information. It did not.

Wiki.js GraphQL API

We thought that to do a real conversion, with history and author information, we need to go through the API. It's a GraphQL API and doesn't look to hard. Once you understand how to explore and use such an API...

It turned out that the API does not allow you to set the author either. The only way, currently, to set author is through database operations.

So the GraphQL API gave no extra benefits.

Database import

This seems to be the only real option if you want to keep author and revision information. We have not explored this since it requires detailed information about the database which does not seem to be available at this time.