outofcontrol / mediawiki-to-gfm

Converts Mediawiki format to Github Flavoured Markdown format

First version of each page exported instead of the last #4

Closed mariancerny closed 5 years ago

mariancerny commented 6 years ago

In my case, the tool exported the first revision of each page instead of the last one, so outdated versions of the pages ended up in the output.

I created the mediawiki.xml export with MediaWiki 1.29.

To fix the problem I have changed the following line in convertData():

$text = $this->cleanText($text[0], $fileMeta);

to:

$text = $this->cleanText(end($text), $fileMeta);
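
To illustrate the difference, here is a minimal sketch. It assumes the revision texts are collected with an XPath query such as `revision/text` (not confirmed against the converter's source) and uses a simplified export without the XML namespace that a real dump declares. In a full-history dump the revisions are listed oldest first, so index 0 is the first revision and `end()` returns the latest one.

```php
<?php
// Simplified stand-in for a full-history MediaWiki export. A real dump also
// declares an XML namespace, which this sketch leaves out.
$xml = <<<XML
<mediawiki>
  <page>
    <title>Example</title>
    <revision><timestamp>2017-01-01T00:00:00Z</timestamp><text>first draft</text></revision>
    <revision><timestamp>2018-06-01T00:00:00Z</timestamp><text>current text</text></revision>
  </page>
</mediawiki>
XML;

$page = simplexml_load_string($xml)->page;

// Assumption: this mirrors how the converter gathers the revision texts.
$text = $page->xpath('revision/text');

$first = (string) $text[0];    // oldest revision in a full-history dump
$last  = (string) end($text);  // newest revision, the one that should be converted

echo $first, PHP_EOL;
echo $last, PHP_EOL;
```

With an export that contains only the current revision, the array has a single element, so `$text[0]` and `end($text)` return the same value.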

mariancerny commented 6 years ago

I have now noticed that the instructions mention exporting only the current version:

  1. Check: 'Include only the current revision, not the full history' Note: This convert script will only do latest version, not revisions.

However, I have exported the pages from MediaWiki's maintenance script:

php dumpBackup.php --full > ~/mediawiki.xml

I found these export instructions elsewhere.

I think exporting the last revision instead of the first would be more universal. Should I send a pull request?

outofcontrol commented 6 years ago

The script wasn't designed to import the output of dumpBackup.php, only the output of Special Pages -> Export. If you have a way of making both export variants work without breaking the current behaviour, a PR is welcome.
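
One possible way to support both export variants is to pick the revision with the newest timestamp instead of relying on the position in the list. The sketch below is only an illustration: `latestRevisionText()` is a hypothetical helper, not part of the converter, and the sample XML is simplified (no namespace, only the elements needed here).

```php
<?php
/**
 * Hypothetical helper: return the text of the newest revision of a <page>.
 * Works for single-revision exports and for full-history dumps alike.
 */
function latestRevisionText(SimpleXMLElement $page): string
{
    $latest = null;
    foreach ($page->revision as $revision) {
        // MediaWiki export timestamps are ISO 8601 in UTC, so comparing them
        // as strings orders them chronologically.
        if ($latest === null
            || strcmp((string) $revision->timestamp, (string) $latest->timestamp) > 0
        ) {
            $latest = $revision;
        }
    }

    return $latest === null ? '' : (string) $latest->text;
}

// Simplified full-history page with two revisions.
$full = simplexml_load_string(
    '<page>'
    . '<revision><timestamp>2017-01-01T00:00:00Z</timestamp><text>old</text></revision>'
    . '<revision><timestamp>2018-06-01T00:00:00Z</timestamp><text>new</text></revision>'
    . '</page>'
);

// Simplified current-revision-only page, as produced by Special:Export.
$current = simplexml_load_string(
    '<page><revision><timestamp>2018-06-01T00:00:00Z</timestamp><text>new</text></revision></page>'
);

$fromFull = latestRevisionText($full);
$fromCurrent = latestRevisionText($current);
```

Both calls yield the same text here, so an export with a single revision per page would behave as it does today.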

gagarine commented 5 years ago

You may be interested in https://github.com/peterjc/mediawiki_to_git_md . Apparently that script keeps the history (as a series of git commits). I haven't tested it yet.