WP-MIRROR makes a mirror of a Wikimedia site. It is written in Common Lisp, which is refreshing; however:
The en wikipedia is the most demanding case. It should build in 1Ms (twelve days), occupy 3T of disk space, be served locally by a virtual host http://en.wikipedia.site/, and update automatically every month.
Removing the images and history should bring these requirements down considerably (for comparison, wikipedia-mirror takes up ~250G).
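As a rough sanity check on the figures quoted above, the sketch below converts 1Ms to days and compares the two disk footprints. It assumes decimal units (3T taken as 3000 GB, ~250G as 250 GB); the function names are made up for illustration and are not part of WP-MIRROR or wikipedia-mirror.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def megaseconds_to_days(megaseconds: float) -> float:
    """Convert megaseconds (1 Ms = 1,000,000 s) to days."""
    return megaseconds * 1_000_000 / SECONDS_PER_DAY


def size_ratio(full_gb: float, reduced_gb: float) -> float:
    """How many times larger the full mirror is than the reduced one."""
    return full_gb / reduced_gb


# Assumption: decimal units, so 3T = 3000 GB and ~250G = 250 GB.
build_days = megaseconds_to_days(1)
ratio = size_ratio(3000, 250)
print(f"{build_days:.1f} days, {ratio:.0f}x")  # prints "11.6 days, 12x"
```

So the quoted "twelve days" is 1Ms rounded up, and the full mirror is roughly an order of magnitude larger than the text-only one.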
At the very least we could replace mwdumper with mediawiki-mwxml2sql here:(
NOTE: This is not the recommended method of importing XML dumps.
This probably renders wp-mirror unreliable.