Mystery solved
```
$ df /scratch
Filesystem                    1K-blocks      Used Available Use% Mounted on
/dev/mapper/vg_system-scratch 639957424 607442764         0 100% /scratch
```
The data takes up about 3-4x the pure data size because I keep all steps of the process (zipped xml dumps -> xml dumps -> sql dumps -> database), in case one of them goes wrong. Since space is obviously an issue (at least in the development environment of wikipedia-mirror), I will definitely need to delete some of them.

SQL dumps are the most expensive to generate, so deleting those is out of the question. XMLs are pretty cheap, so those will probably go. I will try to make it work keeping only the zipped ones.
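For reference, a cleanup along these lines would do it (a sketch only; the paths and naming patterns below are my assumptions about where the pipeline stages land, not part of the actual setup):

```bash
# Drop the uncompressed XML dumps (cheap to regenerate) and keep
# the zipped dumps plus the sql dumps. Paths are hypothetical.
rm -v /scratch/dumps/*.xml       # unpacked XML dumps go
ls /scratch/dumps/*.xml.gz       # zipped dumps stay
ls /scratch/dumps/*.sql          # sql dumps stay

# Confirm the space was actually reclaimed.
df -h /scratch
```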
With enough space it works:
```
$ curl -I http://futuna.csail.mit.edu:8080/mediawiki/San_Diego_Boca_FC
HTTP/1.1 200 OK
Date: Thu, 08 May 2014 00:57:39 GMT
Server: Apache
X-Frame-Options: SAMEORIGIN
Cache-Control: max-age=0, no-cache
Content-Type: text/html; charset=UTF-8
```
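The same check can be looped over a handful of titles to spot-check more pages (the URL shape is taken from the curl above; the extra titles are just placeholders):

```bash
# Print the HTTP status for each title; anything other than
# 200 points at a missing or broken page.
for title in San_Diego_Boca_FC Some_Other_Article Yet_Another; do
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    "http://futuna.csail.mit.edu:8080/mediawiki/$title")
  echo "$code $title"
done
```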
Now that all the data is in the database, all we need is the Wikipedia extensions to do the rendering correctly.
I checked the output this morning and the `tmux` terminal buffer was filled with this. However, all the dumps look like they succeeded.
I tried a couple of articles and it works (see issue #6).
But we are definitely missing stuff...
For brevity: San Diego FC
The first milestone for this is to find out exactly how many articles we are missing.
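One rough way to get that number (a sketch only; the dump paths, database name, and credentials are all assumptions) is to compare the number of `<page>` entries in the zipped XML dumps against the article rows in the MediaWiki `page` table:

```bash
# Count <page> entries across the compressed XML dumps
# (paths are hypothetical).
xml_pages=$(zcat /scratch/dumps/*.xml.gz | grep -c '<page>')

# Count article pages (namespace 0) in the database; the
# database name and user are placeholders.
db_pages=$(mysql -N -u wiki -p wikidb \
  -e 'SELECT COUNT(*) FROM page WHERE page_namespace = 0;')

echo "pages in dumps: $xml_pages"
echo "pages in db:    $db_pages"
echo "missing:        $((xml_pages - db_pages))"
```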