When running make sql-dump-parts, the script gets stuck on an empty XML file:
The errored article is ( /home/wikipedia-mirror/drafts/errored_articles ). Fixing... (time: Sun Mar 8 22:53:29 EDT 2015)
ERROR: empty xml file /home/wikipedia-mirror/drafts/wikipedia-parts/enwiki-20140614-pages-meta-current1.xml-p000000010p000010000.fix.xml
XML parse script failed. This is serous. report this at
http://github.com/fakedrake/wikipedia-mirror/issues
I didn't clean up after myself btw (rm /home/wikipedia-mirror/drafts/wikipedia-parts/enwiki-20140614-pages-meta-current1.xml-p000000010p000010000.fix.xml), will now exit 1
Finished 0th pass over /home/wikipedia-mirror/drafts/wikipedia-parts/enwiki-20140614-pages-meta-current1.xml-p000000010p000010000.fix.xml after removing
I dont know what to do. I tried this 3 itmes, each time removing the bad article
Try re-running the command if this is java.lang.ArrayIndexOutOfBoundsException to continue
Otherwise please report at
http://github.com/fakedrake/wikipedia-mirror/issues
The errored articles log is /home/wikipedia-mirror/drafts/errored_articles.
If it is empty or you see the same article more than once report your situation.
make: *** [/home/wikipedia-mirror/drafts/wikipedia-parts/enwiki-20140614-pages-meta-current1.xml-p000000010p000010000.sql] Error 1
Unfortunately, I don't know what article this is because both /home/wikipedia-mirror/drafts/errored_articles and the xml file itself (/home/wikipedia-mirror/drafts/wikipedia-parts/enwiki-20140614-pages-meta-current1.xml-p000000010p000010000.fix.xml) are empty files.
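For anyone trying to reproduce this, here is a minimal sketch of how the empty files can be confirmed. The helper name list_empty_files is made up for illustration and is not part of wikipedia-mirror; it only relies on the standard find utility.

```shell
# Hypothetical helper (not part of wikipedia-mirror): print every
# zero-byte regular file found under the given directory.
# With find, "-size 0" matches only files of exactly 0 bytes, because
# sizes are rounded up to whole 512-byte blocks.
list_empty_files() {
    dir="$1"
    find "$dir" -type f -size 0 | sort
}

# Example invocation (paths taken from the log above):
#   list_empty_files /home/wikipedia-mirror/drafts/wikipedia-parts
#   [ -s /home/wikipedia-mirror/drafts/errored_articles ] || echo "errored_articles is empty"
```

On my machine I would expect this to list the .fix.xml file named in the error, but it does not say anything about why the file ended up empty.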
What I've tried/looked at:
I've already tried re-running make sql-dump-parts both with and without deleting the empty xml file from the disk, and I get the same error. Unless there's a better way around this, I would like to skip this xml file.
In Makefile.smartdumps you mention:
"Use ARTICLES_TO_REMOVE as a list of articles you know will fail to remove them before hand."
However, I don't see any way to use ARTICLES_TO_REMOVE, and the closest thing I could find is xml-parse.sh, yet I think it's xml-parse.sh itself that's complaining in the first place -- and further, I don't know the "TITLE_OF_ARTICLE_TO_REMOVE" (apparently, an argument needed for xml-parse.sh).
Is there any way to ignore this one XML file and continue with sql-dump-parts?