marcusklang / wikiforia

A Utility Library for Wikipedia dumps
GNU General Public License v2.0

Null pointer exception executing the tool with Bavarian wiki #7

Open renepickhardt opened 9 years ago

renepickhardt commented 9 years ago

Hey Marcus, I tried:

git clone
mvn compile
mvn package

So far so good (though I had a little trouble figuring out that the easiest way to satisfy the external dependencies is to switch to the target directory and run from there).

then

cd target
wget https://dumps.wikimedia.org/barwiki/20151002/barwiki-20151002-pages-articles-multistream-index.txt.bz2
wget https://dumps.wikimedia.org/barwiki/20151002/barwiki-20151002-pages-articles-multistream.xml.bz2

When I now run `java -jar wikiforia-1.2.1.jar -pages barwiki-20151002-pages-articles-multistream.xml.bz2 -output res.xml`

I receive the following output:

[2015-10-14 15:14:55.728 | main | INFO  | se.lth.cs.nlp.wikiforia.App] Wikiforia v1.2.1 by Marcus Klang
Exception in thread "main" java.lang.NullPointerException
    at se.lth.cs.nlp.mediawiki.parser.MultistreamBzip2XmlDumpParser.toString(MultistreamBzip2XmlDumpParser.java:480)
    at se.lth.cs.nlp.wikiforia.Pipeline.run(Pipeline.java:73)
    at se.lth.cs.nlp.wikiforia.App.convert(App.java:239)
    at se.lth.cs.nlp.wikiforia.App.main(App.java:413)

looking at https://github.com/marcusklang/wikiforia/blob/5672123ec7eb24801a40276c3e7083e977279838/src/main/java/se/lth/cs/nlp/mediawiki/parser/MultistreamBzip2XmlDumpParser.java#L480

I see that some class fields must be uninitialized, but I didn't debug any further.

`ls` shows me that the file res.xml was created, so I assume that argument passing works and that some other field in the class is not set correctly.

Did I do something wrong? Is the tool just not working with the Bavarian Wikipedia? Comparing git hashes, I found this in git log:

commit 04e80b46ecc1bb487419fb9f831258be78413f07
Author: Marcus Klang <marcus.klang@cs.lth.se>
Date:   Tue Mar 24 11:08:08 2015 +0100

    * Added French, German and Spanish configurations

which made me wonder whether my dump could be the reason. Thanks for the help! I am not particularly interested in the Bavarian Wikipedia, but I wanted to test the tool with a small dataset (:

Best, Rene

renepickhardt commented 9 years ago

Ok, I found https://github.com/marcusklang/wikiforia/blob/bd1e9d4f3fd4bcaad0776bd58399a233b00a9d20/src/main/java/se/lth/cs/nlp/wikipedia/lang/BarConfig.java which basically tells me that Bavarian should be supported.

renepickhardt commented 9 years ago

Yeah, I have tried the German edition of Wikipedia. It also doesn't work.

thesoulshell commented 8 years ago

I just tried the English version of Simple Wikipedia and got the same result.

tobymao commented 8 years ago

@thesoulshell you need to use the absolute path, e.g. /home/user/enwiki.xml.bz2
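
A minimal sketch of that workaround, applied to the command from the original report. The helper names `abs_path` and `run_wikiforia` are hypothetical and not part of wikiforia; the `-pages` and `-output` flags and the jar name are the ones used above. The thread does not say whether `-output` also needs an absolute path, so making it absolute here is only an assumption (it points at the same file either way).

```shell
#!/bin/sh
# abs_path (hypothetical helper): print an absolute form of the given path.
# Paths that already start with "/" are passed through unchanged;
# anything else is prefixed with the current working directory.
abs_path() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;
    *)  printf '%s/%s\n' "$(pwd)" "$1" ;;
  esac
}

# run_wikiforia (hypothetical wrapper): call the jar from the report with
# absolute paths. $1 is the dump file, $2 is the output file.
run_wikiforia() {
  java -jar wikiforia-1.2.1.jar \
    -pages "$(abs_path "$1")" \
    -output "$(abs_path "$2")"
}
```

From the target directory this could be used as `run_wikiforia barwiki-20151002-pages-articles-multistream.xml.bz2 res.xml`.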