shilad / wikibrain

The WikiBrain Java library enables researchers and developers to incorporate state-of-the-art Wikipedia-based algorithms and technologies in a few lines of code.
http://shilad.github.io/wikibrain/

Unknown langCode: '' after downloading wikipedia articles #271

Open saisubramaniam opened 7 years ago

saisubramaniam commented 7 years ago

Hi,

While installing WikiBrain (SR only) for the full English language edition, I encountered the error below. The command used for installation:

java -Xmx80g -cp wikibrain-withdeps-0.8.0.jar org.wikibrain.Loader org.wikibrain.Loader -l en -s fetchlinks -s download -s dumploader -s redirects -s wikitext -s lucene -s phrases -s sr

The error:

06:52:45.937 [main] INFO org.wikibrain.download.DumpFileDownloader - 28 files downloaded out of 28 files.
06:52:46.034 [main] INFO org.wikibrain.loader.pipeline.PipelineLoader - Successfully completed stage download
06:52:46.035 [main] INFO org.wikibrain.loader.pipeline.PipelineLoader - Beginning stage dumploader
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
06:52:46.659 [main] INFO org.wikibrain.core.cmd.Env - Configured default logging at the Info Level
06:52:46.660 [main] INFO org.wikibrain.core.cmd.Env - To customize log4j2 set the 'log4j.configurationFile' system property or set EnvBuilder.setReconfigureLogging to$
06:52:49.124 [main] INFO org.wikibrain.conf.Configurator - configurator installed 75 providers for 38 classes
06:52:49.125 [main] INFO org.wikibrain.core.cmd.Env - using baseDir /mnt3/wikibrain/.
06:52:49.125 [main] INFO org.wikibrain.core.cmd.Env - using max vm heapsize of 74581MB
06:52:49.127 [main] INFO org.wikibrain.core.cmd.Env - using languages (EN)
06:52:49.127 [main] INFO org.wikibrain.core.cmd.Env - using maxThreads 16
06:52:49.127 [main] INFO org.wikibrain.core.cmd.Env - using tmpDir ./.tmp
06:52:49.347 [main] WARN org.wikibrain.core.dao.sql.WpDataSource - Raised connections per partition to 3
06:52:49.643 [main] INFO org.wikibrain.loader.DumpLoader - processing file: org.wikibrain.Loader
Exception in thread "main" java.lang.IllegalArgumentException: unknown langCode: ''
    at org.wikibrain.core.lang.Language.getByLangCode(Language.java:102)
    at org.wikibrain.core.cmd.FileMatcher.getLanguage(FileMatcher.java:210)
    at org.wikibrain.loader.DumpLoader.load(DumpLoader.java:82)
    at org.wikibrain.loader.DumpLoader.main(DumpLoader.java:257)
06:56:43.471 [main] ERROR org.parse4j.ParseObject - Request failed.
06:56:43.472 [main] WARN org.wikibrain.loader.pipeline.DiagnosticDao - Save of diagnostics failed: org.json.JSONException: A JSONObject text must begin with '{' at 1 [character 2 line 1]
    at org.json.JSONTokener.syntaxError(JSONTokener.java:433) ~[wikibrain-withdeps-0.8.0.jar:?]
    at org.json.JSONObject.<init>(JSONObject.java:194) ~[wikibrain-withdeps-0.8.0.jar:?]
    at org.json.JSONObject.<init>(JSONObject.java:321) ~[wikibrain-withdeps-0.8.0.jar:?]
    at org.parse4j.command.ParseResponse.getJsonObject(ParseResponse.java:83) ~[wikibrain-withdeps-0.8.0.jar:?]
    at org.parse4j.command.ParseResponse.getException(ParseResponse.java:71) ~[wikibrain-withdeps-0.8.0.jar:?]
    at org.parse4j.ParseObject.save(ParseObject.java:483) ~[wikibrain-withdeps-0.8.0.jar:?]
    at org.wikibrain.loader.pipeline.DiagnosticDao.save(DiagnosticDao.java:67) ~[wikibrain-withdeps-0.8.0.jar:?]
    at org.wikibrain.loader.pipeline.DiagnosticDao.saveQuietly(DiagnosticDao.java:72) [wikibrain-withdeps-0.8.0.jar:?]
    at org.wikibrain.loader.pipeline.PipelineLoader.quietlySaveDiagnostics(PipelineLoader.java:132) [wikibrain-withdeps-0.8.0.jar:?]
    at org.wikibrain.loader.pipeline.PipelineLoader.run(PipelineLoader.java:113) [wikibrain-withdeps-0.8.0.jar:?]
    at org.wikibrain.Loader.run(Loader.java:98) [wikibrain-withdeps-0.8.0.jar:?]
    at org.wikibrain.Loader.main(Loader.java:136) [wikibrain-withdeps-0.8.0.jar:?]
06:56:43.530 [main] ERROR org.parse4j.ParseObject - Request failed.

shilad commented 7 years ago

This is a bad sign. It suggests the download directory format may have changed and is confusing WikiBrain. I will investigate...
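
For readers following along, here is a minimal sketch of how a language code might be derived from a dump file name, and how a name that does not fit the expected format could produce the empty code seen in the exception. This is an illustration only, not WikiBrain's actual FileMatcher logic: the class name LangCodeSketch, the regular expression, and the sample dump file name are all assumptions. The second input, "org.wikibrain.Loader", is taken from the "processing file:" line in the log above; it also appears twice in the reported command, which may or may not be related.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LangCodeSketch {
    // Hypothetical pattern for dump names such as "enwiki-20170301-pages-articles.xml.bz2".
    // This is an assumed naming format, not the pattern WikiBrain itself uses.
    private static final Pattern DUMP_NAME = Pattern.compile("^([a-z_]+)wiki-\\d{8}-.*");

    // Returns the language prefix of a dump file name, or "" when the name does not match.
    public static String extractLangCode(String fileName) {
        Matcher m = DUMP_NAME.matcher(fileName);
        return m.matches() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        // A name in the assumed dump format yields a usable code.
        System.out.println("[" + extractLangCode("enwiki-20170301-pages-articles.xml.bz2") + "]"); // prints "[en]"
        // A name that is not a dump file yields the empty string.
        System.out.println("[" + extractLangCode("org.wikibrain.Loader") + "]"); // prints "[]"
    }
}
```

Under these assumptions, any lookup that rejects unknown codes would fail on the empty string, which is consistent with the message unknown langCode: ''.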


saisubramaniam commented 7 years ago

Thanks @shilad

SindhujaM commented 7 years ago

Could anyone kindly help resolve the issue below?

17:20:11.799 [main] INFO org.wikibrain.loader.pipeline.PipelineLoader - Successfully completed stage sr
17:20:11.799 [main] INFO org.wikibrain.loader.pipeline.PipelineLoader - Loading successfully finished
17:20:15.126 [main] ERROR org.parse4j.ParseObject - Request failed.
17:20:15.128 [main] WARN org.wikibrain.loader.pipeline.DiagnosticDao - Save of diagnostics failed: org.json.JSONException: A JSONObject text must begin with '{' at 1 [character 2 line 1]
    at org.json.JSONTokener.syntaxError(JSONTokener.java:433) ~[wikibrain-withdeps-0.8.0.jar:?]
    at org.json.JSONObject.<init>(JSONObject.java:194) ~[wikibrain-withdeps-0.8.0.jar:?]
    at org.json.JSONObject.<init>(JSONObject.java:321) ~[wikibrain-withdeps-0.8.0.jar:?]
    at org.parse4j.command.ParseResponse.getJsonObject(ParseResponse.java:83) ~[wikibrain-withdeps-0.8.0.jar:?]
    at org.parse4j.command.ParseResponse.getException(ParseResponse.java:71) ~[wikibrain-withdeps-0.8.0.jar:?]
    at org.parse4j.ParseObject.save(ParseObject.java:483) ~[wikibrain-withdeps-0.8.0.jar:?]
    at org.wikibrain.loader.pipeline.DiagnosticDao.save(DiagnosticDao.java:67) ~[wikibrain-withdeps-0.8.0.jar:?]
    at org.wikibrain.loader.pipeline.DiagnosticDao.saveQuietly(DiagnosticDao.java:72) [wikibrain-withdeps-0.8.0.jar:?]
    at org.wikibrain.loader.pipeline.PipelineLoader.quietlySaveDiagnostics(PipelineLoader.java:132) [wikibrain-withdeps-0.8.0.jar:?]
    at org.wikibrain.loader.pipeline.PipelineLoader.run(PipelineLoader.java:113) [wikibrain-withdeps-0.8.0.jar:?]
    at org.wikibrain.Loader.run(Loader.java:98) [wikibrain-withdeps-0.8.0.jar:?]
    at org.wikibrain.Loader.main(Loader.java:136) [wikibrain-withdeps-0.8.0.jar:?]
17:20:17.132 [main] ERROR org.parse4j.ParseObject - Request failed.
17:20:17.132 [main] WARN org.wikibrain.loader.pipeline.DiagnosticDao - Save of diagnostics failed: org.json.JSONException: A JSONObject text must begin with '{' at 1 [character 2 line 1]

Kindly help me fix this.

Thanks in advance
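
A note on the JSONException in the log above: org.json reports "A JSONObject text must begin with '{'" when the text it is asked to parse does not start with an opening brace, for example an empty body or an HTML error page. The stack trace shows this happening while the response to the diagnostics save request was being parsed, inside a call named saveQuietly and logged at WARN level after "Loading successfully finished", which suggests (without confirming) that it is separate from the loading result. The sketch below shows the kind of guard that avoids this exception. It is a standalone illustration, not code from WikiBrain or parse4j; the class name JsonGuardSketch and the sample response bodies are assumptions.

```java
public class JsonGuardSketch {
    // Returns true only if the first non-whitespace character of the body is '{',
    // which is the minimum requirement for the text to be parsed as a JSON object.
    public static boolean looksLikeJsonObject(String body) {
        if (body == null) {
            return false;
        }
        String trimmed = body.trim();
        return !trimmed.isEmpty() && trimmed.charAt(0) == '{';
    }

    public static void main(String[] args) {
        // Hypothetical response bodies, for illustration only.
        String jsonBody = "{\"code\": 101, \"error\": \"object not found\"}";
        String htmlBody = "<html><body>Service unavailable</body></html>";

        System.out.println(looksLikeJsonObject(jsonBody)); // prints "true"
        System.out.println(looksLikeJsonObject(htmlBody)); // prints "false"
        System.out.println(looksLikeJsonObject(""));       // prints "false"
    }
}
```

A caller could use such a check to log the raw response instead of attempting to parse it, which would make the underlying cause of "Request failed." easier to see.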