shilad / wikibrain

The WikiBrain Java library enables researchers and developers to incorporate state-of-the-art Wikipedia-based algorithms and technologies in a few lines of code.
http://shilad.github.io/wikibrain/

Configuration step freezes #234

Closed andreapi87 closed 9 years ago

andreapi87 commented 9 years ago

I'm trying to install the full English language edition to evaluate similarity between Wikipedia categories. I have an Ubuntu 14.04 machine with 12 GB of RAM and an i5 CPU. I started the process from the GUI of the jar file and it went well until a short while ago; since then the log window has stayed at:

mar 16, 2015 7:34:05 PM org.wikibrain.loader.SqlLinksLoader processOneLink
INFORMAZIONI: Processed link 894000000, found 541011312 interesting and 335216479 new
mar 16, 2015 7:34:05 PM org.wikibrain.loader.SqlLinksLoader processOneLink
INFORMAZIONI: Processed link 894100000, found 541011312 interesting and 335216479 new

I also tried on another computer (an iMac with 12 GB of RAM), with the same result:

INFORMAZIONI: Processed link 893900000, found 541011312 interesting and 335216479 new
mar 17, 2015 8:40:28 AM org.wikibrain.utils.ParallelForEach$4 run
INFORMAZIONI: processing iterable 894000000
mar 17, 2015 8:40:28 AM org.wikibrain.loader.SqlLinksLoader processOneLink
INFORMAZIONI: Processed link 894000000, found 541011312 interesting and 335216479 new
mar 17, 2015 8:40:28 AM org.wikibrain.loader.SqlLinksLoader processOneLink
INFORMAZIONI: Processed link 894100000, found 541011312 interesting and 335216479 new

I notice that it freezes at the same point (Processed link 894100000) on all the machines. What could it be?

To replicate the error, I used this configuration:
java memory: 10 GB
language: en
data source: H2
selected phases: basic data, lucene, phrases, concepts, wikidata, semantic relatedness
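In the log excerpts above, the "Processed link" counter keeps advancing while the "interesting" and "new" counters stay at the same values. A minimal sketch for checking this in a saved log, assuming only the message format shown in those excerpts; the class name `LinkLogCheck` and its methods are hypothetical helpers and are not part of WikiBrain:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of WikiBrain.
class LinkLogCheck {
    // Matches the message format seen in the log excerpts above.
    private static final Pattern LINE = Pattern.compile(
            "Processed link (\\d+), found (\\d+) interesting and (\\d+) new");

    // Returns {processed, interesting, new} for every matching message in the text.
    static List<long[]> parse(String log) {
        List<long[]> entries = new ArrayList<>();
        Matcher m = LINE.matcher(log);
        while (m.find()) {
            entries.add(new long[] {
                Long.parseLong(m.group(1)),
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3))
            });
        }
        return entries;
    }

    // True if the processed counter advanced between the first and last entry
    // while the interesting and new counters did not change.
    static boolean countersStalled(List<long[]> entries) {
        if (entries.size() < 2) {
            return false;
        }
        long[] first = entries.get(0);
        long[] last = entries.get(entries.size() - 1);
        return last[0] > first[0] && last[1] == first[1] && last[2] == first[2];
    }

    public static void main(String[] args) {
        // The three messages from the second log excerpt above.
        String log =
              "Processed link 893900000, found 541011312 interesting and 335216479 new\n"
            + "Processed link 894000000, found 541011312 interesting and 335216479 new\n"
            + "Processed link 894100000, found 541011312 interesting and 335216479 new\n";
        System.out.println(countersStalled(parse(log))); // prints "true"
    }
}
```

A result of `true` only says that links were still being read without any of them being counted as interesting or new; it does not by itself explain why the loader stops after link 894100000.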

andreapi87 commented 9 years ago

The process works with a Postgres database instead of H2. However, it still remains at the same point for about 3 hours. With H2 it stays at that point indefinitely (I waited about 48 hours without results).