shilad / wikibrain

The WikiBrain Java library enables researchers and developers to incorporate state-of-the-art Wikipedia-based algorithms and technologies in a few lines of code.
http://shilad.github.io/wikibrain/

Stages not triggered when importing to Postgres #254

Open cheetah90 opened 8 years ago

cheetah90 commented 8 years ago

All other stages in org.wikibrain.download.loader are not triggered when selecting Postgres as the data source in the GUILoader. (Not sure if this is solely related to GUILoader).

cheetah90 commented 8 years ago

No worries, I figured it out. It looks like the Postgres tables such as "local_articles" cannot distinguish between different language editions, so once the tables are created for one language, those stages are skipped when importing other languages. Did I miss something, or is this really an issue?
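To make the suspected behavior concrete, here is a minimal Java sketch. It is not WikiBrain's actual loader code: the class names `PerStageTracker` and `PerLanguageTracker` and the method `shouldRun` are hypothetical, made up only to show the difference between tracking a finished stage by name alone and tracking it per language.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration only; none of these names come from WikiBrain. */
public class StageSkipSketch {

    /** Remembers finished stages by stage name only (the suspected behavior). */
    static class PerStageTracker {
        private final Set<String> done = new HashSet<>();

        /** Returns true if the stage should run, and records it as finished. */
        boolean shouldRun(String stage, String lang) {
            // The language is ignored, so a second language looks "already loaded".
            return done.add(stage);
        }
    }

    /** Remembers finished stages per (stage, language) pair. */
    static class PerLanguageTracker {
        private final Set<String> done = new HashSet<>();

        /** Returns true if the stage should run for this language. */
        boolean shouldRun(String stage, String lang) {
            return done.add(stage + ":" + lang);
        }
    }

    public static void main(String[] args) {
        PerStageTracker byStage = new PerStageTracker();
        System.out.println(byStage.shouldRun("local_articles", "en"));
        System.out.println(byStage.shouldRun("local_articles", "es"));

        PerLanguageTracker byLang = new PerLanguageTracker();
        System.out.println(byLang.shouldRun("local_articles", "en"));
        System.out.println(byLang.shouldRun("local_articles", "es"));
    }
}
```

With the first tracker, the second call for "es" reports that the stage should not run, which matches what I am seeing. With the second tracker, each language gets its own run.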

cheetah90 commented 8 years ago

Still waiting for an answer on this. I realized that after I import one language into Postgres, importing a second language edition halts, since the loader thinks that the language has already been parsed.

bjhecht commented 8 years ago

We don't need spatial for this project, so we can probably skip this.

The intended use though is that you import them all at the same time.


On Dec 8, 2015, at 16:45, Allen Lin notifications@github.com wrote:

Still waiting for an answer on this. I realized that after I import one language into Postgres, importing a second language edition halts, since the loader thinks that the language has already been parsed.


cheetah90 commented 8 years ago

Oh okay. I just wanted to leverage the speed of Postgres to handle the non-spatial data. H2 is slow on datasets over 1M entries. I'll try H2 then.

shilad commented 8 years ago

Sorry for the delayed response! The only way to add languages right now is to reinstall from scratch with all the languages you want. This is the number one feature on our summer to-do list, but it isn't easy to do efficiently with the current setup. I would recommend that you stick to postgres. It will definitely be faster for es.

bjhecht commented 8 years ago

Ah, read the e-mail too quickly. Using postgres makes sense.

The key thing here is that they have to be imported at the same time, I believe.

On 12/8/2015 5:14 PM, Allen Lin wrote:

Oh okay. I just wanted to leverage the speed of Postgres to handle the non-spatial data. H2 is slow on datasets over 1M entries. I'll try H2 then.


cheetah90 commented 8 years ago

@shilad I currently run the stages manually to import data for different language editions. This works for me: I successfully imported EN and ES into the database. However, if I use the Loader class, it seems that some tables get dropped.