shilad / wikibrain

The WikiBrain Java library enables researchers and developers to incorporate state-of-the-art Wikipedia-based algorithms and technologies in a few lines of code.
http://shilad.github.io/wikibrain/

Is what I want possible? WikiBrain full english DB imported to Amazon RDS #270

Open Johnsel opened 7 years ago

Johnsel commented 7 years ago

Hello everyone. First of all, my compliments: it is a great library that you have built, and the fact that you have open-sourced it commands my gratitude and respect :)

I have a question/issue. Let me describe the situation first:

  1. We want to make use of the WikiBrain library to do semantic queries.
  2. We have our infrastructure running on the Amazon cloud compute platform.
  3. We used a high-spec server to import the wiki data into an Amazon RDS-hosted PostgreSQL server.
  4. Now that we have deployed a lower-spec machine, we seem unable to query that server: when the library tries to build the local indexes, it fails with errors about missing files. The errors appear immediately after this message:

MatrixLocalLinkDao empty, but delegate is not. Attempting to rebuild...

(Sadly, I forgot to save the document containing the specific error messages. If you need them, I can redeploy a machine.)

  5. I also tried running the data import process again, but when targeting the RDS PostgreSQL server I got the following error message before anything else happened: no class found for shortname localpage

I have since deployed a different server to do an import to a local PostgreSQL server, but I am wondering if that is the only solution that will work or if there is a way to use the existing server with a "fresh library install".

Thank you for your time and any insights you may be able to provide!

Best regards,

John Simons

shilad commented 7 years ago

Wow! I don't know, but I presume it's possible. The indexing process can take a very long time on a local PostgreSQL server, so I imagine it may take even longer on RDS. Do you have any idea what hardware is behind your RDS installation?

Johnsel commented 7 years ago

First of all, thank you for your response, @shilad. Yes, I know what hardware is behind it: it is a "db.t2.micro" instance, which corresponds to 1 ECU / 1 vCPU / 1 GB RAM. We can scale this up though, so I don't foresee that being our limiting factor (just yet).

My main concern is those errors when trying to run a sample pointed at said server. I know I was able to go through the importing process using the GUI tool successfully, but I am unsure how to now pick this back up on another machine. I can't even get it to restart the indexing process.

The local instance unfortunately failed as well, with several null pointer exceptions and errors about data type mismatches or values not fitting (Ubuntu 16.04 LTS + Java 8 JDK). I am concerned that we will have to abandon this idea entirely, which would be disappointing to say the least.

Thanks again for your time and effort.

Best regards,

John

Johnsel commented 7 years ago

I just tried again, and now it seems to get past that point. This time I installed the JDK from openjdk-8 instead of using the official Oracle installer. It is currently at the "merging all sorted files to ..." stage. I will keep the thread updated in case anyone else runs into similar issues or wants to run WikiBrain against an RDS server as well.

Johnsel commented 7 years ago

It just crashed again :(

I have the full error now though:

Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.5.0:java (default-cli) on project semanticquery: An exception occured while executing the Java class. null: InvocationTargetException: no index at location: ./db/lucene/en -> [Help 1]

shilad commented 7 years ago

Hi Johnsel. I'm sorry to hear this. Now I understand your question. Sadly, WikiBrain currently isn't designed to "pick up" an existing SQL database from a new machine whose local filesystem lacks the index files built during the original import. We've talked about ways to fix this, but it's a major change.
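
Until that changes, a new machine can at least detect the situation up front instead of failing partway through a run. Below is a minimal sketch of such a preflight check, assuming the local files live under the path named in the error above (./db/lucene/en). The class and method names are hypothetical and are not part of WikiBrain; the check only looks for a non-empty directory and does not validate the index itself.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of WikiBrain: checks whether the local
// index directory from the error message exists and is non-empty.
final class LocalIndexPreflight {

    /** Returns true if dir is an existing directory with at least one entry. */
    public static boolean hasLocalIndex(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            return entries.iterator().hasNext();
        }
    }

    public static void main(String[] args) throws IOException {
        // Default path is taken from the error message; override via the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "./db/lucene/en");
        if (hasLocalIndex(dir)) {
            System.out.println("Found local index files in " + dir.toAbsolutePath());
        } else {
            System.err.println("No local index files in " + dir.toAbsolutePath()
                    + "; the SQL database alone is not enough on this machine.");
            System.exit(1);
        }
    }
}
```

Running this before the sample query on a freshly deployed machine would report the missing ./db/lucene/en directory immediately, rather than after the rebuild attempt.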