I'm afraid the RDF storage implementation by RDF4J which we use for local knowledge bases does not scale very well. If you have a large KB, I would recommend storing it in Virtuoso (or maybe Fuseki) and accessing it as a remote KB in INCEpTION. However, INCEpTION does not support editing remote KBs (yet), so you can only access it read-only.
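For illustration, read-only access to a remote SPARQL endpoint with RDF4J (the library mentioned above) looks roughly like the sketch below. This is a minimal sketch assuming RDF4J 3.x; the endpoint URL and credentials are placeholders, and nothing here is specific to how INCEpTION configures remote KBs.

```java
import org.eclipse.rdf4j.query.TupleQuery;
import org.eclipse.rdf4j.query.TupleQueryResult;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.sparql.SPARQLRepository;

public class RemoteKbSmokeTest {
    public static void main(String[] args) {
        // Placeholder endpoint; point this at your Virtuoso/Fuseki SPARQL URL.
        SPARQLRepository repo = new SPARQLRepository("https://virtuoso-host.example.com/sparql");
        // Credentials can be supplied via the API instead of being embedded in the URL.
        repo.setUsernameAndPassword("user", "password");
        repo.init();
        try (RepositoryConnection conn = repo.getConnection()) {
            TupleQuery query = conn.prepareTupleQuery("SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }");
            try (TupleQueryResult result = query.evaluate()) {
                while (result.hasNext()) {
                    System.out.println("Triples: " + result.next().getValue("n"));
                }
            }
        } finally {
            repo.shutDown();
        }
    }
}
```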
Thanks for the rapid answer! We have a Virtuoso endpoint; my only concern is that it is password-protected, and I guess we'd need to embed the basic auth credentials in the endpoint URL, e.g. https://user:password@virtuoso-host.example.com. Would you say that's a safe way to go?
There are many relevant factors affecting the security risk. You have to make a risk assessment yourself depending on your environment.
Anyway, back to the original question and the stack traces.
Thanks once again!
I converted the 260 MB OWL file to a <100 MB N3 file and tried uploading it, but it failed with a nicely caught error (no stack trace):
2020-02-06 09:52:45 ERROR [admin] ParseErrorLogger - [Rio fatal] Expected an RDF value here, found ''
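For reference, such a conversion can be done in a streaming fashion with RDF4J Rio. This is a minimal sketch, not necessarily how the conversion above was actually done: the file names are placeholders, the source is assumed to be RDF/XML-serialized OWL, and it writes Turtle rather than N3 (the approach is the same for either).

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.eclipse.rdf4j.rio.RDFFormat;
import org.eclipse.rdf4j.rio.RDFParser;
import org.eclipse.rdf4j.rio.RDFWriter;
import org.eclipse.rdf4j.rio.Rio;

public class OwlToTurtle {
    public static void main(String[] args) throws Exception {
        // Placeholder file names; the input is assumed to be RDF/XML.
        try (InputStream in = new BufferedInputStream(Files.newInputStream(Paths.get("kb.owl")));
                Writer out = new OutputStreamWriter(
                        new BufferedOutputStream(Files.newOutputStream(Paths.get("kb.ttl"))),
                        StandardCharsets.UTF_8)) { // write UTF-8 explicitly to avoid platform-default surprises
            RDFParser parser = Rio.createParser(RDFFormat.RDFXML);
            RDFWriter writer = Rio.createWriter(RDFFormat.TURTLE, out);
            parser.setRDFHandler(writer); // an RDFWriter is an RDFHandler, so statements stream straight through
            parser.parse(in, "");
        }
    }
}
```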
I had a light-bulb moment, and after having another look at the logs from yesterday, I noticed
2020-02-05 14:00:31 ERROR [admin] ProjectPage - Error: iriString must not be null
which looks at least a little bit similar.
I tried to investigate possible causes and found that we have surface forms consisting purely of Unicode characters, which raised the suspicion that they might start with code points that are not handled correctly (https://github.com/eclipse/rdf4j/blob/master/core/rio/turtle/src/main/java/org/eclipse/rdf4j/rio/turtle/TurtleParser.java#L592).
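One way to hunt for such values is to stream the file through Rio and flag literals whose first code point lies outside the Basic Multilingual Plane. A rough sketch under that assumption (the file name and format are placeholders, and "first code point is supplementary" is only one hypothesis about what the parser trips over):

```java
import java.io.BufferedInputStream;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.eclipse.rdf4j.model.Literal;
import org.eclipse.rdf4j.model.Statement;
import org.eclipse.rdf4j.rio.RDFFormat;
import org.eclipse.rdf4j.rio.RDFParser;
import org.eclipse.rdf4j.rio.Rio;
import org.eclipse.rdf4j.rio.helpers.AbstractRDFHandler;

public class SuspiciousLiteralFinder {
    public static void main(String[] args) throws Exception {
        RDFParser parser = Rio.createParser(RDFFormat.TURTLE); // pick the format matching your file
        parser.setRDFHandler(new AbstractRDFHandler() {
            @Override
            public void handleStatement(Statement st) {
                if (st.getObject() instanceof Literal) {
                    String label = ((Literal) st.getObject()).getLabel();
                    // Flag labels whose first code point is outside the BMP (i.e. needs a surrogate pair)
                    if (!label.isEmpty() && Character.isSupplementaryCodePoint(label.codePointAt(0))) {
                        System.out.println("Suspicious literal: " + st);
                    }
                }
            }
        });
        try (InputStream in = new BufferedInputStream(Files.newInputStream(Paths.get("kb.ttl")))) {
            parser.parse(in, "");
        }
    }
}
```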
Removing two of them (the only ones I could find) still caused the upload worker to crash, but only after 10 minutes and without it having complained about any empty/null values.
I concluded that this might indeed be a size problem and tried setting up a KB via a Virtuoso connection, and everything seems to be working correctly there.
The weird thing is that I'm able to parse the triples using Rio locally without any problems, but this might be some strange cross-system encoding issue.
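For comparison, a local parse check can also count statements and collect any warnings or errors the parser reports, instead of stopping at the first exception. A minimal sketch (file name and format are placeholders; collecting all statements of a large file needs a correspondingly large heap):

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.eclipse.rdf4j.rio.RDFFormat;
import org.eclipse.rdf4j.rio.RDFParser;
import org.eclipse.rdf4j.rio.Rio;
import org.eclipse.rdf4j.rio.helpers.ParseErrorCollector;
import org.eclipse.rdf4j.rio.helpers.StatementCollector;

public class LocalParseCheck {
    public static void main(String[] args) throws Exception {
        ParseErrorCollector errors = new ParseErrorCollector();   // gathers reported warnings/errors
        StatementCollector statements = new StatementCollector(); // gathers parsed statements in memory

        RDFParser parser = Rio.createParser(RDFFormat.TURTLE);    // pick the format matching your file
        parser.setParseErrorListener(errors);
        parser.setRDFHandler(statements);

        try (InputStream in = Files.newInputStream(Paths.get("kb.ttl"))) {
            parser.parse(in, "");
        }

        System.out.println("Statements: " + statements.getStatements().size());
        System.out.println("Warnings:   " + errors.getWarnings());
        System.out.println("Errors:     " + errors.getErrors());
        System.out.println("Fatal:      " + errors.getFatalErrors());
    }
}
```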
The iriString must not be null error should be unrelated to the import. See the stack trace:
java.lang.NullPointerException: iriString must not be null
at java.util.Objects.requireNonNull(Objects.java:228) ~[?:1.8.0_222]
at org.eclipse.rdf4j.model.impl.SimpleIRI.setIRIString(SimpleIRI.java:71) ~[rdf4j-model-2.5.1-inception-1.jar!/:2.5.1-inception-1+47c031e]
at org.eclipse.rdf4j.model.impl.SimpleIRI.<init>(SimpleIRI.java:63) ~[rdf4j-model-2.5.1-inception-1.jar!/:2.5.1-inception-1+47c031e]
at org.eclipse.rdf4j.model.impl.AbstractValueFactory.createIRI(AbstractValueFactory.java:86) ~[rdf4j-model-2.5.1-inception-1.jar!/:2.5.1-inception-1+47c031e]
at de.tudarmstadt.ukp.inception.ui.kb.project.RootConceptsPanel.actionNewRootConcept(RootConceptsPanel.java:110) ~[inception-ui-kb-0.14.2.jar!/:?]
...
The error was generated by actionNewRootConcept, which means a user in the UI tried to add an empty IRI as one of the KB's root concepts.
So the Rio error should be unrelated to this iriString must not be null, at least considering the stack trace.
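For what it's worth, that exact message comes straight out of RDF4J's value factory when it is handed a null IRI string, which is why the trace points at UI input rather than at the import. A tiny sketch that reproduces it:

```java
import org.eclipse.rdf4j.model.ValueFactory;
import org.eclipse.rdf4j.model.impl.SimpleValueFactory;

public class NullIriDemo {
    public static void main(String[] args) {
        ValueFactory vf = SimpleValueFactory.getInstance();
        String iri = null;
        // Throws java.lang.NullPointerException: iriString must not be null
        vf.createIRI(iri);
    }
}
```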
Maybe try another format? Turtle?
Strangely enough, we didn't try to do anything via the UI, we just started the import process, so I assumed it could be connected.
In the end it seems to have been related to the encoding. It looks like after removing the Unicode-only literals the upload finished successfully (much later, as you noted).
Ok, then I guess we can close this issue.
We've had INCEpTION v0.9.1 deployed for some time and have been using it happily. However, our knowledge base grew enormously (3-4 times in size, up to a ~270 MB OWL file) and our data team wanted to start using the new one. The upload didn't work, but we decided to upgrade INCEpTION to the latest version before trying to troubleshoot what was going on. I used the recent version from Docker Hub (0.14.2), but after the file is uploaded through the UI, the following errors appear in the server logs:
I found the 100 MB Spring Boot upload limit, so I tried to do a bit of hacking myself, increased it to 500 MB and rebuilt the Docker image. It didn't help; I can provide the logs if you need them, but they look very much alike.
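For reference, in a stock Spring Boot 2.x application the multipart upload limits are controlled by the properties below; whether INCEpTION reads exactly these keys, or sets the limit somewhere else in its build, is an assumption here.

```properties
# Multipart upload limits in a stock Spring Boot 2.x application.properties
# (assumption: INCEpTION exposes these standard keys)
spring.servlet.multipart.max-file-size=500MB
spring.servlet.multipart.max-request-size=500MB
```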
Thanks in advance!