Open goodb opened 4 years ago
OK, so I managed to reproduce the issue and found where the problem is. It is not in the parser but in the computation of the stratification (line 703 in ShexSchema.java). That code enumerates paths through the schema, which can be slow and costly when the schema contains loops. This is a part of the code that I don't really like, but I don't have a good alternative for it yet.
I will try to find a solution.
I have released a new version (1.2.3c) where I changed the bound to 10. It is not perfect, since it no longer fully checks that the schema is stratified, but it should limit the cost and allow you to parse your schema.
After the holiday I will look for a better solution.
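For reference, here is one possible direction that avoids enumerating paths. This is only a sketch and not the code in ShexSchema.java: it assumes the schema can be viewed as a dependency graph between shape labels, where each reference is flagged as negated or not, and that "stratified" means no cycle passes through a negated reference. The `Edge` class and the `isStratified` and `reaches` helpers are hypothetical names made up for the example. For each negated edge it runs one breadth-first search, so the cost stays polynomial even when the schema has many loops.

```java
import java.util.*;

public class StratificationSketch {
    // Hypothetical edge type: a reference from one shape label to another,
    // flagged when the reference occurs under negation.
    static final class Edge {
        final String from;
        final String to;
        final boolean negated;
        Edge(String from, String to, boolean negated) {
            this.from = from;
            this.to = to;
            this.negated = negated;
        }
    }

    // Breadth-first search: true if 'target' can be reached from 'start'.
    // Each node is visited at most once, so loops do not blow up the cost.
    static boolean reaches(Map<String, List<Edge>> graph, String start, String target) {
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        queue.add(start);
        seen.add(start);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            if (node.equals(target)) {
                return true;
            }
            for (Edge e : graph.getOrDefault(node, Collections.emptyList())) {
                if (seen.add(e.to)) {
                    queue.add(e.to);
                }
            }
        }
        return false;
    }

    // Assumed definition: the graph is stratified when no negated edge lies
    // on a cycle, i.e. the target of a negated edge never reaches its source.
    static boolean isStratified(List<Edge> edges) {
        Map<String, List<Edge>> graph = new HashMap<>();
        for (Edge e : edges) {
            graph.computeIfAbsent(e.from, k -> new ArrayList<>()).add(e);
        }
        for (Edge e : edges) {
            if (e.negated && reaches(graph, e.to, e.from)) {
                return false;
            }
        }
        return true;
    }
}
```

With this check a loop made only of plain references is accepted, while a loop that goes through a negated reference is rejected, and no bound on path length is needed.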
Thank you! It looks like it is working correctly and quickly even with the full schema now.
I anticipate the schema will continue to evolve over the coming months and we will be testing it with thousands of different RDF files. Perhaps this will produce some good additional test cases for your work. FYI we will also be doing some performance testing across the different libraries - shaclex in particular.
OK. We are interested in the performance results.
Informally, the test result is that the Java implementation is substantially faster than the Scala and Python versions. Sorry we haven't done a proper scientific comparison, but anecdotally it was enough to convince us to use the Java implementation for our production server for now.
(FYI the demonstrator service at http://shexjava.lille.inria.fr seems to be down).
The following schema causes a problem for the GenParser.parseSchema method.
It is strange. If you take out either of the empty shapes (s1, s2), or if you take out one of the constraints on the MolecularFunction shape, it parses almost instantly. As it is, I have left it running for more than 5 minutes and watched the Java memory usage spike above 4 GB.
I have extracted this minimal example from the real schema our group is working on here: https://github.com/geneontology/go-shapes/blob/master/shapes/go-cam-shapes.shex
Help on this would be awesome.