Open goodb opened 4 years ago
FYI, I was surprised to find out that the same model finishes validation (and is valid) with the Refine algorithm in about 4 seconds on my (pretty new) laptop.
From a library user's perspective, this presents a bit of a quandary. The node-specific recursive algorithm is quite a bit faster than the all-node, all-shape Refine algorithm for the majority of the RDF graphs I have tested. But if it crashes nastily on a few, that is a problem, especially in a server context. (I've been experimenting with ways to tell it to quit after a timeout, but once it gets going on its way to infinity it is hard to stop...)
Apart from fixing the library (assuming it is in fact broken), it would be helpful to have some guidance in your documentation about which contexts each of the algorithms is best suited to.
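For the server case, here is a minimal sketch of the kind of caller-side timeout guard I mean. It assumes nothing about shexjava itself: `TimedValidation`, `Outcome` and `run` are hypothetical names, and the `Callable<Boolean>` stands in for whatever wraps the actual validate call. It only bounds how long the caller waits. Cancellation is just an interrupt request, so a worker that never checks for interruption keeps running in the background, which matches what I am seeing.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for illustration; not part of shexjava. */
final class TimedValidation {

    enum Outcome { VALID, INVALID, TIMED_OUT, FAILED }

    /**
     * Runs the check on a daemon thread and waits at most timeoutMillis for it.
     * The check is a stand-in for a call such as validate(node, shapeLabel).
     */
    static Outcome run(Callable<Boolean> check, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "validation-worker");
            t.setDaemon(true); // a runaway worker must not keep the JVM alive
            return t;
        });
        Future<Boolean> future = executor.submit(check);
        try {
            Boolean result = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Boolean.TRUE.equals(result) ? Outcome.VALID : Outcome.INVALID;
        } catch (TimeoutException e) {
            future.cancel(true); // only requests an interrupt; the worker may ignore it
            return Outcome.TIMED_OUT;
        } catch (ExecutionException e) {
            return Outcome.FAILED; // the check itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // the caller was interrupted while waiting
            future.cancel(true);
            return Outcome.FAILED;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A caller could treat `TIMED_OUT` as a signal to retry the same model with Refine. If the runaway thread really cannot be stopped, running the validation in a separate process is probably the only way to reclaim the resources.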
This infinite recursion indeed looks like a bug. Can you please try the same validation with the RecursiveValidationWithMemoization algorithm (which, by the way, appears to be much faster than RecursiveValidation on the examples I've tested) and tell me whether it still loops to infinity?
In any case I will look for the reason for the infinite loop in your example.
Other remarks: the timeout is only tested in some parts of the code, so if the infinite loop avoids those parts, the timeout is indeed useless. This is also something that could probably be improved.
Regarding the different algorithms:
@iovka thank you. I tested the WithMemoization version of the algorithm and had the same problem with infinity.
I think Refine might be the better general purpose solution for our use case. However, it is tempting to add in some cleverness that would attempt to guess which algorithm to apply based on some analysis of the schema and graph.
Could that be done as a pre-processing step?
That is what I was thinking, but I'm not sure it would be optimal. A system that makes a suggestion, even if the suggestion had to be applied manually, might be useful.
However, it is tempting to add in some cleverness that would attempt to guess which algorithm to apply based on some analysis of the schema and graph.
@goodb, indeed it is a good idea, but I would say that the schema and graph are not enough. It really depends on the validation questions that you are going to ask. With the same graph and the same schema, recursive validation might be more efficient if you want to verify only a small number of nodes against some shapes, but refine validation might be preferable if you want to verify a large number of nodes. A "clever" preprocessing step is intricate; I will have to think about it.
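To make that rule of thumb concrete, here is a rough sketch of a manual "suggestion" helper. Everything in it is an assumption for illustration: `AlgorithmAdvisor`, `suggest` and the `maxFraction` threshold are hypothetical and untuned, and it looks only at how many (node, shape) pairs are requested relative to the graph size, ignoring the structure of the schema entirely.

```java
/** Hypothetical helper; names and threshold are illustrative, not part of shexjava. */
final class AlgorithmAdvisor {

    enum Algorithm { RECURSIVE, REFINE }

    /**
     * Suggests an algorithm from the share of the graph that is being asked about.
     *
     * @param requestedPairs number of (node, shape) pairs to validate
     * @param graphNodes     number of nodes in the data graph
     * @param maxFraction    share of nodes above which Refine is suggested (untuned assumption)
     */
    static Algorithm suggest(long requestedPairs, long graphNodes, double maxFraction) {
        if (requestedPairs < 0 || graphNodes <= 0) {
            throw new IllegalArgumentException("counts must be non-negative, with at least one graph node");
        }
        double fraction = (double) requestedPairs / graphNodes;
        // Few targeted questions: recursive validation only explores what it needs.
        // Many questions: computing the typing for the whole graph once may pay off.
        return fraction <= maxFraction ? Algorithm.RECURSIVE : Algorithm.REFINE;
    }
}
```

For example, `AlgorithmAdvisor.suggest(5, 10_000, 0.1)` suggests `RECURSIVE`, while `AlgorithmAdvisor.suggest(8_000, 10_000, 0.1)` suggests `REFINE`. A sensible threshold would have to come from measurements on real models.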
I'm trying to validate this file: bug.owl.txt
Using the following shape map and schema. (Remove the .txt from everything for typical file extensions).
go-cam-shapes.shapeMap.txt go-cam-shapes.shex.txt
I am iterating through the nodes/shapes identified using the shape map and validating them with: fr.inria.lille.shexjava.validation.RecursiveValidation.validate (node, shapelabel)
It appears to go off to infinity for this node: http://model.geneontology.org/R-HSA-1660499/R-HSA-1675810 and this shape http://purl.obolibrary.org/obo/go/shapes/MolecularFunction
There are other nodes that also fail in a similar way.
I suspect this may be an issue with a looping structure within the RDF. See in particular the path initiated over the property http://purl.obolibrary.org/obo/RO_0002413 emanating from the problem node.
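One way to check the looping suspicion independently of the validator is a plain cycle search along a single property. The sketch below assumes the edges for that property have already been extracted into a map from subject IRI to object IRIs; `PredicateCycleCheck` and `hasCycleFrom` are hypothetical names, and nothing here uses the shexjava or RDF APIs. A cycle in the data is legal RDF and does not by itself prove a validator bug; it would only help narrow down what is different about this model.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration; works on a pre-extracted edge map. */
final class PredicateCycleCheck {

    /** Returns true if following the edges from start leads back onto the current path. */
    static boolean hasCycleFrom(String start, Map<String, List<String>> edges) {
        Set<String> done = new HashSet<>();    // nodes whose outgoing edges are fully explored
        Set<String> onPath = new HashSet<>();  // nodes on the current depth-first path
        Deque<String> path = new ArrayDeque<>();
        Deque<Iterator<String>> pending = new ArrayDeque<>();

        path.push(start);
        onPath.add(start);
        pending.push(edges.getOrDefault(start, Collections.emptyList()).iterator());

        // Iterative depth-first search, so a long chain cannot overflow the call stack.
        while (!path.isEmpty()) {
            Iterator<String> it = pending.peek();
            if (it.hasNext()) {
                String next = it.next();
                if (onPath.contains(next)) {
                    return true; // back edge: the path loops
                }
                if (!done.contains(next)) {
                    path.push(next);
                    onPath.add(next);
                    pending.push(edges.getOrDefault(next, Collections.emptyList()).iterator());
                }
            } else {
                String finished = path.pop();
                pending.pop();
                onPath.remove(finished);
                done.add(finished);
            }
        }
        return false;
    }
}
```

Called with the problem node as `start` and the RO_0002413 edges as `edges`, a `true` result would confirm that the path loops back on itself.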
The same approach succeeds on hundreds of other similarly generated RDF models, so there is something specific about this one that causes the problem.
Any help would be appreciated. It seems like this might reveal a bug in your validator - but could always be our fault somehow as well.
Thanks...