Open andrawaag opened 3 years ago
@hsolbrig @ericprud @labra If I am not mistaken, you have each implemented output that details the possible errors in a ShEx validation. Is there a finite number of possible errors? Would you mind listing the types of errors reported by your solutions?
Hi @andrawaag, when faced with a similar situation for the Gene Ontology folks, I opened up the Java ShEx code so I could access failure information directly, rather than parsing output files. You might have more luck that way, as the information you need is by definition in there somewhere.
For that project, I returned 1) the node(s) that failed and 2) the properties that disagreed with the schema applied. I think this was enough to make a start on human-readable error reporting, but maybe touch base with team GO to see how that is going now. The hard part, which we didn't really work on yet, was the error cascade, e.g. when one node fails and that failure then causes another node to fail, and so on. In those models, which are pretty small, a human can usually find the root of a problem, but this will be an issue for larger models.
Work was related to this issue: https://github.com/geneontology/minerva/issues/212
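To make the error-cascade point concrete, here is a minimal sketch of one way to separate root failures from downstream ones. It assumes you can already extract, for each failed node, the set of nodes its shape references; the `root_failures` helper and the `depends_on` mapping are hypothetical and are not part of any ShEx library.

```python
def root_failures(failed, depends_on):
    """Return failed nodes whose failure is not explained by another failed node.

    failed:     iterable of node identifiers that failed validation
    depends_on: dict mapping a node to the nodes its shape references
                (hypothetical structure, to be built from the validator's output)
    """
    failed = set(failed)
    roots = set()
    for node in failed:
        # A node is a candidate root if none of the nodes it references failed.
        if not (set(depends_on.get(node, ())) & failed):
            roots.add(node)
    return roots


# Example: C fails, which makes B fail, which makes A fail; D fails on its own.
deps = {"A": ["B"], "B": ["C"], "C": [], "D": []}
roots = root_failures(["A", "B", "C", "D"], deps)
```

Note that nodes caught in a failing reference cycle will never be reported as roots by this sketch, so cycles would need separate handling.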
I am applying ShEx validation on a large scale on Wikidata items, but I am struggling to aggregate the results in a sensible report. I am looking for best practices here. This is the approach I have followed so far, which is working for this specific use case.
Take for example the following use case:
https://w.wiki/387j
I have developed a script using WikidataIntegrator and PyShEx to do the validation. These are the steps:
The current aggregated report is sufficient for its task, i.e. showing where the issues are. But getting there requires some suboptimal parsing of output strings and some arbitrary clustering by type of error.
{'No matching triples found for predicate p:P2888': 6608, 'No matching triples found for predicate ps:P279': 6217, '2 triples exceeds max {1,1}': 3666, 'No matching triples found for predicate prov:wasDerivedFrom': 2632, '{"values": ["http://www.wikidata.org/entity/Q5282129"], "typ...': 534, '{"values": ["http://www.wikidata.org/entity/Q27468140"], "ty...': 1304, 'No matching triples found for predicate pr:P699': 772, '3 triples exceeds max {1,1}': 9, 'No matching triples found for predicate pr:P5270': 1}
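For illustration, the string-based clustering described above could be sketched roughly as follows. The regular expressions and the labels (`missing_predicate`, `cardinality_exceeded`, `value_set_mismatch`) are assumptions modelled on the message strings in the report; they are not an official list of PyShEx error types.

```python
import re
from collections import Counter

# Hypothetical patterns, derived only from the message strings shown in the report.
PATTERNS = [
    (re.compile(r"No matching triples found for predicate (\S+)"), "missing_predicate"),
    (re.compile(r"(\d+) triples exceeds max (\{\d+,\d+\})"), "cardinality_exceeded"),
    (re.compile(r'^\{"values":'), "value_set_mismatch"),
]


def classify(message):
    """Return (error_type, detail) for one raw message string."""
    for pattern, label in PATTERNS:
        match = pattern.search(message)
        if match:
            # The first capture group, if any, carries the predicate or triple count.
            detail = match.group(1) if match.groups() else None
            return label, detail
    return "unclassified", None


def aggregate(counts):
    """Fold a {message: count} mapping into a Counter keyed by error type."""
    totals = Counter()
    for message, n in counts.items():
        label, _ = classify(message)
        totals[label] += n
    return totals
```

Keeping an explicit `unclassified` bucket makes it visible when the validator emits a message that none of the patterns anticipated, which is exactly the weakness of parsing strings rather than structured failure objects.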
I am looking for: