alexskr opened 1 year ago
We are seeing a similar error for the following API calls:
1. /ontologies/M4M-21-VARIABLES/classes/http%3A%2F%2Fpurl.org%2Fm4m21%2Fvariables%2F1006/mappings
Stack Trace:
SPARQL::Client::MalformedQuery: MALFORMED QUERY: Line 5, Found '<'. Was expecting one of: BIND, BLANK_NODE_LABEL, DECIMAL, DOUBLE, FALSE, FILTER, GRAPH, INTEGER, MINUS, NIL-SYMBOL, OPTIONAL, Q_IRI_REF, QNAME, QNAME_NS, SELECT, SERVICE, STRING_LITERAL1, STRING_LITERAL2, STRING_LITERAL_LONG1, STRING_LITERAL_LONG2, TEXTINDEX, TRUE, VALUES, VARNAME or punctuation '(', '+', '-', '<<', '[', '[]', '{', '}'.
…ases/20220811020542/controllers/mappings_controller.rb: 13:in `block in <class
<truncated 73 additional frames>
/srv/ncbo/ontologies_api/shared/bundle/ruby/2.7.0/bin/unicorn:23:in `load'
/srv/ncbo/ontologies_api/shared/bundle/ruby/2.7.0/bin/unicorn:23:in `<top (required)>'
/usr/local/rbenv/versions/2.7.6/bin/bundle:23:in `load'
/usr/local/rbenv/versions/2.7.6/bin/bundle:23:in `<main>'
2. /ontologies/DDIEM/classes/http%3A%2F%2Fgroups.google.com%2Fgroup%2Fogms-discuss%2Fbrowse_thread%2Fthread%2Fca0ad373f27774c5%0A%0AOGMS%20call%20adoption-%2016%20SEPT%202015%0Ahttps%3A%2F%2Fdocs.google.com%2Fdocument%2Fd%2F1iiV1-fTS7BUUSzDw3N_Afx42698YWf54-
Stack Trace:
SPARQL::Client::MalformedQuery: MALFORMED QUERY: Line 1, Found '<'. Was expecting one of: ABS, AVG, BNODE, BOUND, CEIL, COALESCE, CONCAT, CONTAINS, COUNT, DATATYPE, DAY, DECIMAL, DOUBLE, ENCODE_FOR_URI, EXISTS, FALSE, FLOOR, GROUP_CONCAT, HOURS, IF, INTEGER, IRI, ISBLANK, ISIRI, ISLITERAL, ISNUMERIC, ISTRIPLE, ISURI, LANG, LANGMATCHES, LCASE, MAX, MD5, MIN, MINUTES, MONTH, NOT, NOW, NUMERIC-PLUS, Q_IRI_REF, QNAME, QNAME_NS, RAND, REGEX, REPLACE, ROUND, SAMETERM, SAMPLE, SECONDS, SHA1, SHA256, SHA384, SHA512, STR, STRAFTER, STRBEFORE, STRDT, STRENDS, STRING_LITERAL1, STRING_LITERAL2, STRING_LITERAL_LONG1, STRING_LITERAL_LONG2, STRLANG, STRLEN, STRSTARTS, STRUUID, SUBSTR, SUM, TIMEZONE, TRUE, TZ, UCASE, URI, UUID, VARNAME, YEAR or punctuation '!', '(', '+', '-', '<<'.
…_api/releases/20220811020542/helpers/classes_helper.rb: 59:in `get_class'
…eases/20220811020542/controllers/classes_controller.rb: 78:in `block(2 levels) in <class
<truncated 80 additional frames>
/srv/ncbo/ontologies_api/shared/bundle/ruby/2.7.0/bin/unicorn:23:in `load'
/srv/ncbo/ontologies_api/shared/bundle/ruby/2.7.0/bin/unicorn:23:in `<top (required)>'
/usr/local/rbenv/versions/2.7.6/bin/bundle:23:in `load'
/usr/local/rbenv/versions/2.7.6/bin/bundle:23:in `<main>'
3. /ontologies/NCOD/properties
Stack Trace:
SPARQL::Client::MalformedQuery: MALFORMED QUERY: Line 3, Found '<'. Was expecting one of: BLANK_NODE_LABEL, DECIMAL, DOUBLE, FALSE, INTEGER, NIL-SYMBOL, PATH-PLUS, Q_IRI_REF, QNAME, QNAME_NS, STRING_LITERAL1, STRING_LITERAL2, STRING_LITERAL_LONG1, STRING_LITERAL_LONG2, TRUE, VARNAME or punctuation '(', ')', '*', '+', '-', '/', '<<', '?', '[', '[]', '{', '|'.
…es/20220811020542/controllers/properties_controller.rb: 10:in `block(2 levels) in <class
<truncated 77 additional frames>
/srv/ncbo/ontologies_api/shared/bundle/ruby/2.7.0/bin/unicorn:23:in `load'
/srv/ncbo/ontologies_api/shared/bundle/ruby/2.7.0/bin/unicorn:23:in `<top (required)>'
/usr/local/rbenv/versions/2.7.6/bin/bundle:23:in `load'
/usr/local/rbenv/versions/2.7.6/bin/bundle:23:in `<main>'
I am investigating this issue. The following query does indeed appear to be malformed:
SELECT DISTINCT ?s2 ?g ?source ?o
WHERE {
{
GRAPH <http://data.bioontology.org/ontologies/M4M-21-VARIABLES/submissions/5> {
<http://purl.org/m4m21/?s2 ?g ?source ?o/1006> <http://bioportal.bioontology.org/ontologies/umls/cui> ?o .
}
GRAPH ?g {
?s2 <http://bioportal.bioontology.org/ontologies/umls/cui> ?o .
}
BIND ('CUI' AS ?source)
}
UNION
{
GRAPH <http://data.bioontology.org/ontologies/M4M-21-VARIABLES/submissions/5> {
<http://purl.org/m4m21/?s2 ?g ?source ?o/1006> <http://data.bioontology.org/metadata/def/mappingSameURI> ?o .
}
GRAPH ?g {
?s2 <http://data.bioontology.org/metadata/def/mappingSameURI> ?o .
}
BIND ('SAME_URI' AS ?source)
}
UNION
{
GRAPH <http://data.bioontology.org/ontologies/M4M-21-VARIABLES/submissions/5> {
<http://purl.org/m4m21/?s2 ?g ?source ?o/1006> <http://data.bioontology.org/metadata/def/mappingLoom> ?o .
}
GRAPH ?g {
?s2 <http://data.bioontology.org/metadata/def/mappingLoom> ?o .
}
BIND ('LOOM' AS ?source)
}
UNION
{
GRAPH <http://data.bioontology.org/ontologies/M4M-21-VARIABLES/submissions/5> {
<http://purl.org/m4m21/?s2 ?g ?source ?o/1006> <http://data.bioontology.org/metadata/def/mappingRest> ?o .
}
GRAPH ?g {
?s2 <http://data.bioontology.org/metadata/def/mappingRest> ?o .
}
BIND ('REST' AS ?source)
}
FILTER (!STRSTARTS(str(?g),'http://data.bioontology.org/ontologies/M4M-21-VARIABLES'))
}
You can see that lines such as
<http://purl.org/m4m21/?s2 ?g ?source ?o/1006> <http://data.bioontology.org/metadata/def/mappingRest> ?o .
contain the query variables ?s2 ?g ?source ?o inside the <> brackets of the subject IRI, which should not be the case.
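Decoding the requested class ID gives http://purl.org/m4m21/variables/1006, so it looks as if the `variables` segment of the IRI was replaced by the projection list `?s2 ?g ?source ?o`. Below is a minimal Ruby sketch of how that can happen, assuming (hypothetically) that the query is built from a text template with a literal `variables` placeholder that is filled in with `gsub` after the class IRI has been inserted. The templates and method names are illustrative only, not taken from the actual mappings code.

```ruby
# Hypothetical template: "variables" and "class_iri" are plain-text placeholders.
TEMPLATE = <<~SPARQL
  SELECT DISTINCT variables
  WHERE { <class_iri> ?p ?o . }
SPARQL

# Buggy order: the class IRI is inserted first, then every occurrence of
# "variables" is replaced, including the one inside the IRI itself.
def build_query_buggy(class_iri)
  TEMPLATE.sub("class_iri", class_iri).gsub("variables", "?s2 ?g ?source ?o")
end

# Safer: Kernel#format fills all named placeholders in a single pass,
# so text inserted for one placeholder is never rescanned for another.
SAFE_TEMPLATE = <<~SPARQL
  SELECT DISTINCT %{vars}
  WHERE { <%{class_iri}> ?p ?o . }
SPARQL

def build_query_safe(class_iri)
  format(SAFE_TEMPLATE, vars: "?s2 ?g ?source ?o", class_iri: class_iri)
end
```

With the buggy builder, `build_query_buggy("http://purl.org/m4m21/variables/1006")` yields the same `<http://purl.org/m4m21/?s2 ?g ?source ?o/1006>` subject seen in the failing query, while the single-pass version leaves the IRI intact.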
cc @syphax-bouazzouni as we were looking at this query recently to "improve" mappings gathering in AgroPortal.
The failing query for /ontologies/NCOD/properties:
SELECT ?c WHERE {
GRAPH <http://data.bioontology.org/ontologies/NCOD/submissions/1> {
?c <http://www.w3.org/2000/01/rdf-schema#subPropertyOf> <http://www.geneontology.org/formats/oboInOwl#@prefix dcat> .
}
}
LIMIT 1
This error appears to be thrown by AllegroGraph much more frequently than by 4store. The main cause is the presence of illegal characters in the ClassID.
I was able to identify a number of places in our code where replacing characters such as " ", "<" or ">" in the ClassID with their URL-encoded counterparts fixes the issue. However, there are many other cases where the constructed query is malformed because of special characters in the ClassID. For example, ELD parsing fails on this query:
SELECT DISTINCT ?id ?prefLabel ?synonym ?label
FROM <http://data.bioontology.org/ontologies/ELD/submissions/5>
WHERE {
?id a <http://www.w3.org/2004/02/skos/core#Concept> .
OPTIONAL {
?id ?rewrite0 ?prefLabel .
FILTER(?rewrite0 = <http://data.bioontology.org/metadata/def/prefLabel> || ?rewrite0 = <http://www.w3.org/2004/02/skos/core#prefLabel>)
}
OPTIONAL {
?id <http://www.w3.org/2004/02/skos/core#altLabel> ?synonym .
}
OPTIONAL {
?id <http://www.w3.org/2000/01/rdf-schema#label> ?label .
}
FILTER(?id = <https://github.com/VODANA/Controlled-vocabularyseverepneumonia(Otherpneumonia,organismunspecified)> ||
?id = <https://github.com/VODANA/Controlled-vocabularycordprolapse(Labouranddeliverycomplicatedbyprolapseofcord)> ||
?id = <https://github.com/VODANA/Controlled-vocabularycommoncold(Acutenasopharyngitis[commoncold])>)
}
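This query fails because of the characters "[" and "]" in the last ID. They are reserved delimiters in URI syntax (RFC 3986), and AllegroGraph evidently rejects them inside an IRI, even though the SPARQL 1.1 IRIREF grammar itself permits them.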
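The URL-encoding approach can be sketched in Ruby as follows. `encode_iri_for_sparql` and `id_filter` are hypothetical helpers, not existing ontologies_linked_data methods. The character set comes from the SPARQL 1.1 grammar for IRIREF, which excludes `<`, `>`, `"`, `{`, `}`, `|`, `^`, backtick, backslash and code points U+0000 to U+0020 (including the space); `[` and `]` are added as an assumption because the failing AllegroGraph queries here contain them.

```ruby
# Characters that break a SPARQL IRIREF (per the SPARQL 1.1 grammar),
# plus "[" and "]", which AllegroGraph rejected in this issue.
UNSAFE_IN_IRIREF = /[<>"{}|^`\\\[\]\x00-\x20]/

# Hypothetical helper: percent-encode only the unsafe characters.
def encode_iri_for_sparql(iri)
  # Every matched character is ASCII, so one byte maps to one %XX escape.
  iri.gsub(UNSAFE_IN_IRIREF) { |ch| format("%%%02X", ch.ord) }
end

# Hypothetical helper: build the "?id = <...> || ..." FILTER used by the
# batched label query from a list of class IDs.
def id_filter(ids)
  clauses = ids.map { |id| "?id = <#{encode_iri_for_sparql(id)}>" }
  "FILTER(#{clauses.join(' || ')})"
end
```

For the ELD query above, the last ID would be emitted as `...commoncold(Acutenasopharyngitis%5Bcommoncold%5D)`. Note that encoding only helps if the triple store holds the IRIs in the same encoded form; otherwise the rewritten IRI will not match any stored resource.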
This endpoint call fails because of a space inside the last ID in the query (the object of rdfs:subPropertyOf):
/ontologies/NCOD/properties
Here is the resulting SPARQL query:
SELECT ?c WHERE {
GRAPH <http://data.bioontology.org/ontologies/NCOD/submissions/1> {
?c <http://www.w3.org/2000/01/rdf-schema#subPropertyOf>
<http://www.geneontology.org/formats/oboInOwl#@prefix dcat> .
}
}
LIMIT 1
This is the query that fails during MCCL ontology parsing due to the "[" and "]" characters present in an ID inside the last FILTER clause:
SELECT DISTINCT ?id ?prefLabel ?synonym ?label
FROM <http://data.bioontology.org/ontologies/MCCL/submissions/2>
WHERE {
?id a <http://www.w3.org/2002/07/owl#Class> .
OPTIONAL {
?id ?rewrite0 ?prefLabel .
FILTER(?rewrite0 = <http://data.bioontology.org/metadata/def/prefLabel> || ?rewrite0 = <http://www.w3.org/2004/02/skos/core#prefLabel>)
} OPTIONAL {
?id ?rewrite1 ?synonym .
FILTER(?rewrite1 = <http://www.geneontology.org/formats/oboInOwl#hasRelatedSynonym> || ?rewrite1 = <http://www.geneontology.org/formats/oboInOwl#hasNarrowSynonym> || ?rewrite1 = <http://www.geneontology.org/formats/oboInOwl#hasBroadSynonym> || ?rewrite1 = <http://purl.obolibrary.org/obo/synonym> || ?rewrite1 = <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> || ?rewrite1 = <http://www.w3.org/2004/02/skos/core#altLabel>)
} OPTIONAL {
?id <http://www.w3.org/2000/01/rdf-schema#label> ?label .
}
FILTER(?id = <http://www.w3.org/2002/07/owl#Thing> || ?id = <http://www.semanticweb.org/pallabi.d/ontologies/2014/2/untitled-ontology-11#ZNRF4-Arg149*>
|| ?id = <http://www.semanticweb.org/pallabi.d/ontologies/2014/2/untitled-ontology-11#KCNH8-Glu143*>
|| ?id = <http://www.semanticweb.org/pallabi.d/ontologies/2014/2/untitled-ontology-11#KB-CH[R]-8-5Cell>)
}
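Because all IDs in a batch share one FILTER, a single bad ID (here `#KB-CH[R]-8-5Cell`) makes the whole batch fail. A minimal Ruby sketch that splits a batch before the query is built, so the valid IDs can still be queried and the rest logged for follow-up; `partition_ids` is a hypothetical helper, and treating `[` and `]` as unsafe is an assumption based on the failures reported in this issue.

```ruby
# SPARQL 1.1 IRIREF exclusions plus "[" and "]" (assumed unsafe for AllegroGraph).
UNSAFE_ID_CHARS = /[<>"{}|^`\\\[\]\x00-\x20]/

# Hypothetical helper: returns [safe_ids, unsafe_ids]. Only safe_ids go into
# the batched FILTER; unsafe_ids can be reported instead of failing the batch.
def partition_ids(ids)
  ids.partition { |id| !id.match?(UNSAFE_ID_CHARS) }
end
```

For the three MCCL IDs in the FILTER above, the `owl#Thing` and `*`-suffixed IDs would stay in the batch, and only the `KB-CH[R]-8-5Cell` ID would be set aside.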
The exact list of affected ontologies: OPTUM, AUTISM, ELD, MCCL, ETH_ANC, ANC, HOOM.
GAZ and DRON also report MalformedQuery errors, but with a different message:
SPARQL::Client::MalformedQuery: QUERY FAILED: Not CaaT state: nil within set #<db.agraph.sbqe::bindings-set 1[3] ?id 0(0) solutions @ #x100ce92bcf2>
I was able to identify the SPARQL query that causes the "QUERY FAILED: Not CaaT state: nil..." error:
SELECT DISTINCT ?id FROM <http://data.bioontology.org/ontologies/DRON/submissions/14>
WHERE { ?id a <http://www.w3.org/2002/07/owl#Class> . } OFFSET 500000 LIMIT 2500
I experimented with the offsets and found the following pattern:
Executes fine:
SELECT DISTINCT ?id FROM <http://data.bioontology.org/ontologies/DRON/submissions/14>
WHERE { ?id a <http://www.w3.org/2002/07/owl#Class> . } OFFSET 499000 LIMIT 2500
Fails:
SELECT DISTINCT ?id FROM <http://data.bioontology.org/ontologies/DRON/submissions/14>
WHERE { ?id a <http://www.w3.org/2002/07/owl#Class> . } OFFSET 499001 LIMIT 2500
Executes fine:
SELECT DISTINCT ?id FROM <http://data.bioontology.org/ontologies/DRON/submissions/14>
WHERE { ?id a <http://www.w3.org/2002/07/owl#Class> . } OFFSET 499001 LIMIT 999
Fails:
SELECT DISTINCT ?id FROM <http://data.bioontology.org/ontologies/DRON/submissions/14>
WHERE { ?id a <http://www.w3.org/2002/07/owl#Class> . } OFFSET 499001 LIMIT 1000
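Finding a boundary like this by hand takes many queries. A minimal Ruby sketch of a binary search over LIMIT for a fixed OFFSET, assuming failure is monotonic in LIMIT (consistent with the observations above at OFFSET 499001, but not guaranteed). `query_fails` is a hypothetical callable that would run the DRON query with the given LIMIT and return true when it raises the error.

```ruby
# Returns the smallest limit in (ok_limit, bad_limit] for which the query fails.
# Precondition: ok_limit is known to succeed and bad_limit is known to fail.
# query_fails: any callable, query_fails.call(limit) -> true/false.
def first_failing_limit(ok_limit, bad_limit, query_fails)
  while bad_limit - ok_limit > 1
    mid = (ok_limit + bad_limit) / 2
    if query_fails.call(mid)
      bad_limit = mid # mid fails: the boundary is at or below mid
    else
      ok_limit = mid  # mid succeeds: the boundary is above mid
    end
  end
  bad_limit
end
```

Starting from LIMIT 1 (ok) and LIMIT 2500 (fails), this needs about a dozen queries to land on 1000, rather than stepping through values one by one.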
Running a COUNT query on the graph yields these results:
SELECT (COUNT(*) as ?Triples)
WHERE { GRAPH <http://data.bioontology.org/ontologies/DRON/submissions/14> { ?s ?p ?o } }
Triples
"4870128"
and the DISTINCT count:
SELECT (COUNT(DISTINCT *) as ?Triples)
WHERE { GRAPH <http://data.bioontology.org/ontologies/DRON/submissions/14> { ?s ?p ?o } }
Triples
"4848151"
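It looks like the graph contains close to 5 million triples, so an OFFSET of 500K should technically work fine, but it doesn't. Even if there were fewer than 500K owl:Class IDs, an OFFSET past the end of the results should simply return an empty result set, not an error.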
The issue below has now been resolved by deploying a patch from AllegroGraph (bug26872-v7.3.0.fasl.patch). The fix is included in future versions of AllegroGraph, so the patch won't need to be maintained beyond this version.
SPARQL::Client::MalformedQuery: QUERY FAILED: Not CaaT state: nil within set #<db.agraph.sbqe::bindings-set 1[3] ?id 0(0) solutions @ #x100ce92bcf2>
Parsing fails with the AllegroGraph backend for HOOM, ELD, MCCL, ANC and other private ontologies, with the following error: