Open loicjaouen opened 2 years ago
gravsearch request is:
PREFIX kb: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX stardom: <http://0.0.0.0:3333/ontology/0107/stardom/v2#>
CONSTRUCT {
?page kb:isMainResource true .
?page stardom:isPartOfDocumentPiece ?piece .
?piece stardom:isRelatedToMovie ?propVallinkedRes000 .
?page stardom:hasThematicKeyword ?keyword .
} WHERE {
?page a kb:Resource .
?page a stardom:Page .
?page stardom:hasThematicKeyword ?keyword .
?keyword kb:listValueAsListNode <http://rdfh.ch/lists/0107/stardom-list-thematicKeywords-character> .
?page stardom:isPartOfDocumentPiece ?piece .
?piece stardom:isRelatedToMovie ?propVallinkedRes000 .
?piece a stardom:DocumentPiece .
}
our repo is rather big:
graph name | triples |
---|---|
http://www.knora.org/data/0105/drawings-gods | 3583561 |
http://www.knora.org/data/0101/parole-religieuse | 1397322 |
http://www.knora.org/data/0107/stardom | 1146732 |
http://www.knora.org/data/0112/roud-oeuvres | 791442 |
http://www.knora.org/data/0103/theatre-societe | 676432 |
http://www.knora.org/data/0114/elites-cio | 501129 |
The request pushes CPU usage up, but the RAM is rather limited.
Raising the default -Xmx from 3G to 5G doesn't improve performance significantly.
Setting -Xms to 5G doesn't do much either.
There is no timeout on Fuseki, so the request runs forever (impacting subsequent requests).
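To put numbers on runs like these without waiting indefinitely, a small client-side harness can help. This is only a sketch: the endpoint URL and the names `build_request` and `timed_query` are made up for illustration, and a client-side timeout only stops the client from waiting; as far as I understand, it does not by itself cancel the query on the Fuseki side.

```python
import socket
import time
import urllib.error
import urllib.parse
import urllib.request

# Hypothetical endpoint: adjust host, port and dataset name to the local setup.
FUSEKI_QUERY_URL = "http://localhost:3030/knora-test/query"


def build_request(endpoint, sparql):
    """Build a SPARQL 1.1 Protocol POST request with a form-encoded query."""
    body = urllib.parse.urlencode({"query": sparql}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def timed_query(endpoint, sparql, timeout_s):
    """Return (elapsed_seconds, payload).

    payload is None when the request failed or the client gave up
    after timeout_s seconds.
    """
    start = time.monotonic()
    try:
        request = build_request(endpoint, sparql)
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            payload = response.read()
    except (socket.timeout, urllib.error.URLError):
        payload = None
    return time.monotonic() - start, payload
```

Calling `timed_query(FUSEKI_QUERY_URL, query_text, 60)` would then give the elapsed wall-clock time, or stop waiting after 60 seconds.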
the generated sparql for fuseki, reformatted, is:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX kb: <http://www.knora.org/ontology/knora-base#>
PREFIX stardom: <http://www.knora.org/ontology/0107/stardom#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xmls: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT
?page
(GROUP_CONCAT(DISTINCT(IF(BOUND(?piece), STR(?piece), "")); SEPARATOR='') AS ?piece__Concat)
(GROUP_CONCAT(DISTINCT(IF(BOUND(?keyword), STR(?keyword), "")); SEPARATOR='') AS ?keyword__Concat)
(GROUP_CONCAT(DISTINCT(IF(BOUND(?isPartOfPieceValueLinkValue), STR(?isPartOfPieceValueLinkValue), "")); SEPARATOR='') AS ?isPartOfPieceValueLinkValue__Concat)
(GROUP_CONCAT(DISTINCT(IF(BOUND(?movie), STR(?movie), "")); SEPARATOR='') AS ?movie__Concat)
WHERE {
?relatedToMovie rdfs:subPropertyOf* stardom:isRelatedToMovie .
?piece ?relatedToMovie ?movie .
?hasListNode rdfs:subPropertyOf* kb:valueHasListNode .
?keyword ?hasListNode ?listNodeVar .
<http://rdfh.ch/lists/0107/stardom-list-thematicKeywords-character> kb:hasSubListNode* ?listNodeVar .
?isPartOfPiece rdfs:subPropertyOf* stardom:isPartOfDocumentPiece .
?page ?isPartOfPiece ?piece .
?isPartOfPieceValue rdfs:subPropertyOf* stardom:isPartOfDocumentPieceValue .
?page ?isPartOfPieceValue ?isPartOfPieceValueLinkValue .
?isPartOfPieceValueLinkValue rdf:type kb:LinkValue .
?isPartOfPieceValueLinkValue rdf:object ?piece .
?hasKeyword rdfs:subPropertyOf* stardom:hasThematicKeyword .
?page ?hasKeyword ?keyword .
FILTER NOT EXISTS { ?piece kb:isDeleted "true"^^xmls:boolean . }
FILTER NOT EXISTS { ?movie kb:isDeleted "true"^^xmls:boolean . }
FILTER NOT EXISTS { ?page kb:isDeleted "true"^^xmls:boolean . }
FILTER NOT EXISTS { ?isPartOfPieceValueLinkValue kb:isDeleted "true"^^xmls:boolean . }
FILTER NOT EXISTS { ?keyword kb:isDeleted "true"^^xmls:boolean . }
}
GROUP BY ?page
ORDER BY ASC(?page)
LIMIT 25
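For readers less familiar with the `rdfs:subPropertyOf*` patterns in the generated query: each one binds the property variable to the named property itself (zero-length path) plus all of its direct and indirect subproperties. A minimal Python sketch of that expansion over a toy hierarchy follows; the two `ex:` properties are hypothetical and not part of the stardom ontology.

```python
from collections import defaultdict


def sub_properties_star(sub_property_of, target):
    """Return every property p for which `p rdfs:subPropertyOf* target` holds.

    sub_property_of is an iterable of (child, parent) pairs.
    The zero-length path means target itself is always included.
    """
    children = defaultdict(set)
    for child, parent in sub_property_of:
        children[parent].add(child)

    found = {target}
    stack = [target]
    while stack:
        current = stack.pop()
        for child in children[current]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


# Toy hierarchy; the "ex:" properties are invented for illustration only.
pairs = [
    ("ex:isRelatedToFeatureFilm", "stardom:isRelatedToMovie"),
    ("ex:isRelatedToShortFilm", "ex:isRelatedToFeatureFilm"),
]
print(sorted(sub_properties_star(pairs, "stardom:isRelatedToMovie")))
print(sorted(sub_properties_star(pairs, "stardom:hasThematicKeyword")))
```

When a property has no subproperties, the expansion contains only the property itself, so a direct triple pattern on that property matches the same triples as the path version.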
from:
worked out the config (to be used with make init-db-test-empty)
@prefix : <http://base/#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix tdb2: <http://jena.apache.org/2016/tdb#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix text: <http://jena.apache.org/text#> .
@prefix knora-base: <http://www.knora.org/ontology/knora-base#> .
tdb2:DatasetTDB rdfs:subClassOf ja:RDFDataset .
ja:DatasetTxnMem rdfs:subClassOf ja:RDFDataset .
<http://jena.hpl.hp.com/2008/tdb#DatasetTDB> rdfs:subClassOf ja:RDFDataset .
<http://jena.hpl.hp.com/2008/tdb#GraphTDB> rdfs:subClassOf ja:Model .
tdb2:GraphTDB2 rdfs:subClassOf ja:Model .
ja:MemoryDataset rdfs:subClassOf ja:RDFDataset .
ja:RDFDatasetZero rdfs:subClassOf ja:RDFDataset .
<http://jena.apache.org/text#TextDataset> rdfs:subClassOf ja:RDFDataset .
tdb2:GraphTDB rdfs:subClassOf ja:Model .
ja:RDFDatasetOne rdfs:subClassOf ja:RDFDataset .
ja:RDFDatasetSink rdfs:subClassOf ja:RDFDataset .
tdb2:DatasetTDB2 rdfs:subClassOf ja:RDFDataset .
[] rdf:type fuseki:Server ;
fuseki:services :service_tdb_all ;
ja:loadClass "org.apache.jena.query.text.TextQuery" .
:service_tdb_all a fuseki:Service ;
rdfs:label "TDB2 @REPOSITORY@" ;
fuseki:dataset :text_dataset ;
fuseki:name "@REPOSITORY@" ;
fuseki:serviceQuery "query" , "" , "sparql" ;
fuseki:serviceReadGraphStore "get" ;
fuseki:serviceReadQuads "" ;
fuseki:serviceReadWriteGraphStore "data" ;
fuseki:serviceReadWriteQuads "" ;
fuseki:serviceUpdate "" , "update" ;
fuseki:serviceUpload "upload" .
:tdb_dataset_readwrite
a tdb2:DatasetTDB2 ;
# adding this back
tdb2:unionDefaultGraph true ;
tdb2:location "/fuseki/databases/@REPOSITORY@" .
:dataset a ja:RDFDataset ;
ja:defaultGraph :model_inf .
:model_inf a ja:InfModel ;
ja:baseModel :graph ;
ja:reasoner [
#ja:reasonerURL <http://jena.hpl.hp.com/2003/OWLFBRuleReasoner>
ja:reasonerURL <http://jena.hpl.hp.com/2003/TransitiveReasoner>
] .
:graph rdf:type tdb2:GraphTDB ;
tdb2:dataset :tdb_dataset_readwrite .
:text_dataset rdf:type text:TextDataset ;
text:dataset :dataset ;
text:index :indexLucene .
:indexLucene a text:TextIndexLucene ;
text:directory <file:/fuseki/lucene/@REPOSITORY@> ;
text:entityMap :entMap ;
# below are added lines, to be removed?
text:storeValues true ;
text:analyzer [ a text:StandardAnalyzer ] ;
text:queryAnalyzer [ a text:StandardAnalyzer ] ;
text:queryParser text:AnalyzingQueryParser .
# Mapping in the index
# URI stored in field "uri"
# knora-base:valueHasString is mapped to field "text"
:entMap a text:EntityMap ;
text:entityField "uri" ;
text:defaultField "text" ;
text:uidField "uid" ;
#text:langField "lang" ;
#text:graphField "graph" ;
text:map (
[ text:field "text" ; text:predicate rdfs:label ]
[ text:field "text" ; text:predicate knora-base:valueHasString ]
[ text:field "text" ; text:predicate knora-base:valueHasComment ]
) .
unfortunately, the basic requests do not work (e.g. GET http://0.0.0.0:3333/admin/projects)
because despite the:
tdb2:unionDefaultGraph true ;
one has to name the graph explicitly, but the default requests don't do so.
the request http://0.0.0.0:3333/admin/projects generates the SPARQL below, which works only with the added FROM clause:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX knora-admin: <http://www.knora.org/ontology/knora-admin#>
CONSTRUCT { ?s ?p ?o . }
FROM <http://www.knora.org/data/admin>
WHERE {
?s rdf:type knora-admin:knoraProject .
?s ?p ?o .
}
A bit of effort in this direction could solve this.
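One possible direction, instead of editing every generated query, is the SPARQL 1.1 Protocol's `default-graph-uri` request parameter, which names the default graph(s) for a query that has no FROM clause of its own. The sketch below only builds the form-encoded request body; the function name `query_params` is made up, and whether Fuseki honours the parameter for the InfModel-backed dataset configured above has not been tested here.

```python
import urllib.parse


def query_params(sparql, default_graphs):
    """Form-encode a SPARQL query plus one default-graph-uri entry per graph."""
    params = [("query", sparql)]
    params += [("default-graph-uri", graph) for graph in default_graphs]
    return urllib.parse.urlencode(params)


body = query_params(
    "CONSTRUCT { ?s ?p ?o . } WHERE { ?s ?p ?o . }",
    ["http://www.knora.org/data/admin"],
)
print(body)
```

The resulting string can be sent as the body of a POST with content type application/x-www-form-urlencoded.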
and the previous never-ending request, with the addition of a:
from <http://www.knora.org/data/0107/stardom>
returns... but in 800 seconds (about 13 minutes)
without the inference (default config) it finally returns after 2'602 seconds (about 43 minutes), so the inference config improves the speed significantly, but maybe still not enough (and there are still some details to sort out)
continuing with the inference turned off,
to test if the repo size matters
Does the size of the graph matter?
cutting the data, from (in number of triples):
default graph | 8'512'051 |
---|---|
http://www.knora.org/data/0105/drawings-gods | 3'583'561 |
http://www.knora.org/data/0101/parole-religieuse | 1'397'322 |
to:
default graph | 3'530'560 |
---|---|
http://www.knora.org/data/0107/stardom | 1'146'732 |
http://www.knora.org/data/0112/roud-oeuvres | 791'442 |
=> yes, the request completes in 784 seconds (13 minutes)
cutting down to:
default graph | 1561557 |
---|---|
http://www.knora.org/data/0107/stardom | 1146732 |
http://www.knora.org/data/0116/medframes | 367742 |
the same request takes: 497 s (8 minutes)
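To compare the three runs, the timings can be normalised by the size of the default graph. The sketch below assumes (my reading of the thread, not stated explicitly) that the 2'602 s run was measured on the full 8'512'051-triple default graph with inference off, like the other two.

```python
# (default-graph triples, seconds) as reported above, inference off.
# Assumption: the 2602 s run used the full 8'512'051-triple default graph.
runs = [
    (8_512_051, 2602),
    (3_530_560, 784),
    (1_561_557, 497),
]


def seconds_per_million(triples, seconds):
    """Runtime normalised to one million triples in the default graph."""
    return seconds / (triples / 1_000_000)


for triples, seconds in runs:
    print(
        f"{triples:>9} triples: {seconds:>4} s = {seconds / 60:.1f} min, "
        f"{seconds_per_million(triples, seconds):.0f} s per million triples"
    )
```

On these three data points the runtime shrinks with the graph, but the cost per million triples is not constant, which suggests that size matters without being the only factor.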
for the record, the same request on our prod does not finish because we have a timeout at 30s
=> so the requests are completely different between graphdb and fuseki
we know that property paths are quite costly, simplifying the request:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX kb: <http://www.knora.org/ontology/knora-base#>
PREFIX stardom: <http://www.knora.org/ontology/0107/stardom#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xmls: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT
?page
(GROUP_CONCAT(DISTINCT(IF(BOUND(?piece), STR(?piece), "")); SEPARATOR='') AS ?piece__Concat)
(GROUP_CONCAT(DISTINCT(IF(BOUND(?keyword), STR(?keyword), "")); SEPARATOR='') AS ?keyword__Concat)
(GROUP_CONCAT(DISTINCT(IF(BOUND(?isPartOfPieceValueLinkValue), STR(?isPartOfPieceValueLinkValue), "")); SEPARATOR='') AS ?isPartOfPieceValueLinkValue__Concat)
(GROUP_CONCAT(DISTINCT(IF(BOUND(?movie), STR(?movie), "")); SEPARATOR='') AS ?movie__Concat)
# from <http://www.knora.org/data/0107/stardom>
WHERE {
#?relatedToMovie rdfs:subPropertyOf* stardom:isRelatedToMovie .
?piece stardom:isRelatedToMovie ?movie .
#?hasListNode rdfs:subPropertyOf* kb:valueHasListNode .
?keyword kb:valueHasListNode ?listNodeVar .
<http://rdfh.ch/lists/0107/stardom-list-thematicKeywords-character> kb:hasSubListNode* ?listNodeVar .
?isPartOfPiece rdfs:subPropertyOf* stardom:isPartOfDocumentPiece .
?page ?isPartOfPiece ?piece .
#?isPartOfPieceValue rdfs:subPropertyOf* stardom:isPartOfDocumentPieceValue .
?page stardom:isPartOfDocumentPieceValue ?isPartOfPieceValueLinkValue .
?isPartOfPieceValueLinkValue rdf:type kb:LinkValue .
?isPartOfPieceValueLinkValue rdf:object ?piece .
#?hasKeyword rdfs:subPropertyOf* stardom:hasThematicKeyword .
?page stardom:hasThematicKeyword ?keyword .
FILTER NOT EXISTS { ?piece kb:isDeleted "true"^^xmls:boolean . }
FILTER NOT EXISTS { ?movie kb:isDeleted "true"^^xmls:boolean . }
FILTER NOT EXISTS { ?page kb:isDeleted "true"^^xmls:boolean . }
FILTER NOT EXISTS { ?isPartOfPieceValueLinkValue kb:isDeleted "true"^^xmls:boolean . }
FILTER NOT EXISTS { ?keyword kb:isDeleted "true"^^xmls:boolean . }
}
GROUP BY ?page
ORDER BY ASC(?page)
LIMIT 25
cuts the previous run from 497s to 382s
and fixing the list node reduces it to 90s:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX kb: <http://www.knora.org/ontology/knora-base#>
PREFIX stardom: <http://www.knora.org/ontology/0107/stardom#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xmls: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT
?page
(GROUP_CONCAT(DISTINCT(IF(BOUND(?piece), STR(?piece), "")); SEPARATOR='') AS ?piece__Concat)
(GROUP_CONCAT(DISTINCT(IF(BOUND(?keyword), STR(?keyword), "")); SEPARATOR='') AS ?keyword__Concat)
(GROUP_CONCAT(DISTINCT(IF(BOUND(?isPartOfPieceValueLinkValue), STR(?isPartOfPieceValueLinkValue), "")); SEPARATOR='') AS ?isPartOfPieceValueLinkValue__Concat)
(GROUP_CONCAT(DISTINCT(IF(BOUND(?movie), STR(?movie), "")); SEPARATOR='') AS ?movie__Concat)
# from <http://www.knora.org/data/0107/stardom>
WHERE {
#?relatedToMovie rdfs:subPropertyOf* stardom:isRelatedToMovie .
?piece stardom:isRelatedToMovie ?movie .
#?hasListNode rdfs:subPropertyOf* kb:valueHasListNode .
#?keyword kb:valueHasListNode ?listNodeVar .
#<http://rdfh.ch/lists/0107/stardom-list-thematicKeywords-character> kb:hasSubListNode* ?listNodeVar .
?keyword kb:valueHasListNode <http://rdfh.ch/lists/0107/stardom-list-thematicKeywords-character> .
#?isPartOfPiece rdfs:subPropertyOf* stardom:isPartOfDocumentPiece .
?page stardom:isPartOfDocumentPiece ?piece .
#?isPartOfPieceValue rdfs:subPropertyOf* stardom:isPartOfDocumentPieceValue .
?page stardom:isPartOfDocumentPieceValue ?isPartOfPieceValueLinkValue .
?isPartOfPieceValueLinkValue rdf:type kb:LinkValue .
?isPartOfPieceValueLinkValue rdf:object ?piece .
#?hasKeyword rdfs:subPropertyOf* stardom:hasThematicKeyword .
?page stardom:hasThematicKeyword ?keyword .
FILTER NOT EXISTS { ?piece kb:isDeleted "true"^^xmls:boolean . }
FILTER NOT EXISTS { ?movie kb:isDeleted "true"^^xmls:boolean . }
FILTER NOT EXISTS { ?page kb:isDeleted "true"^^xmls:boolean . }
FILTER NOT EXISTS { ?isPartOfPieceValueLinkValue kb:isDeleted "true"^^xmls:boolean . }
FILTER NOT EXISTS { ?keyword kb:isDeleted "true"^^xmls:boolean . }
}
GROUP BY ?page
ORDER BY ASC(?page)
LIMIT 25
same request on our prod (graphdb) takes 100ms.
simplifying only the keyword path makes fuseki return in 182s (compared to the 2'602s)
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX kb: <http://www.knora.org/ontology/knora-base#>
PREFIX stardom: <http://www.knora.org/ontology/0107/stardom#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xmls: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT
?page
(GROUP_CONCAT(DISTINCT(IF(BOUND(?piece), STR(?piece), "")); SEPARATOR='') AS ?piece__Concat)
(GROUP_CONCAT(DISTINCT(IF(BOUND(?keyword), STR(?keyword), "")); SEPARATOR='') AS ?keyword__Concat)
(GROUP_CONCAT(DISTINCT(IF(BOUND(?isPartOfPieceValueLinkValue), STR(?isPartOfPieceValueLinkValue), "")); SEPARATOR='') AS ?isPartOfPieceValueLinkValue__Concat)
(GROUP_CONCAT(DISTINCT(IF(BOUND(?movie), STR(?movie), "")); SEPARATOR='') AS ?movie__Concat)
WHERE {
?relatedToMovie rdfs:subPropertyOf* stardom:isRelatedToMovie .
?piece ?relatedToMovie ?movie .
?keyword kb:valueHasListNode <http://rdfh.ch/lists/0107/stardom-list-thematicKeywords-character> .
?isPartOfPiece rdfs:subPropertyOf* stardom:isPartOfDocumentPiece .
?page ?isPartOfPiece ?piece .
?isPartOfPieceValue rdfs:subPropertyOf* stardom:isPartOfDocumentPieceValue .
?page ?isPartOfPieceValue ?isPartOfPieceValueLinkValue .
?isPartOfPieceValueLinkValue rdf:type kb:LinkValue .
?isPartOfPieceValueLinkValue rdf:object ?piece .
?hasKeyword rdfs:subPropertyOf* stardom:hasThematicKeyword .
?page ?hasKeyword ?keyword .
FILTER NOT EXISTS { ?piece kb:isDeleted "true"^^xmls:boolean . }
FILTER NOT EXISTS { ?movie kb:isDeleted "true"^^xmls:boolean . }
FILTER NOT EXISTS { ?page kb:isDeleted "true"^^xmls:boolean . }
FILTER NOT EXISTS { ?isPartOfPieceValueLinkValue kb:isDeleted "true"^^xmls:boolean . }
FILTER NOT EXISTS { ?keyword kb:isDeleted "true"^^xmls:boolean . }
}
GROUP BY ?page
ORDER BY ASC(?page)
LIMIT 25
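For a quick overview, the timings reported in this thread can be turned into speed-up factors. The grouping into "reduced data" and "full data" is my reading of which dataset each run used, and the Fuseki vs graphdb line compares different machines and data, so it is only a rough indication.

```python
def speedup(before_s, after_s):
    """How many times faster the second timing is than the first."""
    return before_s / after_s


# (label, seconds before, seconds after), all taken from the thread.
steps = [
    ("reduced data: drop most subPropertyOf* paths", 497, 382),
    ("reduced data: also fix the list node", 497, 90),
    ("full data: simplify only the keyword path", 2602, 182),
    ("simplified query: fuseki (90 s) vs graphdb prod (100 ms)", 90, 0.1),
]

for label, before, after in steps:
    print(f"{label}: {speedup(before, after):.1f}x")
```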
starting with the usual request that times out at 20s:
https://dasch.atlassian.net/servicedesk/customer/portal/1/DSQ-56