testing #1

Open loicjaouen opened 2 years ago

loicjaouen commented 2 years ago

starting with the usual request that times out at 20s:


loicjaouen commented 2 years ago

gravsearch request is:

PREFIX kb: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX stardom: <>


CONSTRUCT {
    ?page kb:isMainResource true .

    ?page stardom:isPartOfDocumentPiece ?piece .
    ?piece stardom:isRelatedToMovie ?propVallinkedRes000 .

    ?page stardom:hasThematicKeyword ?keyword .

} WHERE {
    ?page a kb:Resource .
    ?page a stardom:Page .

    ?page stardom:hasThematicKeyword ?keyword .
    ?keyword kb:listValueAsListNode <http://rdfh.ch/lists/0107/stardom-list-thematicKeywords-character> .

    ?page stardom:isPartOfDocumentPiece ?piece .
    ?piece stardom:isRelatedToMovie ?propVallinkedRes000 .

    ?piece a stardom:DocumentPiece .
}

loicjaouen commented 2 years ago

our repo is rather big:

graph name                                        triples
http://www.knora.org/data/0105/drawings-gods      3583561
http://www.knora.org/data/0101/parole-religieuse  1397322
http://www.knora.org/data/0107/stardom            1146732
http://www.knora.org/data/0112/roud-oeuvres        791442
http://www.knora.org/data/0103/theatre-societe     676432
http://www.knora.org/data/0114/elites-cio          501129
loicjaouen commented 2 years ago

The request drives the CPU usage up, while RAM usage stays rather limited.
Raising the default -Xmx from 3G to 5G doesn't improve performance significantly.
Setting -Xms to 5G doesn't do much either.

loicjaouen commented 2 years ago

There is no time-out on fuseki, so the request runs forever (impacting further requests)
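Until a limit is enforced on the fuseki side, a client-side guard at least keeps a test script from hanging. The sketch below is only an illustration under assumptions: the endpoint URL, the repository name `knora-test` and both helper names are hypothetical, and the request follows the "query via POST directly" form of the SPARQL 1.1 Protocol. Giving up on the client side does not by itself stop the query on the server.

```python
import socket
import urllib.error
import urllib.request

# Hypothetical endpoint: adjust host, port and repository name to the local setup.
ENDPOINT = "http://localhost:3030/knora-test/query"


def build_request(endpoint: str, sparql: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol request that POSTs the query text directly."""
    return urllib.request.Request(
        endpoint,
        data=sparql.encode("utf-8"),
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def run_with_timeout(endpoint: str, sparql: str, timeout_s: float):
    """Return the raw response body, or None if no usable answer arrives in time."""
    try:
        request = build_request(endpoint, sparql)
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return response.read()
    except (socket.timeout, urllib.error.URLError):
        # Timed out, server unreachable, or HTTP error.
        # The query may still be running on the server.
        return None
```

For example, `run_with_timeout(ENDPOINT, "ASK {}", 20.0)` waits at most about 20 seconds for an answer.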

loicjaouen commented 2 years ago

the generated sparql for fuseki, reformatted, is:

PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
PREFIX kb:      <http://www.knora.org/ontology/knora-base#>
PREFIX stardom: <http://www.knora.org/ontology/0107/stardom#>
PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xmls:    <http://www.w3.org/2001/XMLSchema#>

SELECT ?page
    (GROUP_CONCAT(DISTINCT(IF(BOUND(?piece), STR(?piece), "")); SEPARATOR='') AS ?piece__Concat)
    (GROUP_CONCAT(DISTINCT(IF(BOUND(?keyword), STR(?keyword), "")); SEPARATOR='') AS ?keyword__Concat)
    (GROUP_CONCAT(DISTINCT(IF(BOUND(?isPartOfPieceValueLinkValue), STR(?isPartOfPieceValueLinkValue), "")); SEPARATOR='') AS ?isPartOfPieceValueLinkValue__Concat)
    (GROUP_CONCAT(DISTINCT(IF(BOUND(?movie), STR(?movie), "")); SEPARATOR='') AS ?movie__Concat)
WHERE {
    ?relatedToMovie rdfs:subPropertyOf* stardom:isRelatedToMovie .
    ?piece ?relatedToMovie ?movie .
    ?hasListNode rdfs:subPropertyOf* kb:valueHasListNode .
    ?keyword ?hasListNode ?listNodeVar .
    <http://rdfh.ch/lists/0107/stardom-list-thematicKeywords-character> kb:hasSubListNode* ?listNodeVar .
    ?isPartOfPiece rdfs:subPropertyOf* stardom:isPartOfDocumentPiece .
    ?page ?isPartOfPiece ?piece .
    ?isPartOfPieceValue rdfs:subPropertyOf* stardom:isPartOfDocumentPieceValue .
    ?page ?isPartOfPieceValue ?isPartOfPieceValueLinkValue .
    ?isPartOfPieceValueLinkValue rdf:type kb:LinkValue .
    ?isPartOfPieceValueLinkValue rdf:object ?piece .
    ?hasKeyword rdfs:subPropertyOf* stardom:hasThematicKeyword .
    ?page ?hasKeyword ?keyword .

    FILTER NOT EXISTS {  ?piece kb:isDeleted "true"^^xmls:boolean .  }
    FILTER NOT EXISTS {  ?movie kb:isDeleted "true"^^xmls:boolean .  }
    FILTER NOT EXISTS {  ?page kb:isDeleted "true"^^xmls:boolean .  }
    FILTER NOT EXISTS {  ?isPartOfPieceValueLinkValue kb:isDeleted "true"^^xmls:boolean .  }
    FILTER NOT EXISTS {  ?keyword kb:isDeleted "true"^^xmls:boolean .  }
}
GROUP BY ?page 
ORDER BY ASC(?page) 
loicjaouen commented 2 years ago


worked out the config (to be used with make init-db-test-empty)

@prefix :           <http://base/#> .
@prefix fuseki:     <http://jena.apache.org/fuseki#> .
@prefix ja:         <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix tdb2:       <http://jena.apache.org/2016/tdb#> .
@prefix rdf:        <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:       <http://www.w3.org/2000/01/rdf-schema#> .
@prefix text:       <http://jena.apache.org/text#> .

@prefix knora-base: <http://www.knora.org/ontology/knora-base#> .

tdb2:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .

ja:DatasetTxnMem  rdfs:subClassOf  ja:RDFDataset .

<http://jena.hpl.hp.com/2008/tdb#DatasetTDB> rdfs:subClassOf  ja:RDFDataset .

<http://jena.hpl.hp.com/2008/tdb#GraphTDB> rdfs:subClassOf  ja:Model .

tdb2:GraphTDB2  rdfs:subClassOf  ja:Model .
ja:MemoryDataset  rdfs:subClassOf  ja:RDFDataset .
ja:RDFDatasetZero  rdfs:subClassOf  ja:RDFDataset .
<http://jena.apache.org/text#TextDataset> rdfs:subClassOf  ja:RDFDataset .
tdb2:GraphTDB  rdfs:subClassOf  ja:Model .
ja:RDFDatasetOne  rdfs:subClassOf  ja:RDFDataset .
ja:RDFDatasetSink  rdfs:subClassOf  ja:RDFDataset .
tdb2:DatasetTDB2  rdfs:subClassOf  ja:RDFDataset .

[] rdf:type        fuseki:Server ;
   fuseki:services :service_tdb_all ;
   ja:loadClass    "org.apache.jena.query.text.TextQuery" .

:service_tdb_all  a                   fuseki:Service ;
        rdfs:label                    "TDB2 @REPOSITORY@" ;
        fuseki:dataset                :text_dataset ;
        fuseki:name                   "@REPOSITORY@" ;
        fuseki:serviceQuery           "query" , "" , "sparql" ;
        fuseki:serviceReadGraphStore  "get" ;
        fuseki:serviceReadQuads       "" ;
        fuseki:serviceReadWriteGraphStore  "data" ;
        fuseki:serviceReadWriteQuads  "" ;
        fuseki:serviceUpdate          "" , "update" ;
        fuseki:serviceUpload          "upload" .

:tdb_dataset_readwrite
        a              tdb2:DatasetTDB2 ;
        # adding this back
        tdb2:unionDefaultGraph              true ;
        tdb2:location  "/fuseki/databases/@REPOSITORY@" .

:dataset a ja:RDFDataset ;
    ja:defaultGraph :model_inf .

:model_inf a ja:InfModel ;
     ja:baseModel :graph ;
     ja:reasoner [
         #ja:reasonerURL <http://jena.hpl.hp.com/2003/OWLFBRuleReasoner>
         ja:reasonerURL <http://jena.hpl.hp.com/2003/TransitiveReasoner>
     ] .

:graph rdf:type tdb2:GraphTDB ;
  tdb2:dataset :tdb_dataset_readwrite .

:text_dataset rdf:type     text:TextDataset ;
    text:dataset   :dataset ;
    text:index     :indexLucene .

:indexLucene a text:TextIndexLucene ;
    text:directory <file:/fuseki/lucene/@REPOSITORY@> ;
    text:entityMap :entMap ;
    # below are added lines, to be removed? 
    text:storeValues true ;
    text:analyzer [ a text:StandardAnalyzer ] ;
    text:queryAnalyzer [ a text:StandardAnalyzer ] ;
    text:queryParser text:AnalyzingQueryParser .

# Mapping in the index
# URI stored in field "uri"
# knora-base:valueHasString is mapped to field "text"
:entMap a text:EntityMap ;
    text:entityField      "uri" ;
    text:defaultField     "text" ;
    text:uidField         "uid" ;
    #text:langField        "lang" ;
    #text:graphField       "graph" ;
    text:map (
        [ text:field  "text" ;  text:predicate  rdfs:label ]
        [ text:field  "text" ;  text:predicate  knora-base:valueHasString ]
        [ text:field  "text" ;  text:predicate  knora-base:valueHasComment ]
    ) .

unfortunately, the basic request does not work (GET
because, despite the:

        tdb2:unionDefaultGraph              true ;

one has to add the graph, but the default requests don't do so.

request generates the sparql, which works only with the added FROM clause:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX knora-admin: <http://www.knora.org/ontology/knora-admin#>  
CONSTRUCT { ?s ?p ?o . }   
FROM <http://www.knora.org/data/admin>
WHERE {                        
  ?s rdf:type knora-admin:knoraProject .     
  ?s ?p ?o .
}
loicjaouen commented 2 years ago

A bit of effort in this direction could solve this.

loicjaouen commented 2 years ago

and the previous never ending request, with the addition of a:

from <http://www.knora.org/data/0107/stardom>

returns... but in 800 seconds (about 13 minutes)
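Adding the FROM clause by hand, as done here, could also be scripted as long as the default requests do not name a graph. A minimal sketch, assuming queries whose first WHERE keyword is the top-level one; `add_from_clause` is a hypothetical helper, not part of Knora or fuseki:

```python
import re


def add_from_clause(query: str, graph_iri: str) -> str:
    """Insert `FROM <graph_iri>` right before the first WHERE keyword.

    Naive text rewrite: it does not parse SPARQL, so a WHERE inside a
    string literal or a variable named ?where would confuse it.
    """
    match = re.search(r"\bWHERE\b", query, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("query has no WHERE keyword")
    return query[:match.start()] + f"FROM <{graph_iri}>\n" + query[match.start():]


print(add_from_clause(
    "SELECT ?s WHERE { ?s ?p ?o }",
    "http://www.knora.org/data/0107/stardom",
))
```

Keep in mind that `FROM <graph>` restricts the default graph of that query to the named graph, so data living in other graphs (ontologies, lists, admin data) is no longer visible unless those graphs are added as well.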

loicjaouen commented 2 years ago

without the inference (default config) it finally returns after 2'602 seconds (about 43 minutes), so the inference improves the speed significantly, but maybe still not enough (and there are still some details to sort out)

going on with the inference off, to test if the repo size matters

loicjaouen commented 2 years ago

Does the size of the graph matter?

cutting the data (in number of triples) from:

default graph 8'512'051
http://www.knora.org/data/0105/drawings-gods 3'583'561
http://www.knora.org/data/0101/parole-religieuse 1'397'322

to:

default graph 3'530'560
http://www.knora.org/data/0107/stardom 1'146'732
http://www.knora.org/data/0112/roud-oeuvres 791'442

=> yes, the request completes in 784 seconds (13 minutes)

loicjaouen commented 2 years ago

cutting down to:

default graph 1'561'557
http://www.knora.org/data/0107/stardom 1'146'732
http://www.knora.org/data/0116/medframes 367'742

the same request takes: 497 s (8 minutes)

for the record, the same request on our prod does not finish because we have a timeout at 30s

=> so the requests behave completely differently on graphdb and fuseki
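To put the three fuseki timings (without inference) side by side, the run time can be normalised by the size of the default graph. This is plain arithmetic on the numbers reported in this thread, with one assumption: the 2'602 s run is taken to be the one against the full 8'512'051-triple default graph.

```python
# (default-graph triples, run time in seconds), as reported in this thread.
# Assumption: the 2'602 s run used the full 8'512'051-triple default graph.
RUNS = [
    (8_512_051, 2602),
    (3_530_560, 784),
    (1_561_557, 497),
]


def seconds_per_million(triples: int, seconds: float) -> float:
    """Run time normalised to one million triples in the default graph."""
    return seconds / (triples / 1_000_000)


for triples, seconds in RUNS:
    rate = seconds_per_million(triples, seconds)
    print(f"{triples:>9} triples: {seconds:>5} s ({rate:.0f} s per million triples)")
```

The three rates stay within the same order of magnitude, which suggests the run time grows roughly with the size of the default graph; three data points are not enough to say more than that.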

loicjaouen commented 2 years ago

we know that property paths are quite costly, simplifying the request:

PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
PREFIX kb:      <http://www.knora.org/ontology/knora-base#>
PREFIX stardom: <http://www.knora.org/ontology/0107/stardom#>
PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xmls:    <http://www.w3.org/2001/XMLSchema#>

SELECT ?page
    (GROUP_CONCAT(DISTINCT(IF(BOUND(?piece), STR(?piece), "")); SEPARATOR='') AS ?piece__Concat)
    (GROUP_CONCAT(DISTINCT(IF(BOUND(?keyword), STR(?keyword), "")); SEPARATOR='') AS ?keyword__Concat)
    (GROUP_CONCAT(DISTINCT(IF(BOUND(?isPartOfPieceValueLinkValue), STR(?isPartOfPieceValueLinkValue), "")); SEPARATOR='') AS ?isPartOfPieceValueLinkValue__Concat)
    (GROUP_CONCAT(DISTINCT(IF(BOUND(?movie), STR(?movie), "")); SEPARATOR='') AS ?movie__Concat)
# from <http://www.knora.org/data/0107/stardom>
WHERE {
    #?relatedToMovie rdfs:subPropertyOf* stardom:isRelatedToMovie .
    ?piece stardom:isRelatedToMovie ?movie .
    #?hasListNode rdfs:subPropertyOf* kb:valueHasListNode .
    ?keyword kb:valueHasListNode ?listNodeVar .
    <http://rdfh.ch/lists/0107/stardom-list-thematicKeywords-character> kb:hasSubListNode* ?listNodeVar .
    ?isPartOfPiece rdfs:subPropertyOf* stardom:isPartOfDocumentPiece .
    ?page ?isPartOfPiece ?piece .
    #?isPartOfPieceValue rdfs:subPropertyOf* stardom:isPartOfDocumentPieceValue .
    ?page stardom:isPartOfDocumentPieceValue ?isPartOfPieceValueLinkValue .
    ?isPartOfPieceValueLinkValue rdf:type kb:LinkValue .
    ?isPartOfPieceValueLinkValue rdf:object ?piece .
    #?hasKeyword rdfs:subPropertyOf* stardom:hasThematicKeyword .
    ?page stardom:hasThematicKeyword ?keyword .

    FILTER NOT EXISTS {  ?piece kb:isDeleted "true"^^xmls:boolean .  }
    FILTER NOT EXISTS {  ?movie kb:isDeleted "true"^^xmls:boolean .  }
    FILTER NOT EXISTS {  ?page kb:isDeleted "true"^^xmls:boolean .  }
    FILTER NOT EXISTS {  ?isPartOfPieceValueLinkValue kb:isDeleted "true"^^xmls:boolean .  }
    FILTER NOT EXISTS {  ?keyword kb:isDeleted "true"^^xmls:boolean .  }
}
GROUP BY ?page 
ORDER BY ASC(?page) 

cuts the previous run from 497s to 382s

and fixing the list node reduces it to 90s:

PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
PREFIX kb:      <http://www.knora.org/ontology/knora-base#>
PREFIX stardom: <http://www.knora.org/ontology/0107/stardom#>
PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xmls:    <http://www.w3.org/2001/XMLSchema#>

SELECT ?page
    (GROUP_CONCAT(DISTINCT(IF(BOUND(?piece), STR(?piece), "")); SEPARATOR='') AS ?piece__Concat)
    (GROUP_CONCAT(DISTINCT(IF(BOUND(?keyword), STR(?keyword), "")); SEPARATOR='') AS ?keyword__Concat)
    (GROUP_CONCAT(DISTINCT(IF(BOUND(?isPartOfPieceValueLinkValue), STR(?isPartOfPieceValueLinkValue), "")); SEPARATOR='') AS ?isPartOfPieceValueLinkValue__Concat)
    (GROUP_CONCAT(DISTINCT(IF(BOUND(?movie), STR(?movie), "")); SEPARATOR='') AS ?movie__Concat)
# from <http://www.knora.org/data/0107/stardom>
WHERE {
    #?relatedToMovie rdfs:subPropertyOf* stardom:isRelatedToMovie .
    ?piece stardom:isRelatedToMovie ?movie .
    #?hasListNode rdfs:subPropertyOf* kb:valueHasListNode .
    #?keyword kb:valueHasListNode ?listNodeVar .
    #<http://rdfh.ch/lists/0107/stardom-list-thematicKeywords-character> kb:hasSubListNode* ?listNodeVar .
    ?keyword kb:valueHasListNode <http://rdfh.ch/lists/0107/stardom-list-thematicKeywords-character> .
    #?isPartOfPiece rdfs:subPropertyOf* stardom:isPartOfDocumentPiece .
    ?page stardom:isPartOfDocumentPiece ?piece .
    #?isPartOfPieceValue rdfs:subPropertyOf* stardom:isPartOfDocumentPieceValue .
    ?page stardom:isPartOfDocumentPieceValue ?isPartOfPieceValueLinkValue .
    ?isPartOfPieceValueLinkValue rdf:type kb:LinkValue .
    ?isPartOfPieceValueLinkValue rdf:object ?piece .
    #?hasKeyword rdfs:subPropertyOf* stardom:hasThematicKeyword .
    ?page stardom:hasThematicKeyword ?keyword .

    FILTER NOT EXISTS {  ?piece kb:isDeleted "true"^^xmls:boolean .  }
    FILTER NOT EXISTS {  ?movie kb:isDeleted "true"^^xmls:boolean .  }
    FILTER NOT EXISTS {  ?page kb:isDeleted "true"^^xmls:boolean .  }
    FILTER NOT EXISTS {  ?isPartOfPieceValueLinkValue kb:isDeleted "true"^^xmls:boolean .  }
    FILTER NOT EXISTS {  ?keyword kb:isDeleted "true"^^xmls:boolean .  }
}
GROUP BY ?page 
ORDER BY ASC(?page) 

same request on our prod (graphdb) takes 100ms.
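The hand edits above (dropping each `?p rdfs:subPropertyOf* X .` triple and using `X` directly as the predicate) can be sketched as a small text rewrite, handy for repeating the experiment on other generated queries. This is an illustration only: `inline_subproperty_paths` is a hypothetical helper working on the text of the WHERE body, and the rewritten query returns the same results only if the data uses no sub-properties of the inlined properties.

```python
import re

# Matches e.g. "?hasKeyword rdfs:subPropertyOf* stardom:hasThematicKeyword ."
SUBPROPERTY_TRIPLE = re.compile(r"(\?\w+)\s+rdfs:subPropertyOf\*\s+(\S+)\s*\.\s*")


def inline_subproperty_paths(where_body: str) -> str:
    """Drop `?p rdfs:subPropertyOf* X .` triples and replace every use of ?p by X."""
    replacements = dict(SUBPROPERTY_TRIPLE.findall(where_body))
    body = SUBPROPERTY_TRIPLE.sub("", where_body)
    for variable, prop in replacements.items():
        # \b keeps ?isPartOfPiece from also matching inside ?isPartOfPieceValue.
        body = re.sub(re.escape(variable) + r"\b", lambda _m, p=prop: p, body)
    return body


example = """\
?hasKeyword rdfs:subPropertyOf* stardom:hasThematicKeyword .
?page ?hasKeyword ?keyword .
?isPartOfPiece rdfs:subPropertyOf* stardom:isPartOfDocumentPiece .
?page ?isPartOfPiece ?piece .
"""
print(inline_subproperty_paths(example))
```

The `kb:hasSubListNode*` path on the list node is left untouched by this rewrite; replacing it with the fixed list node, as in the 90s variant, is a separate change.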

loicjaouen commented 2 years ago

simplifying only the keyword path makes fuseki return in 182s (compared to the 2'602s)

PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
PREFIX kb:      <http://www.knora.org/ontology/knora-base#>
PREFIX stardom: <http://www.knora.org/ontology/0107/stardom#>
PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xmls:    <http://www.w3.org/2001/XMLSchema#>

SELECT ?page
    (GROUP_CONCAT(DISTINCT(IF(BOUND(?piece), STR(?piece), "")); SEPARATOR='') AS ?piece__Concat)
    (GROUP_CONCAT(DISTINCT(IF(BOUND(?keyword), STR(?keyword), "")); SEPARATOR='') AS ?keyword__Concat)
    (GROUP_CONCAT(DISTINCT(IF(BOUND(?isPartOfPieceValueLinkValue), STR(?isPartOfPieceValueLinkValue), "")); SEPARATOR='') AS ?isPartOfPieceValueLinkValue__Concat)
    (GROUP_CONCAT(DISTINCT(IF(BOUND(?movie), STR(?movie), "")); SEPARATOR='') AS ?movie__Concat)
WHERE {
    ?relatedToMovie rdfs:subPropertyOf* stardom:isRelatedToMovie .
    ?piece ?relatedToMovie ?movie .
    ?keyword kb:valueHasListNode <http://rdfh.ch/lists/0107/stardom-list-thematicKeywords-character> .
    ?isPartOfPiece rdfs:subPropertyOf* stardom:isPartOfDocumentPiece .
    ?page ?isPartOfPiece ?piece .
    ?isPartOfPieceValue rdfs:subPropertyOf* stardom:isPartOfDocumentPieceValue .
    ?page ?isPartOfPieceValue ?isPartOfPieceValueLinkValue .
    ?isPartOfPieceValueLinkValue rdf:type kb:LinkValue .
    ?isPartOfPieceValueLinkValue rdf:object ?piece .
    ?hasKeyword rdfs:subPropertyOf* stardom:hasThematicKeyword .
    ?page ?hasKeyword ?keyword .

    FILTER NOT EXISTS {  ?piece kb:isDeleted "true"^^xmls:boolean .  }
    FILTER NOT EXISTS {  ?movie kb:isDeleted "true"^^xmls:boolean .  }
    FILTER NOT EXISTS {  ?page kb:isDeleted "true"^^xmls:boolean .  }
    FILTER NOT EXISTS {  ?isPartOfPieceValueLinkValue kb:isDeleted "true"^^xmls:boolean .  }
    FILTER NOT EXISTS {  ?keyword kb:isDeleted "true"^^xmls:boolean .  }
}
GROUP BY ?page 
ORDER BY ASC(?page)