Open serasset opened 2 years ago
First thing, I see that your endpoint is running version 07.20.3230 (a/k/a 7.2.6.1), built Sep 27, 2020 from git_head 840b468fc, committed on Sep 1, 2020, probably from the stable/7
branch.
I would suggest you test against a current build from the develop/7
branch (which we will be shipping as/to stable/7
in the next few weeks). If we can quickly confirm this behavior in the latest develop/7
branch, we may be able to get a patch in before that stable/7
push.
I recompiled virtuoso on my debian machine with :
./configure --with-layout=debian --with-readline --enable-dbpedia-vad --enable-fct-vad --enable-rdfmappers-vad --with-port=2222
make
sudo make install
Then an access to http://kaiko.getalp.org/sparql led to Virtuoso 42001 Error SR185: Undefined procedure WS.WS.SPARQL_ENDPOINT_GENERATE_FORM
I regenarated the whole db from scratch using the newly compiled virtuoso, same problem. Any idea ?
When switching from stable/7
to develop/7
some of the build environment needs to be regenerated.
Please run the following command:
$ bash autogen.sh
Followed by the following commands to rebuild and install VOS:
$ make
$ sudo make install
Yes, I did that, but the distclean seams to keep some files. So I fetched a clean git clone and the compilation went well. I'll try the queries now.
With Virtuoso version 07.20.3233 on Linux (x86_64-pc-linux-gnu) Single Server Edition (142 GB total memory, 17 GB memory in use) :
Theoretical results (obtained through isql with no timeout) : Q1:
Through SPARQL endpoint without timeout Q1:
time curl -I 'http://kaiko.getalp.org/sparql?default-graph-uri=&query=SELECT+%3Flemma%2C+COUNT%28%3Fwritten_trans%29+AS+%3Fnb_translations%0D%0AWHERE+%7B%0D%0A++++%3Flexentry+ontolex%3AcanonicalForm+%5B%0D%0A++++++++ontolex%3AwrittenRep+%3Flemma+%0D%0A++++%5D+.%0D%0A++++FILTER+%28lang%28%3Flemma%29+%3D+%27en%27%29%0D%0A++++%3Ftrans+dbnary%3AisTranslationOf+%3Flexentry+%3B%0D%0A++++++++dbnary%3AwrittenForm+%3Fwritten_trans+.%0D%0A%7D+LIMIT+1000%0D%0A&should-sponge=&format=text%2Fhtml&timeout=0&debug=on&run=+Run+Query+'
HTTP/1.1 200 OK
Date: Thu, 28 Apr 2022 06:11:47 GMT
Server: Virtuoso/07.20.3233 (Linux) x86_64-pc-linux-gnu
Accept-Ranges: bytes
Content-disposition: filename=sparql_2022-04-28_06-11-47Z.html
Content-Type: text/html; charset=UTF-8
Content-Length: 82941
curl -I 0.00s user 0.02s system 0% cpu 27.210 total
This is analoguous to the previous result and expected.
Q2:
time curl -I 'http://kaiko.getalp.org/sparql?default-graph-uri=&query=select+distinct+%3FA+%3Fp+%3FB+where+%7B%3Fa+a%2Frdfs%3AsubClassOf*+%3FA.+%3Fa+%3Fp+%3Fb.+OPTIONAL+%7B+%3Fb+rdfs%3AsubClassOf+%3FB+%7D%7D&should-sponge=&format=text%2Fhtml&timeout=&debug=on&run=+Run+Query+'
HTTP/1.1 200 OK
Date: Thu, 28 Apr 2022 06:13:14 GMT
Server: Virtuoso/07.20.3233 (Linux) x86_64-pc-linux-gnu
Accept-Ranges: bytes
Content-disposition: filename=sparql_2022-04-28_06-13-14Z.html
Content-Type: text/html; charset=UTF-8
Content-Length: 314378
curl -I 0.00s user 0.01s system 0% cpu 1:00.34 total
Here the default server timeout is hit, but now the request does not fail anymore but returns partial results (~850 rows) without any adequate response headers. This is unexpected in this version while it was failing (expectedly) in previous version.
Through SPARQL endpoint with timeout Q1:
time curl -I 'http://kaiko.getalp.org/sparql?default-graph-uri=&query=SELECT+%3Flemma%2C+COUNT%28%3Fwritten_trans%29+AS+%3Fnb_translations%0D%0AWHERE+%7B%0D%0A++++%3Flexentry+ontolex%3AcanonicalForm+%5B%0D%0A++++++++ontolex%3AwrittenRep+%3Flemma+%0D%0A++++%5D+.%0D%0A++++FILTER+%28lang%28%3Flemma%29+%3D+%27en%27%29%0D%0A++++%3Ftrans+dbnary%3AisTranslationOf+%3Flexentry+%3B%0D%0A++++++++dbnary%3AwrittenForm+%3Fwritten_trans+.%0D%0A%7D+LIMIT+1000%0D%0A&should-sponge=&format=text%2Fhtml&timeout=1001&debug=on&run=+Run+Query+'
HTTP/1.1 200 OK
Date: Thu, 28 Apr 2022 06:22:59 GMT
Server: Virtuoso/07.20.3233 (Linux) x86_64-pc-linux-gnu
Accept-Ranges: bytes
X-SQL-State: S1TAT
X-SQL-Message: RC...: Returning incomplete results, query interrupted by result timeout. Activity: 4.156M rnd 1.426M seq 1.616M same seg 1.183M same pg 110.8K same par 0 disk 0 spec disk 0B / 0
X-Exec-Milliseconds: 2981
X-Exec-DB-Activity: 4.156M rnd 1.426M seq 1.616M same seg 1.183M same pg 110.8K same par 0 disk 0 spec disk 0B / 0 messages 0 fork
Content-disposition: filename=sparql_2022-04-28_06-22-59Z.html
Content-Type: text/html; charset=UTF-8
Content-Length: 82566
curl -I 0.00s user 0.01s system 0% cpu 3.080 total
This is analoguous to the previous result and expected. Partial results returned and response headers given back stating so.
Q2:
time curl -I 'http://kaiko.getalp.org/sparql?default-graph-uri=&query=select+distinct+%3FA+%3Fp+%3FB+where+%7B%3Fa+a%2Frdfs%3AsubClassOf*+%3FA.+%3Fa+%3Fp+%3Fb.+OPTIONAL+%7B+%3Fb+rdfs%3AsubClassOf+%3FB+%7D%7D&should-sponge=&format=text%2Fhtml&timeout=1001&debug=on&run=+Run+Query+'
HTTP/1.1 200 OK
Date: Thu, 28 Apr 2022 06:23:25 GMT
Server: Virtuoso/07.20.3233 (Linux) x86_64-pc-linux-gnu
Accept-Ranges: bytes
Content-disposition: filename=sparql_2022-04-28_06-23-25Z.html
Content-Type: text/html; charset=UTF-8
Content-Length: 72651
curl -I 0.00s user 0.01s system 1% cpu 1.709 total
This is unexpected as the results are indeed partial after hitting the timeout (~217 rows) but no headers is given to state so.
We will attempt to recreate this issue in house and see what is going on.
If you need to recreate the exact dbnary
server settings, all dataset is available at http://kaiko.getalp.org/static/ontolex/latest/ and the database is created using script available at https://gitlab.com/gilles.serasset/dbnary/-/blob/master/kaiko/virtuoso/rotate-virtuoso-db.sh
(Sorry, I do not have a docker setting available (yet).)
We are trying to query a dataset while taking care of anytime feature (we indeed want to get partial results rather than errors in case we hit some limits).
Queries are executed on the same virtuoso instance at http://kaiko.getalp.org/sparql
Q1:
Q2:
Without timeout value specified
curl -I 0.00s user 0.01s system 0% cpu 22.525 total
time curl -I 'http://kaiko.getalp.org/sparql?default-graph-uri=&query=select+distinct+%3FA+%3Fp+%3FB+where+%7B%3Fa+a%2Frdfs%3AsubClassOf*+%3FA.+%3Fa+%3Fp+%3Fb.+OPTIONAL+%7B+%3Fb+rdfs%3AsubClassOf+%3FB+%7D%7D&should-sponge=&format=text%2Fhtml&timeout=&debug=on&run=+Run+Query+' HTTP/1.1 500 SPARQL Request Failed Date: Wed, 27 Apr 2022 11:44:09 GMT Server: Virtuoso/07.20.3230 (Linux) x86_64-pc-linux-gnu Accept-Ranges: bytes Content-Type: text/plain; charset=UTF-8 Content-Length: 168 Connection: close
curl -I 0.00s user 0.01s system 0% cpu 1:01.65 total
time curl -I 'http://kaiko.getalp.org/sparql?default-graph-uri=&query=SELECT+%3Flemma%2C+COUNT%28%3Fwritten_trans%29+AS+%3Fnb_translations%0D%0AWHERE+%7B%0D%0A++++%3Flexentry+ontolex%3AcanonicalForm+%5B%0D%0A++++++++ontolex%3AwrittenRep+%3Flemma+%0D%0A++++%5D+.%0D%0A++++FILTER+%28lang%28%3Flemma%29+%3D+%27en%27%29%0D%0A++++%3Ftrans+dbnary%3AisTranslationOf+%3Flexentry+%3B%0D%0A++++++++dbnary%3AwrittenForm+%3Fwritten_trans+.%0D%0A%7D+LIMIT+1000%0D%0A&should-sponge=&format=text%2Fhtml&timeout=1001&debug=on&run=+Run+Query+' HTTP/1.1 200 OK Date: Wed, 27 Apr 2022 10:23:53 GMT Server: Virtuoso/07.20.3230 (Linux) x86_64-pc-linux-gnu Accept-Ranges: bytes X-SQL-State: S1TAT X-SQL-Message: RC...: Returning incomplete results, query interrupted by result timeout. Activity: 4.649M rnd 2.042M seq 2.227M same seg 1.086M same pg 119.5K same par 0 disk 0 spec disk 0B / 0 X-Exec-Milliseconds: 3142 X-Exec-DB-Activity: 4.649M rnd 2.042M seq 2.227M same seg 1.086M same pg 119.5K same par 0 disk 0 spec disk 0B / 0 messages 0 fork Content-Type: text/html; charset=UTF-8 Content-Length: 81684
time curl -I 'http://kaiko.getalp.org/sparql?default-graph-uri=&query=select+distinct+%3FA+%3Fp+%3FB+where+%7B%3Fa+a%2Frdfs%3AsubClassOf*+%3FA.+%3Fa+%3Fp+%3Fb.+OPTIONAL+%7B+%3Fb+rdfs%3AsubClassOf+%3FB+%7D%7D&should-sponge=&format=text%2Fhtml&timeout=1001&debug=on&run=+Run+Query+' HTTP/1.1 200 OK Date: Wed, 27 Apr 2022 10:37:19 GMT Server: Virtuoso/07.20.3230 (Linux) x86_64-pc-linux-gnu Accept-Ranges: bytes Content-Type: text/html; charset=UTF-8 Content-Length: 192742
curl -I 0.00s user 0.01s system 0% cpu 3.217 total