Open ffritsche opened 1 year ago
Has the data ie number of triples in the database increased since the upgrade to the Virtuoso 7.2.9 version in April ?
Has a database integrity check been run on the database to check for any possible database corruption with the command:
backup '/dev/null';
Can the RDF_QUAD table indexes be checked with the following queries:
select count (*) from rdf_quad a table option (index rdf_quad) where not exists (select 1 from rdf_quad b table option (loop, index rdf_quad_pogs) where a.g = b.g and a.p = b.p and a.o = b.o and a.s = b.s);
select count (*) from rdf_quad a table option (index rdf_quad_pogs) where not exists (select 1 from rdf_quad b table option (loop, index primary key) where a.g = b.g and a.p = b.p and a.o = b.o and a.s = b.s);
select count (*) from rdf_quad a table option (index rdf_quad_pogs) where not exists (select 1 from rdf_quad b table option (loop, index rdf_quad_op) where a.g = b.g and a.p = b.p and a.o = b.o and a.s = b.s);
select count (*) from rdf_quad a table option (index rdf_quad_pogs) where not exists (select 1 from rdf_quad b table option (loop, index rdf_quad_sp) where a.g = b.g and a.p = b.p and a.o = b.o and a.s = b.s);
select count (*) from rdf_quad a table option (index rdf_quad_pogs) where not exists (select 1 from rdf_quad b table option (loop, index rdf_quad_gs) where a.g = b.g and a.p = b.p and a.o = b.o and a.s = b.s);
all of which should return a count of "0" if there are no issues.
With the database in this state, please provide the output of running the "status();" command to return statistics on the state of the database.
When you say I am also surprised that before the update to version 7.2.9 last Monday the query took forever (100min) even with DEFINE select-option "order".
, should the version be 7.2.10, rather then 7.2.9 ?
Thanks for your reply, It took some time to execute all this queries.
Has the data, i.e., number of triples in the database, increased since the upgrade to the Virtuoso 7.2.9 version in April?
I created this database with 7.2.9 from the ground new. We update the DB each week. This means adding new publications and updating (remove+add new) other publications.
Has a database integrity check been run on the database to check for any possible database corruption with the command
It returned:
09:38:20 Backup to /dev/null started
09:38:20 Dumping the schema tables
09:38:20 Dumping the registry
09:38:20 Dumping the schema done
12:14:34 Backup to /dev/null complete, processed 32543012 nodes on 6 levels
When you say I am also surprised that before the update to version 7.2.9 last Monday the query took forever (100min) even with DEFINE select-option "order". , should the version be 7.2.10, rather then 7.2.9 ?
Yeah it was to 7.2.10.
With the database in this state, please provide the output of running the "
status();
" command to return statistics on the state of the database.
Version 07.20.3237-pthreads for Linux as of Jun 7 2023 (f3d88f16b)
Started on: 2023-07-19 16:32 GMT+2
Database Status:
File size 446693376, 90232064 pages, 25536878 free.
51000000 buffers, 47048796 used, 16791365 dirty 0 wired down, repl age 0 0 w. io 0 w/crsr.
Disk Usage: 45575645 reads avg 0 msec, 0% r 0% w last 48 s, 0 writes ,
868003 read ahead, batch = 47. Autocompact 0 in 0 out, 0% saved.
Gate: 2793606 2nd in reads, 0 gate write waits, 0 in while read 0 busy scrap.
Log = /mnt/ssd/wosdb/virtuoso.trx, 292187 bytes
45556590 pages have been changed since last backup (in checkpoint state)
Current backup timestamp: 0x0000-0x00-0x00
Last backup date: unknown
Clients: 151 connects, max 49 concurrent
RPC: 19364719 calls, 6536 pending, 6560 max until now, 0 queued, 42 burst reads (0%), 0 second 3M large, 2835M max
Checkpoint Remap 11800577 pages, 0 mapped back. 0 s atomic time.
DB master 90232064 total 25536878 free 11800577 remap 9470856 mapped back
temp 5120 total 5115 free
Lock Status: 0 deadlocks of which 0 2r1w, 45 waits,
Currently 1 threads running 0 threads waiting 0 threads in vdb.
Pending:
Client 1111:141: Account: dba, 738 bytes in, 684 bytes out, 1 stmts.
PID: 23834, OS: unix, Application: unknown, IP#: 127.0.0.1
Transaction status: PENDING, 0 threads.
Locks:
Client 1111:134: Account: dba, 1489 bytes in, 5543 bytes out, 1 stmts.
PID: 22401, OS: unix, Application: unknown, IP#: 127.0.0.1
Transaction status: PENDING, 0 threads.
Locks:
Client 1111:151: Account: dba, 207 bytes in, 288 bytes out, 1 stmts.
PID: 32307, OS: unix, Application: unknown, IP#: 127.0.0.1
Transaction status: PENDING, 1 threads.
Locks:
Client 1111:144: Account: dba, 161017 bytes in, 39076 bytes out, 0 stmts.
PID: 0, OS: Linux, Application: JDBC, IP#: 127.0.0.1
Transaction status: PENDING, 0 threads.
Locks:
Client 1111:136: Account: dba, 1055 bytes in, 1079 bytes out, 1 stmts.
PID: 22588, OS: unix, Application: unknown, IP#: 127.0.0.1
Transaction status: PENDING, 0 threads.
Locks:
Client 1111:2: Account: dba, 5305311 bytes in, 509761 bytes out, 0 stmts.
PID: 0, OS: Linux, Application: JDBC, IP#: 127.0.0.1
Transaction status: PENDING, 0 threads.
Locks:
Client 1111:135: Account: dba, 1055 bytes in, 915 bytes out, 1 stmts.
PID: 22479, OS: unix, Application: unknown, IP#: 127.0.0.1
Transaction status: PENDING, 0 threads.
Locks:
Client 1111:3: Account: dba, 5614930 bytes in, 519679 bytes out, 0 stmts.
PID: 0, OS: Linux, Application: JDBC, IP#: 127.0.0.1
Transaction status: PENDING, 0 threads.
Locks:
Client 1111:140: Account: dba, 738 bytes in, 684 bytes out, 1 stmts.
PID: 23658, OS: unix, Application: unknown, IP#: 127.0.0.1
Transaction status: PENDING, 0 threads.
Locks:
Client 1111:1: Account: dba, 494226 bytes in, 2705516 bytes out, 335 stmts.
PID: 0, OS: Linux, Application: JDBC, IP#: 127.0.0.1
Transaction status: PENDING, 0 threads.
Locks:
Running Statements:
Time (msec) Text
5536 status()
Hash indexes
82 Rows. -- 5538 msec.
Can the RDF_QUAD table indexes be checked with the following queries
The first query returned 0. All the others returned 1. I modified them and removed the count from the select and got this:
VARCHAR NOT NULL VARCHAR NOT NULL VARCHAR NOT NULL LONG VARCHAR NOT NULL
________________ ________________ ________________ _____________________
#i471 #i479 #i474 2023-04-13 22:59:43.896618
1 Rows. -- 83242269 msec.
out of query 2-4.
Hi
The broken RDF_QUAD
table indexes can be repaired with the following commands:
create table rq_recov (g iri_id_8, s iri_id_8, p iri_id_8, o any, primary key (g,s,p,o));
log_enable (3,1);
insert soft rq_recov (g,s,p,o) select g,s,p,o from RDF_QUAD a table option (index RDF_QUAD)
where not exists (select 1 from RDF_QUAD b table option (loop, index RDF_QUAD_POGS)
where a.G = b.G and a.P = b.P and a.O = b.O and a.S = b.S);
insert soft rq_recov (g,s,p,o) select g,s,p,o from RDF_QUAD a table option (index RDF_QUAD_POGS)
where not exists (select 1 from RDF_QUAD b table option (loop, index primary key)
where a.G = b.G and a.P = b.P and a.O = b.O and a.S = b.S);
insert soft rdf_quad index rdf_quad (g,s,p,o) select g,s,p,o from rq_recov;
insert soft rdf_quad index rdf_quad_pogs (g,s,p,o) select g,s,p,o from rq_recov;
insert soft rdf_quad index rdf_quad_sp (g,s,p,o) select g,s,p,o from rq_recov;
insert soft rdf_quad index rdf_quad_op (g,s,p,o) select g,s,p,o from rq_recov;
insert soft rdf_quad index rdf_quad_gs (g,s,p,o) select g,s,p,o from rq_recov;
log_enable (1,1);
drop table rq_recov;
checkpoint;
Then run the previous commands to check the indexes to ensure they now all return a count of "0".
Do ensure a FULL backup of your database is in place before performing these operations, which given the size of your database will take some time to complete.
How much space do I need on for this? Does this only save this 1 broken QUAD in a new table and replace the old entries with it?
Only the data of the missing one index will be copied to the "rq_recov" table , so if there is only one bad index very little additional space will be required.
I repaired all 5 tables and the result of this queries is now 0.
Sadly the problem with the optimization of this queries still exists. the ?pub fgi:hasTopic ?topic
is always on top.
Do you have another idea?
OK, at least the RDF indexes are now clean.
Can you provide a query(s) profile
plan (rather than an explain
plan) with the SET PROFILE ON;
command run from isql
? This will enable the query compilation and execution times to be seen, so we have an idea as to where the time is being spent.
You indicated the database is being updated daily. What is the typical rate, i.e., how many triples per day, week, or month? I note from the status()
output, of the 51000000
buffers allocated, 47048796
are used, so you are approaching the point where all are being used (depending on the update rate), which would affect performance.
If you look at the system memory in use with the top
shell command, when under typical query workload, how much is used and how much is free?
Note this documentation on Virtuoso Query Optimization Diagnostics, which provides some options on how the number of plans used to find a good one can be controlled to a degree, possibly limiting memory consumption.
Note that when the DEFINE select-option "order"
pragma is set, queries run single-threaded, which is why you are seeing only one core in use when running queries with this pragma set.
Can you provide a query(s)
profile
plan (rather than an explain plan) with theSET PROFILE ON;
command run fromisql
? This will enable the query compilation and execution times to be seen, so we have an idea as to where the time is being spent.
Here is the query
SELECT ?iri ?label (count(distinct ?pub) as ?count) WHERE {
<http://int.database.de/linkeddata/resources/sets/8c5dd2e1-7fa6-42a0-be03-024dfc57d2d1> dct:hasPart ?pub.
?pub fgi:hasTopic ?iri.
?iri a fgi:Subject.
?iri rdfs:label ?label.
}
With DEFINE select-option "order"
:
{
time 0.0024% fanout 1 input 1 rows
Subquery 28
{
time 0.0024% fanout 1 input 1 rows
fork {
time 0.98% fanout 10907 input 1 rows
RDF_QUAD 1.1e+04 rows(s_1_8_t0.O$34)
inlined P = IRI_ID"...hasPart" , S = IRI_ID"...8c5dd2e1-7fa6-42a0-be03-024dfc57d2d1"
time 62% fanout 3.47584 input 10907 rows
RDF_QUAD 2 rows(s_1_8_t1.O$37)
inlined P = IRI_ID"...hasTopic" , S = cast$127
time 3.9% fanout 0.535016 input 37911 rows
RDF_QUAD_POGS 0.8 rows(s_1_8_t2.S$40)
inlined P = IRI_ID"...type" , O = IRI_ID"...Subject" , S = cast$140
time 2.9% fanout 1 input 20283 rows
RDF_QUAD 1 rows(s_1_8_t3.O$43)
inlined P = IRI_ID"...label" , S = k_s_1_8_t1.O$154
time 29% fanout 0 input 20283 rows
Sort (s_1_8_t3.O$43, s_1_8_t1.O$37) -> (inc$53)
}
time 0.97% fanout 173 input 1 rows
group by read node
(s_1_8_t3.O$43, s_1_8_t1.O$37, aggregate$47)
After code:
0: __ro2sq$73 := Call __ro2sq (s_1_8_t3.O$43)
5: __ro2sq$75 := Call __ro2sq (s_1_8_t1.O$37)
10: iri$29 := := artm __ro2sq$75
14: label$30 := := artm __ro2sq$73
18: count$31 := := artm aggregate$47
22: BReturn 0
time 0.0012% fanout 0 input 173 rows
Subquery Select(iri$29, label$30, count$31)
}
After code:
0: iri$84 := Call __ro2sq (iri$29)
5: label$86 := Call __ro2sq (label$30)
10: count$88 := Call __ro2sq (count$31)
15: BReturn 0
time 0.00069% fanout 0 input 173 rows
Select (iri$84, label$86, count$88)
}
108 msec 92% cpu, 69263 rnd 37899 seq 90.6546% same seg 8.52968% same pg
33 disk reads, 171 read ahead, 0.152771% wait
Compilation: 1 msec 0 reads 0% read 0 messages 0% clw
5 Rows. -- 119 msec.
and without:
{
time 4.2e-06% fanout 1 input 1 rows
Subquery 28
{
time 4e-06% fanout 1 input 1 rows
fork {
time 4.7% fanout 2.4332e+08 input 1 rows
RDF_QUAD 2 rows(s_1_8_t1.S$35, s_1_8_t1.O$34)
inlined P = IRI_ID"...hasTopic"
time 95% fanout 0.000155807 input 2.4332e+08 rows
RDF_QUAD_POGS 0.0005 rows(s_1_8_t0.O$38)
inlined P = IRI_ID"...hasPart" , O = cast$128 , S = IRI_ID"...8c5dd2e1-7fa6-42a0-be03-024dfc57d2d1"
time 0.13% fanout 0.535016 input 37911 rows
RDF_QUAD_POGS 0.8 rows(s_1_8_t2.S$41)
inlined P = IRI_ID"...type" , O = IRI_ID"...Subject" , S = k_s_1_8_t1.O$142
time 0.17% fanout 1 input 20283 rows
RDF_QUAD 1 rows(s_1_8_t3.O$44)
inlined P = IRI_ID"...label" , S = k_s_1_8_t1.O$156
time 0.088% fanout 0 input 20283 rows
Sort (s_1_8_t3.O$44, s_1_8_t1.O$34) -> (inc$54)
}
time 0.0017% fanout 173 input 1 rows
group by read node
(s_1_8_t3.O$44, s_1_8_t1.O$34, aggregate$48)
After code:
0: __ro2sq$74 := Call __ro2sq (s_1_8_t3.O$44)
5: __ro2sq$76 := Call __ro2sq (s_1_8_t1.O$34)
10: iri$29 := := artm __ro2sq$76
14: label$30 := := artm __ro2sq$74
18: count$31 := := artm aggregate$48
22: BReturn 0
time 2e-06% fanout 0 input 173 rows
Subquery Select(iri$29, label$30, count$31)
}
After code:
0: iri$85 := Call __ro2sq (iri$29)
5: label$87 := Call __ro2sq (label$30)
10: count$89 := Call __ro2sq (count$31)
15: BReturn 0
time 1.2e-06% fanout 0 input 173 rows
Select (iri$85, label$87, count$89)
}
65898 msec 99% cpu, 2.43378e+08 rnd 2.43132e+08 seq 99.9156% same seg 0.00771514% same pg
Compilation: 2 msec 0 reads 0% read 0 messages 0% clw
4 Rows. -- 66412 msec.
The query without DEFINE select-option "order"
takes ages.
You indicated the database is being updated daily. What is the typical rate, i.e., how many triples per day, week, or month? I note from the status() output, of the 51000000 buffers allocated, 47048796 are used, so you are approaching the point where all are being used (depending on the update rate), which would affect performance.
We update the DB once a week. By rule of thumb, there are 10M to 15M new triples per week. Most of the work goes into updating old triples. We update 100M to 150M triple per week (delete and add new).
I can check the exact number next week.
If you look at the system memory in use with the top shell command, when under typical query workload, how much is used and how much is free?
after 1 week: 450GB/1000GB System usage https://imageupload.io/yQOrD3OsmgeDtSD after 1 month+ (older Database version): 750GB/1000GB System usage https://imageupload.io/FFEEaS3iUKjH2Ul
The memory usage rises after some time. There was a segment fault error if I set NumberOfBuffers
and MaxDirtyBuffers
too high. I don't know if it's still in the stable version as I try to be below this threshold (https://github.com/openlink/virtuoso-opensource/issues/742).
Note this documentation on Virtuoso Query Optimization Diagnostics, which provides some options on how the number of plans used to find a good one can be controlled to a degree, possibly limiting memory consumption.
MaxMemPoolSize
is set to 200000000
in the virtuoso.ini
(200M?). I can try MaxOptimizeLayouts
and enable_joins_only
.
Note that when the
DEFINE select-option "order"
pragma is set, queries run single-threaded, which is why you are seeing only one core in use when running queries with this pragma set.
That is good to know.
Based on the profile output provided, the query is taking about 66 secs to run, and only a few msecs are spent compiling the query, both with and without the DEFINE select-option "order"
pragma set on the Virtuoso 7.2.10 build. So the query compilation and memory consumption are not the issue, but more the plan being used for executing the query, it seems.
What methods are being used to perform the weekly updates; i.e., what programming language(s) are being used, and what interface(s) (SQL/HTTP) are the updates being performed against?
Yes the plan being used causes the 0,1s to 66s difference.
We create the nt-files (gunziped) with a separate transformer and use the isql
bulk import for the weekly updates. rdf_loader_run();
runs then 8 times without parameter. The nt-files are split into ~30mb.
Our tools run in Java with jdbc (virt_rdf4j.jar
/virtjdbc4.jar
from 7.2.9) and rdf4j.
Creation of our repo:
VirtuosoRepository repo = new VirtuosoRepository((ConnectionPoolDataSource)connPool, "sesame:nil", true);
The removal of pubs and all dependencies goes over this connection in Java. Userdata will also be created over this connection. But this is only a small part of all our data (1/1000).
Have you reviewed this Customizing Virtuoso SPARQL Query Optimization using Pragmas and Inline Query Options post and tried any of the query optimisation pragmas listed, to influence the query plan used, and possibly improve query performance?
His helps to push the optimizer in the right direction.
The index S
option helps to speed the queries up. The index O/P
and hash
options are all slow.
Here the profile for global:
DEFINE sql:table-option "index S"
SELECT ?iri ?label (count(distinct ?pub) as ?count) WHERE {
<http://int.database.de/linkeddata/resources/sets/8c5dd2e1-7fa6-42a0-be03-024dfc57d2d1> dct:hasPart ?pub .
?pub fgi:hasTopic ?iri.
?iri a fgi:Subject .
?iri rdfs:label ?label .
}
{
time 0.00093% fanout 1 input 1 rows
Subquery 28
{
time 0.00087% fanout 1 input 1 rows
fork {
time 0.014% fanout 1 input 1 rows
RDF_QUAD_SP unq 2 rows ()
inlined S = IRI_ID"...8c5dd2e1-7fa6-42a0-be03-024dfc57d2d1" , P = IRI_ID"...hasPart"
time 0.1% fanout 10907 input 1 rows
RDF_QUAD 1.2 rows(s_1_8_t0.O$37)
inlined P = IRI_ID"...hasPart" , S = IRI_ID"...8c5dd2e1-7fa6-42a0-be03-024dfc57d2d1"
time 27% fanout 0.999542 input 10907 rows
RDF_QUAD_SP unq 2 rows ()
inlined S = cast$152 , P = IRI_ID"...hasTopic"
time 26% fanout 3.47744 input 10902 rows
RDF_QUAD 1.2 rows(s_1_8_t1.O$44)
inlined P = IRI_ID"...hasTopic" , S = k_s_1_8_t0.O$165
time 19% fanout 1 input 37911 rows
Precode:
0: __ro2sq$47 := Call __ro2sq (s_1_8_t1.O$44)
5: BReturn 0
RDF_QUAD_SP unq 2 rows (s_1_8_t2.S$50)
inlined S = cast$178 , P = IRI_ID"...type"
time 0.91% fanout 0.535016 input 37911 rows
RDF_QUAD_POGS 0.92 rows()
inlined P = IRI_ID"...type" , O = IRI_ID"...Subject" , S = k_s_1_8_t1.O$192
time 2.9% fanout 1 input 20283 rows
RDF_QUAD_SP unq 2 rows ()
inlined S = k_s_1_8_t1.O$205 , P = IRI_ID"...label" S = k_s_1_8_t2.S$209
time 5% fanout 1 input 20283 rows
RDF_QUAD 1.2 rows(s_1_8_t3.O$60)
inlined P = IRI_ID"...label" , S = k_s_1_8_t1.O$220 S = k_s_1_8_t2.S$224
time 19% fanout 0 input 20283 rows
Sort (s_1_8_t3.O$60, s_1_8_t1.O$44) -> (inc$71, __ro2sq$47)
}
time 0.27% fanout 173 input 1 rows
group by read node
(s_1_8_t3.O$60, s_1_8_t1.O$44, aggregate$65, __ro2sq$47)
After code:
0: __ro2sq$91 := Call __ro2sq (s_1_8_t3.O$60)
5: iri$29 := := artm __ro2sq$47
9: label$30 := := artm __ro2sq$91
13: count$31 := := artm aggregate$65
17: BReturn 0
time 0.00042% fanout 0 input 173 rows
Subquery Select(iri$29, label$30, count$31)
}
After code:
0: iri$100 := Call __ro2sq (iri$29)
5: label$102 := Call __ro2sq (label$30)
10: count$104 := Call __ro2sq (count$31)
15: BReturn 0
time 0.0004% fanout 0 input 173 rows
Select (iri$100, label$102, count$104)
}
260 msec 91% cpu, 138336 rnd 37889 seq 86.4331% same seg 10.4397% same pg
Compilation: 4 msec 0 reads 0% read 0 messages 0% clw
4 Rows. -- 288 msec.
And here in scope of hasTopic
:
SELECT ?iri ?label (count(distinct ?pub) as ?count) WHERE {
<http://int.database.de/linkeddata/resources/sets/8c5dd2e1-7fa6-42a0-be03-024dfc57d2d1> dct:hasPart ?pub .
?pub fgi:hasTopic ?iri OPTION (table_option "index S").
?iri a fgi:Subject .
?iri rdfs:label ?label .
}
{
time 0.0013% fanout 1 input 1 rows
Subquery 28
{
time 0.0015% fanout 1 input 1 rows
fork {
time 0.14% fanout 10907 input 1 rows
RDF_QUAD 1.1e+04 rows(s_1_8_t0.O$34)
inlined P = IRI_ID"...hasPart" , S = IRI_ID"...8c5dd2e1-7fa6-42a0-be03-024dfc57d2d1"
time 37% fanout 0.999542 input 10907 rows
RDF_QUAD_SP unq 2 rows ()
inlined S = cast$131 , P = IRI_ID"...hasTopic"
time 36% fanout 3.47744 input 10902 rows
RDF_QUAD 1.2 rows(s_1_8_t1.O$40)
inlined P = IRI_ID"...hasTopic" , S = k_s_1_8_t0.O$144
time 2% fanout 0.535016 input 37911 rows
RDF_QUAD_POGS 0.8 rows(s_1_8_t2.S$44)
inlined P = IRI_ID"...type" , O = IRI_ID"...Subject" , S = cast$157
time 1.7% fanout 1 input 20283 rows
RDF_QUAD 1 rows(s_1_8_t3.O$47)
inlined P = IRI_ID"...label" , S = k_s_1_8_t1.O$171
time 23% fanout 0 input 20283 rows
Sort (s_1_8_t3.O$47, s_1_8_t1.O$40) -> (inc$57)
}
time 0.58% fanout 173 input 1 rows
group by read node
(s_1_8_t3.O$47, s_1_8_t1.O$40, aggregate$51)
After code:
0: __ro2sq$77 := Call __ro2sq (s_1_8_t3.O$47)
5: __ro2sq$79 := Call __ro2sq (s_1_8_t1.O$40)
10: iri$29 := := artm __ro2sq$79
14: label$30 := := artm __ro2sq$77
18: count$31 := := artm aggregate$51
22: BReturn 0
time 0.0006% fanout 0 input 173 rows
Subquery Select(iri$29, label$30, count$31)
}
After code:
0: iri$88 := Call __ro2sq (iri$29)
5: label$90 := Call __ro2sq (label$30)
10: count$92 := Call __ro2sq (count$31)
15: BReturn 0
time 0.0005% fanout 0 input 173 rows
Select (iri$88, label$90, count$92)
}
190 msec 91% cpu, 80160 rnd 37890 seq 77.2483% same seg 17.8104% same pg
Compilation: 3 msec 0 reads 0% read 0 messages 0% clw
4 Rows. -- 214 msec.
Sadly its still slower than the normal version(DEFINE select-option "order"
)
We note you have IndexTreeMaps
defined twice in your INI file, to different values:
...
IndexTreeMaps = 1024
MaxMemPoolSize = 200000000
PrefixResultNames = 0
MacSpotlight = 0
IndexTreeMaps = 64
...
We recommend it be set to its default of 256
.
Also, when you said initially that In April, I completely rebuilt it and went from a 3-year-old version to 7.2.9
, did this mean you reloaded all your RDF data into an empty database, in which case the database was newly created with the 7.2.9
binary? If the database was created with the indicated binary from 3 years ago, its RDF_IRI
prefix table would need upgrading from 32-bit to 64-bit, as detailed in Upgrading from VOS 7.2.X to VOS 7.2.7, and you should get a message indicating such in the log on startup.
Yes it was completely new created with the 7.2.9
binary.
I changed the IndexTreeMaps
setting back to its default.
In my last checkpoint from today I got some warnings:
cursor not continuable as it is unregistered
Thanks for confirming the database was rebuilt from scratch, when upgrading to the 7.2.9
binary.
Are you saying the cursor not continuable as it is unregistered
only started occurring since setting IndexTreeMaps
to the default 256 value ? Can you provide a virtuoso.log
snippet from the point of restarting Virtuoso to the occurrence of the cursor not continuable as it is unregistered
message if not long or copy of the virtuoso.log
file for review.
No it started before. Here is our log since the last start virtuoso.log
The cursor not continuable as it is unregistered
warning messages are known to occur when loading large datasets with the Virtuoso RDF bulk loader, and can be ignored.
BTW, looking at your log file snippet the Virtuoso instance was last restarted on Jul 19 2023
, thus when you updated the IndexTreeMaps
setting a few days ago, how was this done, as if the virtuoso.ini
was updated, then the Virtuoso server would have to be restarted for the new setting to take effect?
This was only a snippet. You can the 10:22:00 Server shutdown complete
at the end of the log.
There was nothing special after the restart yesterday.
We have been using Virtuoso for a publication database for several years. In April, I completely rebuilt it and went from a 3-year-old version to 7.2.9. I quickly noticed that some query optimizations were causing problems. I then simply used
DEFINE select-option "order"
.Last Monday, these queries suddenly ran extremely slowly again. Nothing helped.
Server:
DB:
Config:
First, I created an explain in ISQL for this query
<http://int.database.de/linkeddata/resources/sets/8c5dd2e1-7fa6-42a0-be03-024dfc57d2d1>
has ~11000 pubs.Without
DEFINE select-option "order"
:With
DEFINE select-option "order"
:You can see here that
?pub fgi:hasTopic ?iri
is called first without the order option. This makes no sense as?pub fgi:hasTopic ?iri
has ~250 million triple and<http://int.database.de/linkeddata/resources/sets/8c5dd2e1-7fa6-42a0-be03-024dfc57d2d1> dct:hasPart ?pub.
only ~11000.Now the same with
Explain
andDEFINE select-option "order"
:Here you can see that the
hasTopic
triple took 100% of the 100 minutes. The same query only took a maximum of 4 seconds last week.Now I have updated Virtuoso to version 7.2.10. This helped to bring the queries with the order option to the old time.
Without
DEFINE select-option "order"
, thehasTopic
triple is taken first and the query takes 60 seconds.Now I have a second query that contains a
union
. Here theorder
option is simply ignored.without Order:
with
DEFINE select-option "order"
:In both, the
hasTopic
triple is listed first and the query takes just under 3 minutes. WithoutDEFINE select-option "order"
, the query also uses 100% of all cores of the server. Image WithDEFINE select-option "order"
, it is only one core. It gets better when I make a profile withoutDEFINE select-option "order"
. Virtuoso then says goodbye with a segmentation fault. InOptional
theDEFINE select-option "order"
will also be ignored.Why is the Virtuoso Optimizer so wrong with the
hasTopic
triple and can I fix this without theDEFINE select-option "order"
? I am also surprised that before the update to version 7.2.9 last Monday the query took forever (100min) even withDEFINE select-option "order"
. Then there is the fact that the query with theUnion
withoutDEFINE select-option "order"
uses 100% of the CPU and kills Virtuoso when the query runs viaprofile
. I think this is due to the fact that it does not run in the timeout configured inVirtuoso.ini
.I hope someone can help me with this problem.
Thanks