Open pasky opened 8 years ago
@pasky: Are you able to provide a test case that demonstrates this issue?
Well, this happens on my local instance of BaseKB Gold Ultimate. So, not something nicely self-contained. Or are you more interested in the SELECTs?
@pasky: So you downloaded the complete dump of :BaseKB Gold Ultimate from http://basekb.com/gold/ and loaded it into your own Virtuoso instance, rather than instantiating the available AWS AMI instance?
Unfortunately, I never managed to load the dump to a Virtuoso instance as it would always eat up all my memory regardless of memory limits configured. So I instantiated the AWS AMI instance and downloaded its virtuoso /var directory and started that locally.
I guess I could try starting the AWS instance again and check if the queries run multithreaded there...
@pasky: I will probably try downloading the dataset dumps, as I would expect 1.2 billion triples to load quite readily onto a machine with about 20GB RAM. You can try the AWS instance if you want.
We tried loading this on a machine with 32GB RAM. After 2.5 days, it was about halfway through but rapidly slowing down, so we gave up. If you find virtuoso.ini settings with dramatically better performance, we'd be very interested! :-)
(This was backed by a magnetic drive, not SSD, though.)
@pasky: When you downloaded the :BaseKB datasets from S3, were you able to do so directly from a local machine, or did you have to download to an AMI first? I would like to avoid instantiating an AMI if possible.
Hi! Unfortunately, I wasn't able to do that. What I did was start the purchased instance just for a few minutes to get it initialized, then shut it down, snapshot it, attach the snapshot to a t1.micro instance or some such, and copy the data over from there. A bit convoluted, I guess - this is my first time actually using AWS at all, though! In total, it cost me $16.
I tried to start an AWS instance again to verify if this performance problem exists with the Virtuoso in AWS as well. Unfortunately, it turns out that the instance is absolutely unusable without the database on local SSD, I/O is super-slow :( - a transfer from EBS to the local SSD will take about 6 hours, it seems. I'll be happy to give you access to this AWS instance when that's done, though.
I'm also attaching the virtuoso.ini in case you could see something obviously wrong there: virtuoso.ini.txt
I have verified that even the Virtuoso on AWS exhibits this problem.
I'll be happy to give you access to check it out, if you wish. (Just note that I'm paying a daily fee for the storage so I'd be glad if we could do this in the coming days.)
If you can provide access to the AWS AMI, that would probably be best. How would you go about that?
Please send me an email at pasky@ucw.cz with your ssh public key and the time I should turn it on (ideally before 22:00 UTC, but not necessarily) + maybe some IM contact (e.g. skype or whatever) so we can coordinate turning it off again.
(Also, if it's enough for you to check+investigate the instance I cloned it to outside of AWS, that's much easier for us as it's running all the time, just a matter of shutting down fuseki + starting up virtuoso. Do you have IPv6 connectivity personally?)
Hi! I'd like to ask if you could be kind enough to give an update once again before I tear down this setup for good. (I'm still incurring monthly AWS charges for keeping that storage just in case.) Thanks!
@pasky: I thought you had already shut down the AWS instance once I downloaded the DB file back in November ...
I started the database with the INI file copied from the AMI and replayed the HTTP recordings you created. The query load was spread across multiple CPUs, as the top output shows:
top - 18:52:44 up 11 days, 3:24, 4 users, load average: 3.48, 1.32, 0.53
Tasks: 951 total, 1 running, 949 sleeping, 0 stopped, 0 zombie
%Cpu0 : 96.1 us, 0.7 sy, 0.0 ni, 2.0 id, 1.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 77.3 us, 0.0 sy, 0.0 ni, 22.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 0.0 us, 0.0 sy, 0.0 ni, 99.0 id, 1.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 22.1 us, 0.0 sy, 0.0 ni, 77.6 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 98.0 us, 0.3 sy, 0.0 ni, 0.7 id, 1.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 75.1 us, 0.3 sy, 0.0 ni, 24.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 21.6 us, 0.3 sy, 0.0 ni, 77.8 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 99.7 us, 0.0 sy, 0.0 ni, 0.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 98.7 us, 0.3 sy, 0.0 ni, 0.3 id, 0.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 99.7 us, 0.0 sy, 0.0 ni, 0.0 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 97.7 us, 0.3 sy, 0.0 ni, 2.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 0.7 us, 1.0 sy, 0.0 ni, 98.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu16 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu17 : 21.7 us, 0.3 sy, 0.0 ni, 78.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu18 : 76.7 us, 0.3 sy, 0.0 ni, 22.6 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu19 : 97.7 us, 0.3 sy, 0.0 ni, 2.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu20 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu21 : 0.3 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu22 : 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu23 : 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
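To compare runs like this without eyeballing 24 lines, the per-core lines can be summarized with a few lines of Python. This is only a rough sketch: the function name `busy_cores` and the 50% threshold are arbitrary choices for illustration, and the sample holds just four of the lines above.

```python
import re

# Matches the per-core lines printed by top, e.g.
# "%Cpu10 : 99.7 us, 0.0 sy, ..."
CPU_LINE = re.compile(r"^%Cpu(\d+)\s*:\s*([\d.]+)\s+us")


def busy_cores(top_output, threshold=50.0):
    """Return the core numbers whose user-time share exceeds threshold (percent)."""
    busy = []
    for line in top_output.splitlines():
        match = CPU_LINE.match(line.strip())
        if match and float(match.group(2)) > threshold:
            busy.append(int(match.group(1)))
    return busy


# Four lines copied from the top output above.
sample = """\
%Cpu0 : 96.1 us, 0.7 sy, 0.0 ni, 2.0 id, 1.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 0.0 us, 0.0 sy, 0.0 ni, 99.0 id, 1.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 22.1 us, 0.0 sy, 0.0 ni, 77.6 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 99.7 us, 0.0 sy, 0.0 ni, 0.0 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
"""

print(busy_cores(sample))  # [0, 10]
```

A single-threaded query would show at most one core in that list for the duration of the query.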
This was using the latest 3215 develop/7 git repo build:
SQL> status('');
REPORT
VARCHAR
_______________________________________________________________________________
OpenLink Virtuoso Server
Version 07.20.3215-pthreads for Linux as of Dec 19 2015
Started on: 2016-01-03 18:59 GMT+1
Database Status:
File size 0, 15790336 pages, 7578357 free.
2567104 buffers, 2567102 used, 2 dirty 0 wired down, repl age 877623 0 w. io 0 w/crsr.
Disk Usage: 4408610 reads avg 0 msec, 0% r 0% w last 0 s, 957 writes flush 0 MB,
37780 read ahead, batch = 113. Autocompact 0 in 0 out, 0% saved.
Gate: 50526 2nd in reads, 0 gate write waits, 0 in while read 0 busy scrap.
Log = virtuoso.trx, 2510 bytes
8210185 pages have been changed since last backup (in checkpoint state)
Current backup timestamp: 0x0000-0x00-0x00
Last backup date: unknown
Clients: 1 connects, max 1 concurrent
RPC: 6 calls, 1 pending, 1 max until now, 0 queued, 0 burst reads (0%), 0 second 0M large, 538M max
Checkpoint Remap 1482 pages, 0 mapped back. 0 s atomic time.
DB master 15790336 total 7578357 free 1482 remap 1 mapped back
temp 256 total 251 free
Lock Status: 0 deadlocks of which 0 2r1w, 0 waits,
Currently 1 threads running 0 threads waiting 0 threads in vdb.
Pending:
23 Rows. -- 421 msec.
SQL>
Hi! I have a SELECT with a large number of UNIONs for various possible paths in the graph. Fuseki seems to be able to execute the SELECTs in parallel (but unfortunately is I/O-bound) while Virtuoso never uses more than one thread to evaluate the SELECT. This is despite the fact that I tried to configure vectorization:
MaxQueryMem          = 2G       ; memory allocated to query processor
VectorSize           = 1000     ; initial parallel query vector (array of query operations) size
MaxVectorSize        = 1000000  ; query vector size threshold.
AdjustVectorSize     = 0
ThreadsPerQuery      = 9
AsyncQueueMaxThreads = 10
Is this expected?
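When comparing a local virtuoso.ini against the one shipped on the AMI, it can help to pull out just these settings programmatically. The following is a minimal sketch using Python's standard `configparser`. It assumes the keys live in a `[Parameters]` section, which the excerpt above does not show, and `read_parallel_settings` is a hypothetical helper name.

```python
import configparser

# Keys quoted in the post above.
KEYS = (
    "MaxQueryMem",
    "VectorSize",
    "MaxVectorSize",
    "AdjustVectorSize",
    "ThreadsPerQuery",
    "AsyncQueueMaxThreads",
)

# The fragment from the post; the [Parameters] header is an assumption.
FRAGMENT = """
[Parameters]
MaxQueryMem          = 2G       ; memory allocated to query processor
VectorSize           = 1000     ; initial parallel query vector size
MaxVectorSize        = 1000000  ; query vector size threshold.
AdjustVectorSize     = 0
ThreadsPerQuery      = 9
AsyncQueueMaxThreads = 10
"""


def read_parallel_settings(ini_text, section="Parameters"):
    """Return the query-parallelism settings found in ini_text as strings."""
    parser = configparser.ConfigParser(inline_comment_prefixes=(";",))
    parser.optionxform = str  # keep the mixed-case key names as written
    parser.read_string(ini_text)
    params = parser[section]
    return {key: params[key] for key in KEYS if key in params}


settings = read_parallel_settings(FRAGMENT)
for key, value in settings.items():
    print(f"{key} = {value}")
```

Running the same helper over both INI files and diffing the two dictionaries would show whether the two instances were really configured alike.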