openlink / virtuoso-opensource

Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
https://vos.openlinksw.com
Other
852 stars 211 forks source link

Multiple Failed Queries with S1TAT Error and HY008 Error SR203 #983

Open NathanHerr opened 2 years ago

NathanHerr commented 2 years ago

Hi,

I am running multiple complex queries to a private server hosting VOS with FreeBase loaded.

I have set the timeout parameter (sparql.addExtraURITag("timeout", "20000")) but I am still receiving the following timeout error:

Virtuoso S1TAT Error RC...: Returning incomplete results, query interrupted by result timeout.

The following error, I do not know the origin of and would appreciate any help figuring out these issues.

Virtuoso HY008 Error SR203: Async statement killed by SQLCancel.

NOTE: When I test the queries which fail with this error, they fail on a timeout (when the anytime execution timeout is not set) but works perfectly when the timeout is set. This makes me this that I am not setting the timeout correctly in my python code.

Lastly, I receive this error. Again, I do not know the origin of and would appreciate any help figuring out these issues.

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">\n<html>\n <head>\n <title>Error HTTP/1.1 404 File not found</title>\n </head>\n <body>\n <h3>Error HTTP/1.1 404 File not found</h3><pre>\nThe requested URL was not found URI = \'/sparql\'\n </pre></body></html>\n

HughWilliams commented 2 years ago

Please try setting sparql.addExtraURITag("timeout", "0") to disable timeouts, to see if this resolves the Virtuoso S1TAT Error RC...: Returning incomplete results, query interrupted by result timeout errors.

This should also resolve the Virtuoso HY008 Error SR203: Async statement killed by SQLCancel errors, as they are probably occurring as a result of timeouts and Virtuoso attempting to roll back transactions.

What is the context in which the error —

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">\n<html>\n <head>\n <title>Error HTTP/1.1 404 File not found</title>\n </head>\n <body>\n <h3>Error HTTP/1.1 404 File not found</h3><pre>\nThe requested URL was not found URI = \'/sparql\'\n </pre></body></html>\n

— is occurring as it doesn't seem like a Virtuoso specific error, other than the reference to /sparql, which cannot be found for some reason, but I presume your queries are normally being run against the Virtuoso SPARQL endpoint ~/sparql?

NathanHerr commented 2 years ago

Hi @HughWilliams and @TallTed,

Thanks for your response.

I will try setting the timeout to 0, although I don't want there to no timeout. Some of the queries which I am running are quite complex and will take too long if left unchecked, which is why I opted to set the anytime timeout. Will setting the timeout to 0 indicate what the possible causes of the current errors may be?

For the last error, the context is unclear, as the query failed was not reported for some reason. I do run all my queries again a SPARQL endpoint. If this is an error which you are not familiar with, I will continue to investigate on my side. All

HughWilliams commented 2 years ago

Yes, enabling anytime should be a viable approach to take to enable the return of at least partial results, but we are seeking as a test to disable anytime to let the queries run to completion to determine if they do then return results ?

Regarding the other HTTP/1.1 404 File not found error I see there is another ongoing report of intermittent occurrences of this error in gitissue#299, is this similar to what you are seeing ?

NathanHerr commented 2 years ago

@HughWilliams I will let you know what the results are of setting the timeout to 0 and will get back to you about the results.

As for gitissue#299 - this seems likely to be the same issue! I have read through the entire thread, is there solution to this yet?

NathanHerr commented 2 years ago

@HughWilliams I have some more information. After seeing the following query:

PREFIX ns:<http://rdf.freebase.com/ns/> SELECT * FROM <http://freebase.com> WHERE {ns:m.094w0s ?r1 ?e1.?e1 ?r2 ?e2.?e2 ?r3 ?e3.?e3 ?r4 ns:m.09v1_r.}.

result in a

Virtuoso S1TAT Error RC...: Returning incomplete results, query interrupted by result timeout.

error I ran the exact same query (on its own this time - not apart of a larger query set) from the client side again. This time, the query resulted in the expected result.

I then tried setting the timeout to 0 and running the entire query set. This resulted in the following:

Virtuoso S1T00 Error SR171: Transaction timed out\n\nSPARQL query:\nPREFIX ns:<http://rdf.freebase.com/ns/> SELECT * FROM <http://freebase.com> WHERE {ns:m.0dhmm9 ?r1 ?e1.?e1 ?r2 ?e2.?e2 ?r3 ?e3.?e3 ?r4 ns:m.0111gwg0.}

I hope this helps with debugging the issues.

HughWilliams commented 2 years ago

What is the version of the Virtuoso server being used, which can be obtained with the command virtuoso-t -??

What does the output of running the status(); command run from isql report?

Please run the following command to check the database integrity and any possible corruption —

backup '/dev/null';

— and the following commands to check the Primary Key and Indexes on the RDF_QUAD table —

select count (s), count (p ), count (o ), count (g ) from rdf_quad table option (index rdf_quad, check);

select count (*) from rdf_quad a table option (index rdf_quad) where not exists (select 1 from rdf_quad b table option (loop, index rdf_quad_pogs) where a.g = b.g and a.p = b.p and a.o = b.o and a.s = b.s);

select count (*) from rdf_quad a table option (index rdf_quad_pogs) where not exists (select 1 from rdf_quad b table option (loop, index primary key) where a.g = b.g and a.p = b.p and a.o = b.o and a.s = b.s);

select count (*) from rdf_quad a table option (index rdf_quad_pogs) where not exists (select 1 from rdf_quad b table option (loop, index rdf_quad_op) where a.g = b.g and a.p = b.p and a.o = b.o and a.s = b.s);

select count (*) from rdf_quad a table option (index rdf_quad_pogs) where not exists (select 1 from rdf_quad b table option (loop, index rdf_quad_sp) where a.g = b.g and a.p = b.p and a.o = b.o and a.s = b.s);

select count (*) from rdf_quad a table option (index rdf_quad_pogs) where not exists (select 1 from rdf_quad b table option (loop, index rdf_quad_gs) where a.g = b.g and a.p = b.p and a.o = b.o and a.s = b.s);

The RDF Metadata can also be checked and cleaned with the command: —

DB.DBA.RDF_AUDIT_METADATA (2,'*') ; 
NathanHerr commented 2 years ago

The version is:

Virtuoso Open Source Edition (Column Store) (multi threaded)
Version 7.2.6.3233-pthreads as of Jun 28 2021 (000000)
Compiled for Linux (x86_64-pc-linux-gnu)
Copyright (C) 1998-2021 OpenLink Software

Status:

Connected to OpenLink Virtuoso
Driver: 07.20.3233 OpenLink Virtuoso ODBC Driver
REPORT
VARCHAR
_______________________________________________________________________________

OpenLink Virtuoso  Server
Version 07.20.3233-pthreads for Linux as of Jun 28 2021 
Started on: 2021-09-08 11:01 GMT+0

Database Status:
  File size 3493855232, 5669376 pages, 2303234 free.
  10560000 buffers, 2389786 used, 3 dirty 0 wired down, repl age 0 0 w. io 0 w/crsr.
  Disk Usage: 2394744 reads avg 0 msec, 0% r 0% w last  0 s, 128571 writes flush      3.014 MB/s,
    10780 read ahead, batch = 219.  Autocompact 0 in 0 out, 0% saved.
Gate:  16714 2nd in reads, 0 gate write waits, 0 in while read 0 busy scrap. 
Log = /media/data/fb/freebase_pruned_db/virtuoso.trx, 6599 bytes
3365985 pages have been changed since last backup (in checkpoint state)
Current backup timestamp: 0x0000-0x00-0x00
Last backup date: unknown
Clients: 1 connects, max 1 concurrent
RPC: 12 calls, -61 pending, 1 max until now, 0 queued, 0 burst reads (0%), 0 second 0M large, 109M max
Checkpoint Remap 0 pages, 0 mapped back. 431 s atomic time.
    DB master 5669376 total 2303234 free 0 remap 0 mapped back
   temp  256 total 251 free

Lock Status: 0 deadlocks of which 0 2r1w, 0 waits,
   Currently 1 threads running 0 threads waiting 0 threads in vdb.
Pending:

Client 1111:5:  Account: dba, 203 bytes in, 286 bytes out, 1 stmts.
PID: 891066, OS: unix, Application: unknown, IP#: 127.0.0.1
Transaction status: PENDING, 1 threads.
Locks: 

Running Statements:
 Time (msec) Text
         216 status()

Hash indexes

37 Rows. -- 217 msec.

I ran the command backup '/dev/null'; and will run the rest of the commands once it has completed.

Thank you again for all your help.

NathanHerr commented 2 years ago

@HughWilliams here are the results of the commands:

SQL> backup '/dev/null';

Done. -- 1381497 msec.
SQL> select count (s), count (p ), count (o ), count (g ) from rdf_quad table option (index rdf_quad, check);

count    count__1  count__2  count__3
INTEGER  INTEGER  INTEGER  INTEGER
_______________________________________________________________________________

954538807  954538807  954538807  954538807

1 Rows. -- 7355 msec.
SQL> SQL> select count (*) from rdf_quad a table option (index rdf_quad) where not exists (select 1 from rdf_quad b table option (loop, index rdf_quad_pogs) where a.g = b.g and a.p = b.p and a.o = b.o and a.s = b.s);

count
INTEGER
_______________________________________________________________________________

0

1 Rows. -- 468027 msec.
SQL> SQL> select count (*) from rdf_quad a table option (index rdf_quad_pogs) where not exists (select 1 from rdf_quad b table option (loop, index primary key) where a.g = b.g and a.p = b.p and a.o = b.o and a.s = b.s);

count
INTEGER
_______________________________________________________________________________

0

1 Rows. -- 261168 msec.
SQL> SQL> select count (*) from rdf_quad a table option (index rdf_quad_pogs) where not exists (select 1 from rdf_quad b table option (loop, index rdf_quad_op) where a.g = b.g and a.p = b.p and a.o = b.o and a.s = b.s);

count
INTEGER
_______________________________________________________________________________

0

1 Rows. -- 297412 msec.

SQL> select count (*) from rdf_quad a table option (index rdf_quad_pogs) where not exists (select 1 from rdf_quad b table option (loop, index rdf_quad_sp) where a.g = b.g and a.p = b.p and a.o = b.o and a.s = b.s);

count
INTEGER
_______________________________________________________________________________

0

1 Rows. -- 500712 msec.

SQL> SQL> select count (*) from rdf_quad a table option (index rdf_quad_pogs) where not exists (select 1 from rdf_quad b table option (loop, index rdf_quad_gs) where a.g = b.g and a.p = b.p and a.o = b.o and a.s = b.s);

count
INTEGER
_______________________________________________________________________________

0

1 Rows. -- 395851 msec.

SQL> SQL> DB.DBA.RDF_AUDIT_METADATA (2,'*') ;
STAT     MSG
VARCHAR  VARCHAR
_______________________________________________________________________________

00000    Reloading built-in metadata, this might fix some errors without accurate reporting that they did exist
00000    Built-in metadata were reloaded (2479 triples)
00000    Metadata from system graph are cached in memory-resident JSOs (JavaScript Objects)

3 Rows. -- 766 msec.
NathanHerr commented 2 years ago

@HughWilliams some additional information:

@HughWilliams I have some more information. After seeing the following query:

PREFIX ns:<http://rdf.freebase.com/ns/> SELECT * FROM <http://freebase.com> WHERE {ns:m.094w0s ?r1 ?e1.?e1 ?r2 ?e2.?e2 ?r3 ?e3.?e3 ?r4 ns:m.09v1_r.}.

result in a

Virtuoso S1TAT Error RC...: Returning incomplete results, query interrupted by result timeout.

error I ran the exact same query (on its own this time - not apart of a larger query set) from the client side again. This time, the query resulted in the expected result.

I then tried setting the timeout to 0 and running the entire query set. This resulted in the following:

Virtuoso S1T00 Error SR171: Transaction timed out\n\nSPARQL query:\nPREFIX ns:<http://rdf.freebase.com/ns/> SELECT * FROM <http://freebase.com> WHERE {ns:m.0dhmm9 ?r1 ?e1.?e1 ?r2 ?e2.?e2 ?r3 ?e3.?e3 ?r4 ns:m.0111gwg0.}

I hope this helps with debugging the issues.

After testing the following query with a timeout = 0

PREFIX ns:<http://rdf.freebase.com/ns/> SELECT * FROM <http://freebase.com> WHERE {ns:m.0dhmm9 ?r1 ?e1.?e1 ?r2 ?e2.?e2 ?r3 ?e3.?e3 ?r4 ns:m.0111gwg0.}

the following error is received Virtuoso S1T00 Error SR171: Transaction timed out. However, when I set the timeout=10000 or 50000 or 60000 or 100000, the anytime functionality seems to work as expected.

I hope this helps in figuring out what is happening.

NathanHerr commented 2 years ago

@HughWilliams any news on this yet?

HughWilliams commented 2 years ago

@NathanHerr

When the "entire query set" is in the process of being run, or ideally just before the "Virtuoso S1T00 Error SR171: Transaction timed out" error, can you provide the output of running the status(); command, to see state of the database at that point?

Please also provide copies of the virtuoso.ini and virtuoso.log files for review.

NathanHerr commented 2 years ago

virtuoso.ini

;
;  virtuoso.ini
;
;  Configuration file for the OpenLink Virtuoso VDBMS Server
;
;  To learn more about this product, or any other product in our
;  portfolio, please check out our web site at:
;
;      http://virtuoso.openlinksw.com/
;
;  or contact us at:
;
;      general.information@openlinksw.com
;
;  If you have any technical questions, please contact our support
;  staff at:
;
;      technical.support@openlinksw.com
;

;
;  Database setup
;
[Database]
DatabaseFile            = /virtuoso/db/directory/virtuoso.db
ErrorLogFile            = /virtuoso/db/directory/virtuoso.log
LockFile            = /virtuoso/db/directory/virtuoso.lck
TransactionFile         = /virtuoso/db/directory/virtuoso.trx
xa_persistent_file      = /virtuoso/db/directory/virtuoso.pxa
ErrorLogLevel           = 7
FileExtend          = 200
MaxCheckpointRemap      = 2000
Striping            = 0
TempStorage         = TempDatabase

[TempDatabase]
DatabaseFile            = /virtuoso/db/directory/virtuoso-temp.db
TransactionFile         = /virtuoso/db/directory/virtuoso-temp.trx
MaxCheckpointRemap      = 2000
Striping            = 0

;
;  Server parameters
;
[Parameters]
ServerPort          = 1111
LiteMode            = 0
DisableUnixSocket       = 1
DisableTcpSocket        = 0
;SSLServerPort          = 2111
;SSLCertificate         = cert.pem
;SSLPrivateKey          = pk.pem
;X509ClientVerify       = 0
;X509ClientVerifyDepth      = 0
;X509ClientVerifyCAFile     = ca.pem
MaxClientConnections        = 10
CheckpointInterval      = 60
O_DIRECT            = 0
CaseMode            = 2
MaxStaticCursorRows     = 5000
CheckpointAuditTrail        = 0
AllowOSCalls            = 0
SchedulerInterval       = 10
DirsAllowed         = ., /virtuoso-opensource/directory/virtuoso-opensource/share/virtuoso/vad, /another/dir/allowed/proj, /other/dir/allowed/fb
ThreadCleanupInterval       = 0
ThreadThreshold         = 10
ResourcesCleanupInterval    = 0
FreeTextBatchSize       = 100000
SingleCPU           = 0
VADInstallDir           = /virtuoso-opensource/directory/virtuoso-opensource/share/virtuoso/vad/
PrefixResultNames               = 0
RdfFreeTextRulesSize        = 100
IndexTreeMaps           = 256
MaxMemPoolSize                  = 200000000
PrefixResultNames               = 0
MacSpotlight                    = 0
IndexTreeMaps                   = 64
MaxQueryMem             = 2G        ; memory allocated to query processor
VectorSize          = 1000      ; initial parallel query vector (array of query operations) size
MaxVectorSize           = 1000000   ; query vector size threshold.
AdjustVectorSize        = 0
ThreadsPerQuery         = 4
AsyncQueueMaxThreads        = 10
;;
;; When running with large data sets, one should configure the Virtuoso
;; process to use between 2/3 to 3/5 of free system memory and to stripe
;; storage on all available disks.
;;
;; Uncomment next two lines if there is 2 GB system memory free
;NumberOfBuffers          = 170000
;MaxDirtyBuffers          = 130000
;; Uncomment next two lines if there is 4 GB system memory free
;NumberOfBuffers          = 340000
; MaxDirtyBuffers          = 250000
;; Uncomment next two lines if there is 8 GB system memory free
;NumberOfBuffers          = 680000
;MaxDirtyBuffers          = 500000
;; Uncomment next two lines if there is 16 GB system memory free
;NumberOfBuffers          = 1360000
;MaxDirtyBuffers          = 1000000
;; Uncomment next two lines if there is 32 GB system memory free
;NumberOfBuffers          = 2720000
;MaxDirtyBuffers          = 2000000
;; Uncomment next two lines if there is 48 GB system memory free
;NumberOfBuffers          = 4000000
;MaxDirtyBuffers          = 3000000
;; Uncomment next two lines if there is 64 GB system memory free
;NumberOfBuffers          = 5450000
;MaxDirtyBuffers          = 4000000
;;
;; Note the default settings will take very little memory
;; but will not result in very good performance
;;
NumberOfBuffers          = 10560000
MaxDirtyBuffers          = 7920000

[HTTPServer]
ServerPort          = 8890
ServerRoot          = /virtuoso-opensource/directory/virtuoso-opensource/var/lib/virtuoso/vsp
MaxClientConnections        = 10
DavRoot             = DAV
EnabledDavVSP           = 0
HTTPProxyEnabled        = 0
TempASPXDir         = 0
DefaultMailServer       = localhost:25
ServerThreads           = 10
MaxKeepAlives           = 10
KeepAliveTimeout        = 10
MaxCachedProxyConnections   = 10
ProxyConnectionCacheTimeout = 15
HTTPThreadSize          = 280000
HttpPrintWarningsInOutput   = 0
Charset             = UTF-8
;HTTPLogFile                = logs/http.log
MaintenancePage                 = atomic.html
EnabledGzipContent              = 1

[AutoRepair]
BadParentLinks          = 0

[Client]
SQL_PREFETCH_ROWS       = 100
SQL_PREFETCH_BYTES      = 16000
SQL_QUERY_TIMEOUT       = 0
SQL_TXN_TIMEOUT         = 0
;SQL_NO_CHAR_C_ESCAPE       = 1
;SQL_UTF8_EXECS         = 0
;SQL_NO_SYSTEM_TABLES       = 0
;SQL_BINARY_TIMESTAMP       = 1
;SQL_ENCRYPTION_ON_PASSWORD = -1

[VDB]
ArrayOptimization       = 0
NumArrayParameters      = 10
VDBDisconnectTimeout        = 1000
KeepConnectionOnFixedThread = 0

[Replication]
ServerName          = db-FOC-CCS-2936-BE-VOS
ServerEnable            = 1
QueueMax            = 50000

;
;  Striping setup
;
;  These parameters have only effect when Striping is set to 1 in the
;  [Database] section, in which case the DatabaseFile parameter is ignored.
;
;  With striping, the database is spawned across multiple segments
;  where each segment can have multiple stripes.
;
;  Format of the lines below:
;    Segment<number> = <size>, <stripe file name> [, <stripe file name> .. ]
;
;  <number> must be ordered from 1 up.
;
;  The <size> is the total size of the segment which is equally divided
;  across all stripes forming  the segment. Its specification can be in
;  gigabytes (g), megabytes (m), kilobytes (k) or in database blocks
;  (b, the default)
;
;  Note that the segment size must be a multiple of the database page size
;  which is currently 8k. Also, the segment size must be divisible by the
;  number of stripe files forming  the segment.
;
;  The example below creates a 200 meg database striped on two segments
;  with two stripes of 50 meg and one of 100 meg.
;
;  You can always add more segments to the configuration, but once
;  added, do not change the setup.
;
[Striping]
Segment1            = 100M, db-seg1-1.db, db-seg1-2.db
Segment2            = 100M, db-seg2-1.db
;...

;[TempStriping]
;Segment1           = 100M, db-seg1-1.db, db-seg1-2.db
;Segment2           = 100M, db-seg2-1.db
;...

;[Ucms]
;UcmPath            = <path>
;Ucm1               = <file>
;Ucm2               = <file>
;...

[Zero Config]
ServerName          = virtuoso (FOC-CCS-2936-BE-VOS)
;ServerDSN          = ZDSN
;SSLServerName          = 
;SSLServerDSN           = 

[Mono]
;MONO_TRACE         = Off
;MONO_PATH          = <path_here>
;MONO_ROOT          = <path_here>
;MONO_CFG_DIR           = <path_here>
;virtclr.dll            =

[URIQA]
DynamicLocal            = 0
DefaultHost         = localhost:8890

[SPARQL]
;ExternalQuerySource        = 1
;ExternalXsltSource         = 1
;DefaultGraph           = http://localhost:8890/dataspace
;ImmutableGraphs            = http://localhost:8890/dataspace
ResultSetMaxRows            = 10000
MaxQueryCostEstimationTime  = 400   ; in seconds
MaxQueryExecutionTime       = 60    ; in seconds
DefaultQuery                = select distinct ?Concept where {[] a ?Concept} LIMIT 100
DeferInferenceRulesInit     = 0  ; controls inference rules loading
;PingService            = http://rpc.pingthesemanticweb.com/

[Plugins]
LoadPath            = /virtuoso-opensource/directory/virtuoso-opensource/lib/virtuoso/hosting
Load1               = plain, wikiv
Load2               = plain, mediawiki
Load3               = plain, creolewiki
;Load4          = plain, im
;Load5      = plain, wbxml2
;Load6          = attach, libphp5.so
;Load7          = Hosting, hosting_php.so
;Load8      = plain, proj4
;Load9          = plain, geos
Load10      = plain, shapefileio
;Load11         = plain, hslookup
;Load12     = Hosting,hosting_perl.so
;Load13     = Hosting,hosting_python.so
;Load14     = Hosting,hosting_ruby.so
;Load15             = msdtc,msdtc_sample
NathanHerr commented 2 years ago

virtuoso.log

NathanHerr commented 2 years ago

With regards to running status(); while the queries are running — I will attempt to get this for you in the meantime. Thanks.