openlink / virtuoso-opensource

Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/

Virtuoso crash on SPARQL query #675

Open ffritsche opened 6 years ago

ffritsche commented 6 years ago

We have a very large RDF database (virtuoso.db is 332 GB) and 400 GB of RAM. Our problem is that Virtuoso crashes on some queries without reporting any error. Is there a way to get more debug information about this problem, or is there a known solution?

Linux: Redhat 3.10.0-514.16.1.el7.x86_64

Start with screen: ./virtuoso-t -dfc /mnt/ssd/database/virtuoso.ini

Query: select (count(?x) as ?count) from where{ ?x a ns:example FILTER NOT EXISTS{ ?a ?b ?x } }

In this graph are 14995770 ns:example. We need this query to delete all orphan triples of this class.

Log:

Fri Sep 22 2017
14:53:44 INFO: { Loading plugin 1: Type `plain', file `wikiv' in `/usr/local/lib/virtuoso/hosting'
14:53:44 ERROR: FAILED plugin 1: Unable to locate file }
14:53:44 INFO: { Loading plugin 2: Type `plain', file `mediawiki' in `/usr/local/lib/virtuoso/hosting'
14:53:44 ERROR: FAILED plugin 2: Unable to locate file }
14:53:44 INFO: { Loading plugin 3: Type `plain', file `creolewiki' in `/usr/local/lib/virtuoso/hosting'
14:53:44 ERROR: FAILED plugin 3: Unable to locate file }
14:53:44 INFO: OpenLink Virtuoso Universal Server
14:53:44 INFO: Version 07.20.3217-pthreads for Linux as of Sep 22 2017
14:53:44 INFO: uses parts of OpenSSL, PCRE, Html Tidy
14:54:35 INFO: Database version 3126
14:54:36 INFO: SQL Optimizer enabled (max 1000 layouts)
14:54:37 INFO: Compiler unit is timed at 0.000132 msec
14:54:52 INFO: Roll forward started
14:54:52 INFO: 58 transactions, 11100 bytes replayed (100 %)
14:54:52 INFO: Roll forward complete
14:54:59 INFO: PL LOG: Installing Virtuoso Conductor version 1.00.8768 (DAV)
14:54:59 INFO: PL LOG: Installing with dependencies Virtuoso Conductor version 1.00.8768/2017-09-22 12:47 (DAV)
14:54:59 INFO: PL LOG: VAD_INSTALL: VAD file checksum mismatch (42VAD)
14:54:59 INFO: PL LOG: Errors were detected during installation of "/usr/local/virtuoso-opensource/share/virtuoso/vad/conductor_dav.vad".
14:55:01 INFO: HTTP/WebDAV server online at 8890
14:55:01 INFO: Server online at 1111 (pid 1871)
Killed

Config:

;
;  virtuoso.ini
;
;  Configuration file for the OpenLink Virtuoso VDBMS Server
;
;  To learn more about this product, or any other product in our
;  portfolio, please check out our web site at:
;
;      http://virtuoso.openlinksw.com/
;
;  or contact us at:
;
;      general.information@openlinksw.com
;
;  If you have any technical questions, please contact our support
;  staff at:
;
;      technical.support@openlinksw.com
;

;
;  Database setup
;
[Database]
DatabaseFile = /mnt/ssd/database/virtuoso.db
ErrorLogFile = /mnt/ssd/database/virtuoso.log
LockFile = /mnt/ssd/database/virtuoso.lck
TransactionFile = /mnt/ssd/database/virtuoso20170717094327.trx
xa_persistent_file = /mnt/ssd/database/virtuoso.pxa
ErrorLogLevel = 7
FileExtend = 200
MaxCheckpointRemap = 11100000
Striping = 0
TempStorage = TempDatabase
TransactionAfterImageLimit = 5000000000

[TempDatabase]
DatabaseFile = /mnt/ssd/database/virtuoso-temp.db
TransactionFile = /mnt/ssd/database/virtuoso-temp.trx
MaxCheckpointRemap = 2000
Striping = 0

;
;  Server parameters
;
[Parameters]
TransactionAfterImageLimit = 5000000000
ServerThreads = 100
ServerPort = 1111
LiteMode = 0
DisableUnixSocket = 1
DisableTcpSocket = 0
;SSLServerPort = 2111
;SSLCertificate = cert.pem
;SSLPrivateKey = pk.pem
;X509ClientVerify = 0
;X509ClientVerifyDepth = 0
;X509ClientVerifyCAFile = ca.pem
MaxClientConnections = 100
CheckpointInterval = -1 ; no auto checkpoint
CheckpointAuditTrail = 0 ; don't throw away transaction logs after checkpointing
O_DIRECT = 1
CaseMode = 2
MaxStaticCursorRows = 5000
AllowOSCalls = 0
SchedulerInterval = 10
DirsAllowed = ., /usr/local/virtuoso-opensource/share/virtuoso/vad, /mnt/ssd/
ThreadCleanupInterval = 0
ThreadThreshold = 10
ResourcesCleanupInterval = 0
FreeTextBatchSize = 100000
SingleCPU = 0
VADInstallDir = /usr/local/virtuoso-opensource/share/virtuoso/vad
PrefixResultNames = 0
RdfFreeTextRulesSize = 100
IndexTreeMaps = 1024
MaxMemPoolSize = 200000000
PrefixResultNames = 0
MacSpotlight = 0
MaxQueryMem = 20G ; memory allocated to query processor
VectorSize = 10000 ; initial parallel query vector (array of query operations) size
MaxVectorSize = 5000000 ; query vector size threshold.
AdjustVectorSize = 1
ThreadsPerQuery = 20
AsyncQueueMaxThreads = 20
;;
;; When running with large data sets, one should configure the Virtuoso
;; process to use between 2/3 to 3/5 of free system memory and to stripe
;; storage on all available disks.
;;
NumberOfBuffers = 32300000
MaxDirtyBuffers = 24700000

[HTTPServer]
ServerPort = 8890
ServerRoot = /usr/local/virtuoso-opensource/var/lib/virtuoso/vsp
MaxClientConnections = 100
DavRoot = DAV
EnabledDavVSP = 0
HTTPProxyEnabled = 0
TempASPXDir = 0
DefaultMailServer = localhost:25
ServerThreads = 28
MaxKeepAlives = 10
KeepAliveTimeout = 10
MaxCachedProxyConnections = 10
ProxyConnectionCacheTimeout = 15
HTTPThreadSize = 280000
HttpPrintWarningsInOutput = 0
Charset = UTF-8
;HTTPLogFile = logs/http.log
MaintenancePage = atomic.html
EnabledGzipContent = 1

[AutoRepair]
BadParentLinks = 0

[Client]
SQL_PREFETCH_ROWS = 100
SQL_PREFETCH_BYTES = 16000
SQL_QUERY_TIMEOUT = 0
SQL_TXN_TIMEOUT = 0
;SQL_NO_CHAR_C_ESCAPE = 1
;SQL_UTF8_EXECS = 0
;SQL_NO_SYSTEM_TABLES = 0
;SQL_BINARY_TIMESTAMP = 1
;SQL_ENCRYPTION_ON_PASSWORD = -1

[VDB]
ArrayOptimization = 0
NumArrayParameters = 10
VDBDisconnectTimeout = 1000
KeepConnectionOnFixedThread = 0

[Replication]
ServerName = db-90C88C41CDAA
ServerEnable = 1
QueueMax = 50000

;
;  Striping setup
;
;  These parameters have only effect when Striping is set to 1 in the
;  [Database] section, in which case the DatabaseFile parameter is ignored.
;
;  With striping, the database is spawned across multiple segments
;  where each segment can have multiple stripes.
;
;  Format of the lines below:
;    Segment = , [, .. ]
;
;  must be ordered from 1 up.
;
;  The is the total size of the segment which is equally divided
;  across all stripes forming the segment. Its specification can be in
;  gigabytes (g), megabytes (m), kilobytes (k) or in database blocks
;  (b, the default)
;
;  Note that the segment size must be a multiple of the database page size
;  which is currently 8k. Also, the segment size must be divisible by the
;  number of stripe files forming the segment.
;
;  The example below creates a 200 meg database striped on two segments
;  with two stripes of 50 meg and one of 100 meg.
;
;  You can always add more segments to the configuration, but once
;  added, do not change the setup.
;
[Striping]
Segment1 = 100M, db-seg1-1.db, db-seg1-2.db
Segment2 = 100M, db-seg2-1.db
;...
;[TempStriping]
;Segment1 = 100M, db-seg1-1.db, db-seg1-2.db
;Segment2 = 100M, db-seg2-1.db
;...
;[Ucms]
;UcmPath =
;Ucm1 =
;Ucm2 =
;...

[Zero Config]
ServerName = virtuoso (90C88C41CDAA)
;ServerDSN = ZDSN
;SSLServerName =
;SSLServerDSN =

[Mono]
;MONO_TRACE = Off
;MONO_PATH =
;MONO_ROOT =
;MONO_CFG_DIR =
;virtclr.dll =

[URIQA]
DynamicLocal = 0
DefaultHost = localhost:8890

[SPARQL]
;ExternalQuerySource = 1
;ExternalXsltSource = 1
;DefaultGraph = http://localhost:8890/dataspace
;ImmutableGraphs = http://localhost:8890/dataspace
ResultSetMaxRows = 1000000000
MaxQueryCostEstimationTime = 400000 ; in seconds
MaxQueryExecutionTime = 15000 ; in seconds
DefaultQuery = select distinct ?Concept where {[] a ?Concept} LIMIT 100
DeferInferenceRulesInit = 0 ; controls inference rules loading
;PingService = http://rpc.pingthesemanticweb.com/

[Plugins]
LoadPath = /usr/local/lib/virtuoso/hosting
Load1 = plain, wikiv
Load2 = plain, mediawiki
Load3 = plain, creolewiki
;Load4 = plain, im
;Load5 = plain, wbxml2
;Load6 = plain, hslookup
;Load7 = attach, libphp5.so
;Load8 = Hosting, hosting_php.so
;Load9 = Hosting,hosting_perl.so
;Load10 = Hosting,hosting_python.so
;Load11 = Hosting,hosting_ruby.so
;Load12 = msdtc,msdtc_sample

HughWilliams commented 6 years ago

Use something like the following via SPASQL (extending SQL using SPARQL, for ACID exploitation):

SELECT * FROM (
                SPARQL
                DEFINE output:format "TTL" 

                DELETE { GRAPH <graph-iri>  {basic-graph-pattern} }
                WHERE
                {
                           {basic-graph-pattern} 
                }
) AS {some-alias-you-choose}  FOR UPDATE ;

Set your preferred ACID modality via log_enable [1].

[1] http://docs.openlinksw.com/virtuoso/fn_log_enable/
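As an illustration only, the template above could be filled in for the orphan case from the report roughly as follows. This is an untested sketch: the function name `orphan_delete_sql`, the alias `orphan_delete`, and both example IRIs are placeholders (the thread does not give the real graph or class IRI), and it assumes "orphan" means an `?x` of the given class that never appears as the object of any triple, with all triples having `?x` as subject being deleted. The `log_enable` call is left out, so set it separately as described above.

```shell
#!/bin/sh
# Sketch: prints the SPASQL template filled in with the orphan pattern.
# $1 = graph IRI, $2 = class IRI (both are placeholders supplied by the caller).
orphan_delete_sql() {
    cat <<EOF
SELECT * FROM (
    SPARQL
    DEFINE output:format "TTL"
    DELETE { GRAPH <$1> { ?x ?p ?o } }
    WHERE
    {
        GRAPH <$1> {
            ?x a <$2> ; ?p ?o .
            FILTER NOT EXISTS { ?a ?b ?x }
        }
    }
) AS orphan_delete FOR UPDATE ;
EOF
}

# Print the statement for two hypothetical IRIs; review it before
# feeding it to the Virtuoso isql client.
orphan_delete_sql "http://example.org/graph" "http://example.org/ns#example"
```

Printing the statement first, rather than executing it directly, makes it easy to check the pattern against a small test graph before running it on the full data set.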

mybyte commented 6 years ago

@HughWilliams what exactly will this accomplish? I'm not sure it will work as a workaround in the first place. In any case, a database should never crash to the desktop without any sort of error, and a simple count query like the one stated above should always work. I can see how this might be an issue when deleting large numbers of triples, but a simple count?

kidehen commented 6 years ago

@mybyte ,

If you are performing an Insert, Update, or Delete operation on any DBMS, there is the notion of ACID (Atomicity, Consistency, Isolation, and Durability), which comes into play with all transactions.

What you are not aware of right now is the fact that, unlike SQL, SPARQL has no semantics for handling ACID. That said, Virtuoso (which is a multi-model RDBMS at its core) does provide this functionality to SPARQL if it is executed as an extension to SQL (what we call SPASQL, as per @HughWilliams's example template).

ACID consumes system resources, and performing the suggested exercise is a shortcut to better understanding what's going on with your massive DELETE exercise.

mybyte commented 6 years ago

@kidehen,

In the code above, there's no mention of a DELETE whatsoever. The author talks about deleting orphan elements, but the issue at hand is not the deletion, but the select query that leads to a virtuoso crash.

Query: select (count(?x) as ?count) from example:graph where{ ?x a ns:example FILTER NOT EXISTS{ ?a ?b ?x } }

This shouldn't fail and crash a database.

kidehen commented 6 years ago

@mybyte ,

it states "Query: select (count(?x) as ?count) from example:graph where{ ?x a ns:example FILTER NOT EXISTS{ ?a ?b ?x } }

In this graph are 14995770 ns:example. We need this query to delete all orphan triples of this class."

Or were you intending to say: "We need the query solution to exclude all relations for which ?x identifies a relation object?"

If so, I've tested the query at: http://linkeddata.uriburner.com/sparql?default-graph-uri=&query=select+%28count%28%3Fx%29+as+%3Fcount%29+%0D%0A%23+from+example%3Agraph+%0D%0Awhere%7B%0D%0A%3Fx+a+foaf%3APerson%0D%0AFILTER+NOT+EXISTS%7B+%3Fa+%3Fb+%3Fx+%7D%0D%0A%7D&should-sponge=&format=text%2Fhtml&CXML_redir_for_subjs=121&CXML_redir_for_hrefs=&timeout=30000000 -- it will take a while due to the nature of the query.

Triple size for instance:

SQL> sparql select count(*) where {?s ?p ?o} ;
callret-0
INTEGER

401,565,662

1 Rows. -- 277 msec.

This is a Virtuoso 8.0 instance.

HughWilliams commented 6 years ago

@mybyte: The confusion stems from one of your initial comments, "In this graph are 14995770 ns:example. We need this query to delete all orphan triples of this class.", which is what the DELETE query is a response to.

Regarding the crash:

  1. If you start Virtuoso with "ulimit -c unlimited" set beforehand, is a core file produced when the crash occurs? If so, can a gdb stack trace of the core file be provided?

  2. Are any errors reported in the system logs (i.e. /var/log/messages etc.) regarding the reason the Virtuoso server was killed (crashed)?

  3. Have you tried building from the latest git develop/7 branch, i.e. https://github.com/openlink/virtuoso-opensource, as there have been a number of major updates applied to this branch over the past week?
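The first two checks above could be scripted roughly as follows. This is a sketch, not an official procedure: the three function names are made up for illustration, the core file name and location depend on the kernel's core_pattern setting, and the log path to search is whatever your distribution uses (for example /var/log/messages). The trailing "Killed" in the posted log is what a shell prints when a foreground process receives SIGKILL, so the kernel log is worth searching for out-of-memory killer messages, although a SIGKILL can also come from elsewhere.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; adjust paths to your system.

# 1. Start Virtuoso in the foreground with core dumps enabled
#    (same command line as in the original report).
start_with_core_dumps() {
    ulimit -c unlimited
    ./virtuoso-t -dfc /mnt/ssd/database/virtuoso.ini
}

# 2. Print a stack trace of every thread from a core file.
#    $1 = path to the core file.
backtrace_from_core() {
    gdb -batch -ex "thread apply all bt" ./virtuoso-t "$1"
}

# 3. Search a system log for signs that the kernel killed a process.
#    $1 = log file to search.
find_kill_messages() {
    grep -i -E 'out of memory|killed process|oom-killer' "$1"
}
```

For example, `find_kill_messages /var/log/messages` after a crash prints any matching kernel lines; an empty result suggests the kill came from somewhere other than the out-of-memory killer, or that the messages are logged elsewhere.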

ffritsche commented 6 years ago

Thanks for your response. Our goal is to delete all orphan triples. Unfortunately, Virtuoso crashes on the selection part of the query and not on the transaction: select ?x from where{ ?x a ns:example FILTER NOT EXISTS{ ?a ?b ?x } }

log_enable was set to 1. We used the latest develop/7 branch.

Maybe the MaxVectorSize/VectorSize settings are too high. We will run more tests.
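For reference, a rough memory budget can be worked out from the posted virtuoso.ini as below. This is a back-of-the-envelope sketch under stated assumptions: each buffer is counted as exactly one 8 KB database page (the page size named in the config comments), per-buffer overhead and all other allocations are ignored, MaxQueryMem is read as 20 GiB, and it is assumed to come on top of the buffer pool. The real footprint is therefore likely to be higher than the figure printed here.

```shell
#!/bin/sh
# Rough memory budget from the posted virtuoso.ini.
# Assumption: one buffer ~ one 8 KB page, no per-buffer overhead.
number_of_buffers=32300000   # NumberOfBuffers
page_bytes=8192              # 8k page size, per the config comments
max_query_mem_gib=20         # MaxQueryMem = 20G, read as GiB (assumption)
ram_gb=400                   # RAM stated in the report

# Integer arithmetic, rounded down to whole GiB.
buffer_gib=$(( number_of_buffers * page_bytes / 1024 / 1024 / 1024 ))
total_gib=$(( buffer_gib + max_query_mem_gib ))

echo "buffer pool: ~${buffer_gib} GiB"
echo "buffer pool + MaxQueryMem: ~${total_gib} GiB of ${ram_gb} GB RAM"
```

Comparing the total with the "2/3 to 3/5 of free system memory" guideline in the config comments gives a quick indication of how much headroom remains for vectored query execution and the operating system.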