Closed ebremer closed 6 years ago
This is likely to be a significant part, if not all, of your issue --
;; When running with large data sets, one should configure the Virtuoso
;; process to use between 2/3 to 3/5 of free system memory and to stripe
;; storage on all available disks.
;;
NumberOfBuffers = 18280208
MaxDirtyBuffers = 13416667
;;
;; Note the default settings will take very little memory
;; but will not result in very good performance
;;
NumberOfBuffers = 10000
MaxDirtyBuffers = 6000
Note that the last two lines override the preceding values for these keywords, so the server is still running with the small default buffer settings. You must comment out the last two lines.
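A quick way to catch this kind of silent override is to scan the ini file for keys that are assigned more than once in the same section. The following is only a minimal sketch: find_duplicate_keys is a hypothetical helper, not part of Virtuoso, and it assumes ';' starts a comment and never appears inside a value.

```python
from collections import defaultdict


def find_duplicate_keys(ini_text):
    """Return {(section, key): [(line_no, value), ...]} for keys set more than once."""
    seen = defaultdict(list)
    section = None
    for line_no, raw in enumerate(ini_text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and full-line comments (';' as used in virtuoso.ini).
        if not line or line.startswith(";"):
            continue
        # A "[Name]" line starts a new section.
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment (assumption: ';' never occurs in a value).
        value = value.split(";", 1)[0].strip()
        seen[(section, key.strip())].append((line_no, value))
    # Keep only the keys that were assigned two or more times.
    return {k: v for k, v in seen.items() if len(v) > 1}


sample = """[Parameters]
NumberOfBuffers = 18280208
MaxDirtyBuffers = 13416667
;; defaults
NumberOfBuffers = 10000
MaxDirtyBuffers = 6000
"""

for (section, key), hits in find_duplicate_keys(sample).items():
    print(section, key, hits)
```

For each reported key, the last entry in the list is the value that takes effect, given the override behaviour described above.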
I'm an idiot. I took the time to calculate the 2/3 rule for those two parameters and I forgot to comment out the defaults. Thanks Ted, I'm re-running the upload now.
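For anyone repeating that calculation, here is a rough sketch of the arithmetic. It rests on assumptions that are not stated in this thread: that each buffer holds one 8 KB database page, and that MaxDirtyBuffers is set to about 3/4 of NumberOfBuffers. suggest_buffers is a hypothetical helper, and its output will not necessarily match the values quoted above, which may have been derived from different inputs.

```python
def suggest_buffers(ram_bytes, num=2, den=3, bytes_per_buffer=8192):
    """Estimate (NumberOfBuffers, MaxDirtyBuffers) from the memory to dedicate.

    num/den is the fraction of memory given to Virtuoso (2/3 by default).
    bytes_per_buffer=8192 is an assumption: one 8 KB page per buffer.
    """
    # Integer arithmetic avoids floating-point rounding surprises.
    number_of_buffers = ram_bytes * num // den // bytes_per_buffer
    # Assumption: dirty buffers are capped at 3/4 of all buffers.
    max_dirty_buffers = number_of_buffers * 3 // 4
    return number_of_buffers, max_dirty_buffers


# Example for a server with 322 GB of RAM, as in this thread.
buffers, dirty = suggest_buffers(322 * 1024**3)
print(f"NumberOfBuffers = {buffers}")
print(f"MaxDirtyBuffers = {dirty}")
```

Treat the result as a starting point and check it against the sizing guidance shipped with your Virtuoso version.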
That worked. 6,204,086,892 triples loaded. There were still some messages like these in the console log:
16:56:15 Monitor: High disk read (2)
16:56:15 Monitor: CPU% is low while there are large numbers of runnable threads
but it looks like it is working much better. Thanks!
I'm bulk loading with (but with one loader) into a Virtuoso v7.2.4.2 instance. It's approximately 6 billion triples loading onto a 322GB RAM server with 32 cores. After a day of loading, the messages below appear in the Virtuoso console output. Also, if I run the query select count(*) where {?s ?p ?o} against the named graph I'm uploading to, the count fluctuates wildly up and down.
09:42:51 Monitor: High disk read (2)
09:44:51 Monitor: High disk read (2)
09:46:51 Monitor: High disk read (2)
09:48:51 Monitor: High disk read (2)
09:50:51 Monitor: High disk read (2)
09:55:52 Monitor: High disk read (2)
09:58:56 Monitor: High disk read (2)
10:02:01 Monitor: High disk read (2)
10:04:27 Monitor: High disk read (2)
10:06:46 Checkpoint started
10:09:40 Checkpoint removed 44132 MB of remapped pages, leaving 15 MB. Duration 138.3 s. To save this time, increase MaxCheckpointRemap and/or set Unremap quota to 0 in ini file.
10:13:09 Checkpoint finished, log reused
10:14:15 Monitor: High disk read (2)
10:16:15 Monitor: High disk read (2)
10:18:15 Monitor: High disk read (2)
10:20:16 Monitor: High disk read (2)
10:22:16 Monitor: High disk read (2)
10:24:16 Monitor: High disk read (2)
10:26:16 Monitor: High disk read (2)
10:28:16 Monitor: High disk read (2)
10:30:25 Monitor: High disk read (2)
10:32:25 Monitor: High disk read (2)
10:34:27 Monitor: High disk read (2)
10:36:27 Monitor: High disk read (2)
10:39:09 Monitor: High disk read (2)
10:41:10 Monitor: High disk read (2)
10:43:10 Monitor: High disk read (2)
10:45:10 Monitor: High disk read (2)
10:47:10 Monitor: High disk read (2)
10:49:10 Monitor: High disk read (2)
10:51:11 Monitor: High disk read (2)
10:53:12 Monitor: High disk read (2)
10:55:12 Monitor: High disk read (2)
10:57:13 Monitor: High disk read (2)
10:59:33 Monitor: High disk read (2)
11:01:33 Monitor: High disk read (2)
11:04:19 Monitor: High disk read (2)
11:06:19 Monitor: High disk read (2)
11:08:20 Monitor: High disk read (2)
11:10:20 Monitor: High disk read (2)
11:12:20 Monitor: High disk read (2)
11:15:14 Monitor: High disk read (2)
11:17:15 Monitor: High disk read (2)
11:20:47 Monitor: High disk read (2)
11:25:50 Monitor: High disk read (2)
11:25:50 Monitor: CPU% is low while there are large numbers of runnable threads
11:29:10 Monitor: High disk read (2)
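To compare runs before and after a configuration change, it can help to count how often each console message appears. This is a small sketch, not a Virtuoso tool: summarize is a hypothetical helper, and it assumes every entry starts with an HH:MM:SS timestamp followed by a space, as in the output above. It accepts entries either one per line or run together on a single line.

```python
import re
from collections import Counter

# One entry = a timestamp, then the message up to the next timestamp or the end.
ENTRY = re.compile(
    r"(\d{2}:\d{2}:\d{2}) (.*?)(?=\s+\d{2}:\d{2}:\d{2} |\s*\Z)",
    re.S,
)


def summarize(log_text):
    """Count console messages by their text, ignoring the timestamps."""
    return Counter(msg.strip() for _ts, msg in ENTRY.findall(log_text))


sample = (
    "11:20:47 Monitor: High disk read (2) "
    "11:25:50 Monitor: High disk read (2) "
    "11:25:50 Monitor: CPU% is low while there are large numbers of runnable threads "
    "11:29:10 Monitor: High disk read (2)"
)

for message, count in summarize(sample).most_common():
    print(count, message)
```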
My virtuoso.ini file is as follows:

[Database]
DatabaseFile = /data/virtuoso/var/lib/virtuoso/db/virtuoso.db
ErrorLogFile = /data/virtuoso/var/lib/virtuoso/db/virtuoso.log
LockFile = /data/virtuoso/var/lib/virtuoso/db/virtuoso.lck
TransactionFile = /data/virtuoso/var/lib/virtuoso/db/virtuoso.trx
xa_persistent_file = /data/virtuoso/var/lib/virtuoso/db/virtuoso.pxa
ErrorLogLevel = 7
FileExtend = 200
MaxCheckpointRemap = 2000
Striping = 0
TempStorage = TempDatabase

[TempDatabase]
DatabaseFile = /data/virtuoso/var/lib/virtuoso/db/virtuoso-temp.db
TransactionFile = /data/virtuoso/var/lib/virtuoso/db/virtuoso-temp.trx
MaxCheckpointRemap = 2000
Striping = 0

;
; Server parameters
;
[Parameters]
ServerPort = 1111
LiteMode = 0
DisableUnixSocket = 1
DisableTcpSocket = 0
;SSLServerPort = 2111
;SSLCertificate = cert.pem
;SSLPrivateKey = pk.pem
;X509ClientVerify = 0
;X509ClientVerifyDepth = 0
;X509ClientVerifyCAFile = ca.pem
MaxClientConnections = 10
CheckpointInterval = 60
O_DIRECT = 0
CaseMode = 2
MaxStaticCursorRows = 5000
CheckpointAuditTrail = 0
AllowOSCalls = 0
SchedulerInterval = 10
DirsAllowed = ., /data/virtuoso/share/virtuoso/vad
ThreadCleanupInterval = 0
ThreadThreshold = 10
ResourcesCleanupInterval = 0
FreeTextBatchSize = 100000
SingleCPU = 0
VADInstallDir = /data/virtuoso/share/virtuoso/vad/
PrefixResultNames = 0
RdfFreeTextRulesSize = 100
IndexTreeMaps = 256
MaxMemPoolSize = 200000000
PrefixResultNames = 0
MacSpotlight = 0
IndexTreeMaps = 64
MaxQueryMem = 2G ; memory allocated to query processor
VectorSize = 1000 ; initial parallel query vector (array of query operations) size
MaxVectorSize = 1000000 ; query vector size threshold.
AdjustVectorSize = 0
ThreadsPerQuery = 4
AsyncQueueMaxThreads = 10
;;
;; When running with large data sets, one should configure the Virtuoso
;; process to use between 2/3 to 3/5 of free system memory and to stripe
;; storage on all available disks.
;;
NumberOfBuffers = 18280208
MaxDirtyBuffers = 13416667
;;
;; Note the default settings will take very little memory
;; but will not result in very good performance
;;
NumberOfBuffers = 10000
MaxDirtyBuffers = 6000

[HTTPServer]
ServerPort = 8890
ServerRoot = /data/virtuoso/var/lib/virtuoso/vsp
MaxClientConnections = 10
DavRoot = DAV
EnabledDavVSP = 0
HTTPProxyEnabled = 0
TempASPXDir = 0
DefaultMailServer = localhost:25
ServerThreads = 10
MaxKeepAlives = 10
KeepAliveTimeout = 10
MaxCachedProxyConnections = 10
ProxyConnectionCacheTimeout = 15
HTTPThreadSize = 280000
HttpPrintWarningsInOutput = 0
Charset = UTF-8
;HTTPLogFile = logs/http.log
MaintenancePage = atomic.html
EnabledGzipContent = 1

[AutoRepair]
BadParentLinks = 0

[Client]
SQL_PREFETCH_ROWS = 100
SQL_PREFETCH_BYTES = 16000
SQL_QUERY_TIMEOUT = 0
SQL_TXN_TIMEOUT = 0
;SQL_NO_CHAR_C_ESCAPE = 1
;SQL_UTF8_EXECS = 0
;SQL_NO_SYSTEM_TABLES = 0
;SQL_BINARY_TIMESTAMP = 1
;SQL_ENCRYPTION_ON_PASSWORD = -1

[VDB]
ArrayOptimization = 0
NumArrayParameters = 10
VDBDisconnectTimeout = 1000
KeepConnectionOnFixedThread = 0

[Replication]
ServerName = db-ATOZ
ServerEnable = 1
QueueMax = 50000

[Striping]
Segment1 = 100M, db-seg1-1.db, db-seg1-2.db
Segment2 = 100M, db-seg2-1.db
;...

;[TempStriping]
;Segment1 = 100M, db-seg1-1.db, db-seg1-2.db
;Segment2 = 100M, db-seg2-1.db
;...

;[Ucms]
;UcmPath =
;Ucm1 =
;Ucm2 =
;...

[Zero Config]
ServerName = virtuoso (ATOZ)
;ServerDSN = ZDSN
;SSLServerName =
;SSLServerDSN =

[Mono]
;MONO_TRACE = Off
;MONO_PATH =
;MONO_ROOT =
;MONO_CFG_DIR =
;virtclr.dll =

[URIQA]
DynamicLocal = 0
DefaultHost = localhost:8890

[SPARQL]
;ExternalQuerySource = 1
;ExternalXsltSource = 1
;DefaultGraph = http://localhost:8890/dataspace
;ImmutableGraphs = http://localhost:8890/dataspace
ResultSetMaxRows = 10000
MaxQueryCostEstimationTime = 400 ; in seconds
MaxQueryExecutionTime = 60 ; in seconds
DefaultQuery = select distinct ?Concept where {[] a ?Concept} LIMIT 100
DeferInferenceRulesInit = 0 ; controls inference rules loading
;PingService = http://rpc.pingthesemanticweb.com/

[Plugins]
LoadPath = /data/virtuoso/lib/virtuoso/hosting
Load1 = plain, wikiv
Load2 = plain, mediawiki
Load3 = plain, creolewiki
;Load4 = plain, im
;Load5 = plain, wbxml2
;Load6 = plain, hslookup
;Load7 = attach, libphp5.so
;Load8 = Hosting, hosting_php.so
;Load9 = Hosting,hosting_perl.so
;Load10 = Hosting,hosting_python.so
;Load11 = Hosting,hosting_ruby.so
;Load12 = msdtc,msdtc_sample