openlink / virtuoso-opensource

Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
https://vos.openlinksw.com

Monitor: High disk read (2) #716

Closed ebremer closed 6 years ago

ebremer commented 6 years ago

I'm bulk loading (with a single loader) into a Virtuoso v7.2.4.2 instance: approximately 6 billion triples onto a server with 322 GB of RAM and 32 cores. After a day of loading, the messages below appear in the Virtuoso console output. Also, if I run select count(*) where {?s ?p ?o} against the named graph I'm loading into, the count fluctuates wildly up and down.

09:42:51 Monitor: High disk read (2)
09:44:51 Monitor: High disk read (2)
09:46:51 Monitor: High disk read (2)
09:48:51 Monitor: High disk read (2)
09:50:51 Monitor: High disk read (2)
09:55:52 Monitor: High disk read (2)
09:58:56 Monitor: High disk read (2)
10:02:01 Monitor: High disk read (2)
10:04:27 Monitor: High disk read (2)
10:06:46 Checkpoint started
10:09:40 Checkpoint removed 44132 MB of remapped pages, leaving 15 MB. Duration 138.3 s. To save this time, increase MaxCheckpointRemap and/or set Unremap quota to 0 in ini file.
10:13:09 Checkpoint finished, log reused
10:14:15 Monitor: High disk read (2)
10:16:15 Monitor: High disk read (2)
10:18:15 Monitor: High disk read (2)
10:20:16 Monitor: High disk read (2)
10:22:16 Monitor: High disk read (2)
10:24:16 Monitor: High disk read (2)
10:26:16 Monitor: High disk read (2)
10:28:16 Monitor: High disk read (2)
10:30:25 Monitor: High disk read (2)
10:32:25 Monitor: High disk read (2)
10:34:27 Monitor: High disk read (2)
10:36:27 Monitor: High disk read (2)
10:39:09 Monitor: High disk read (2)
10:41:10 Monitor: High disk read (2)
10:43:10 Monitor: High disk read (2)
10:45:10 Monitor: High disk read (2)
10:47:10 Monitor: High disk read (2)
10:49:10 Monitor: High disk read (2)
10:51:11 Monitor: High disk read (2)
10:53:12 Monitor: High disk read (2)
10:55:12 Monitor: High disk read (2)
10:57:13 Monitor: High disk read (2)
10:59:33 Monitor: High disk read (2)
11:01:33 Monitor: High disk read (2)
11:04:19 Monitor: High disk read (2)
11:06:19 Monitor: High disk read (2)
11:08:20 Monitor: High disk read (2)
11:10:20 Monitor: High disk read (2)
11:12:20 Monitor: High disk read (2)
11:15:14 Monitor: High disk read (2)
11:17:15 Monitor: High disk read (2)
11:20:47 Monitor: High disk read (2)
11:25:50 Monitor: High disk read (2)
11:25:50 Monitor: CPU% is low while there are large numbers of runnable threads
11:29:10 Monitor: High disk read (2)
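To see at a glance how often each Monitor warning fires, a console log like the one above can be tallied with a few lines of Python. This is only a rough sketch: it assumes the log is stored one entry per line in the `HH:MM:SS message` shape shown here, and the function name `summarize_monitor` is made up for illustration.

```python
import re
from collections import Counter

# Matches "HH:MM:SS <message>" as seen in the console output above.
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}) (.*)$")
MONITOR_PREFIX = "Monitor: "


def summarize_monitor(log_text):
    """Count how many times each 'Monitor:' message appears (hypothetical helper)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a timestamped entry
        message = match.group(2)
        if message.startswith(MONITOR_PREFIX):
            counts[message[len(MONITOR_PREFIX):]] += 1
    return counts


sample_log = """\
10:04:27 Monitor: High disk read (2)
10:06:46 Checkpoint started
10:13:09 Checkpoint finished, log reused
10:14:15 Monitor: High disk read (2)
11:25:50 Monitor: High disk read (2)
11:25:50 Monitor: CPU% is low while there are large numbers of runnable threads
"""

for message, count in summarize_monitor(sample_log).most_common():
    print(f"{count:4d}  {message}")
```

Comparing the tally before and after a configuration change gives a quick, if crude, indication of whether the change helped.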

My virtuoso.ini file is as follows:

[Database]
DatabaseFile = /data/virtuoso/var/lib/virtuoso/db/virtuoso.db
ErrorLogFile = /data/virtuoso/var/lib/virtuoso/db/virtuoso.log
LockFile = /data/virtuoso/var/lib/virtuoso/db/virtuoso.lck
TransactionFile = /data/virtuoso/var/lib/virtuoso/db/virtuoso.trx
xa_persistent_file = /data/virtuoso/var/lib/virtuoso/db/virtuoso.pxa
ErrorLogLevel = 7
FileExtend = 200
MaxCheckpointRemap = 2000
Striping = 0
TempStorage = TempDatabase

[TempDatabase]
DatabaseFile = /data/virtuoso/var/lib/virtuoso/db/virtuoso-temp.db
TransactionFile = /data/virtuoso/var/lib/virtuoso/db/virtuoso-temp.trx
MaxCheckpointRemap = 2000
Striping = 0

;
; Server parameters
;
[Parameters]
ServerPort = 1111
LiteMode = 0
DisableUnixSocket = 1
DisableTcpSocket = 0
;SSLServerPort = 2111
;SSLCertificate = cert.pem
;SSLPrivateKey = pk.pem
;X509ClientVerify = 0
;X509ClientVerifyDepth = 0
;X509ClientVerifyCAFile = ca.pem
MaxClientConnections = 10
CheckpointInterval = 60
O_DIRECT = 0
CaseMode = 2
MaxStaticCursorRows = 5000
CheckpointAuditTrail = 0
AllowOSCalls = 0
SchedulerInterval = 10
DirsAllowed = ., /data/virtuoso/share/virtuoso/vad
ThreadCleanupInterval = 0
ThreadThreshold = 10
ResourcesCleanupInterval = 0
FreeTextBatchSize = 100000
SingleCPU = 0
VADInstallDir = /data/virtuoso/share/virtuoso/vad/
PrefixResultNames = 0
RdfFreeTextRulesSize = 100
IndexTreeMaps = 256
MaxMemPoolSize = 200000000
PrefixResultNames = 0
MacSpotlight = 0
IndexTreeMaps = 64
MaxQueryMem = 2G ; memory allocated to query processor
VectorSize = 1000 ; initial parallel query vector (array of query operations) size
MaxVectorSize = 1000000 ; query vector size threshold.
AdjustVectorSize = 0
ThreadsPerQuery = 4
AsyncQueueMaxThreads = 10
;;
;; When running with large data sets, one should configure the Virtuoso
;; process to use between 2/3 to 3/5 of free system memory and to stripe
;; storage on all available disks.
;;
NumberOfBuffers = 18280208
MaxDirtyBuffers = 13416667

;;
;; Note the default settings will take very little memory
;; but will not result in very good performance
;;
NumberOfBuffers = 10000
MaxDirtyBuffers = 6000

[HTTPServer]
ServerPort = 8890
ServerRoot = /data/virtuoso/var/lib/virtuoso/vsp
MaxClientConnections = 10
DavRoot = DAV
EnabledDavVSP = 0
HTTPProxyEnabled = 0
TempASPXDir = 0
DefaultMailServer = localhost:25
ServerThreads = 10
MaxKeepAlives = 10
KeepAliveTimeout = 10
MaxCachedProxyConnections = 10
ProxyConnectionCacheTimeout = 15
HTTPThreadSize = 280000
HttpPrintWarningsInOutput = 0
Charset = UTF-8
;HTTPLogFile = logs/http.log
MaintenancePage = atomic.html
EnabledGzipContent = 1

[AutoRepair]
BadParentLinks = 0

[Client]
SQL_PREFETCH_ROWS = 100
SQL_PREFETCH_BYTES = 16000
SQL_QUERY_TIMEOUT = 0
SQL_TXN_TIMEOUT = 0
;SQL_NO_CHAR_C_ESCAPE = 1
;SQL_UTF8_EXECS = 0
;SQL_NO_SYSTEM_TABLES = 0
;SQL_BINARY_TIMESTAMP = 1
;SQL_ENCRYPTION_ON_PASSWORD = -1

[VDB]
ArrayOptimization = 0
NumArrayParameters = 10
VDBDisconnectTimeout = 1000
KeepConnectionOnFixedThread = 0

[Replication]
ServerName = db-ATOZ
ServerEnable = 1
QueueMax = 50000

[Striping]
Segment1 = 100M, db-seg1-1.db, db-seg1-2.db
Segment2 = 100M, db-seg2-1.db
;...

;[TempStriping]
;Segment1 = 100M, db-seg1-1.db, db-seg1-2.db
;Segment2 = 100M, db-seg2-1.db
;...

;[Ucms]
;UcmPath =
;Ucm1 =
;Ucm2 =
;...

[Zero Config]
ServerName = virtuoso (ATOZ)
;ServerDSN = ZDSN
;SSLServerName =
;SSLServerDSN =

[Mono]
;MONO_TRACE = Off
;MONO_PATH =
;MONO_ROOT =
;MONO_CFG_DIR =
;virtclr.dll =

[URIQA]
DynamicLocal = 0
DefaultHost = localhost:8890

[SPARQL]
;ExternalQuerySource = 1
;ExternalXsltSource = 1
;DefaultGraph = http://localhost:8890/dataspace
;ImmutableGraphs = http://localhost:8890/dataspace
ResultSetMaxRows = 10000
MaxQueryCostEstimationTime = 400 ; in seconds
MaxQueryExecutionTime = 60 ; in seconds
DefaultQuery = select distinct ?Concept where {[] a ?Concept} LIMIT 100
DeferInferenceRulesInit = 0 ; controls inference rules loading
;PingService = http://rpc.pingthesemanticweb.com/

[Plugins]
LoadPath = /data/virtuoso/lib/virtuoso/hosting
Load1 = plain, wikiv
Load2 = plain, mediawiki
Load3 = plain, creolewiki
;Load4 = plain, im
;Load5 = plain, wbxml2
;Load6 = plain, hslookup
;Load7 = attach, libphp5.so
;Load8 = Hosting, hosting_php.so
;Load9 = Hosting,hosting_perl.so
;Load10 = Hosting,hosting_python.so
;Load11 = Hosting,hosting_ruby.so
;Load12 = msdtc,msdtc_sample

TallTed commented 6 years ago

This is likely to be a significant part, if not all, of your issue --

;; When running with large data sets, one should configure the Virtuoso
;; process to use between 2/3 to 3/5 of free system memory and to stripe
;; storage on all available disks.
;;
NumberOfBuffers = 18280208
MaxDirtyBuffers = 13416667

;;
;; Note the default settings will take very little memory
;; but will not result in very good performance
;;
NumberOfBuffers = 10000
MaxDirtyBuffers = 6000

Note that the last two lines override the preceding values for these keywords, so the server is actually running with the small default settings. You must comment out those last two lines.
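This kind of mistake is easy to miss by eye in a long ini file. As a rough sketch (not an OpenLink tool), a small Python check can flag any key that is assigned more than once within the same section. The function name `find_overridden_keys` is hypothetical, and the parser is deliberately naive: it treats everything after a `;` on a value line as a comment, which would be wrong for values that legitimately contain a semicolon.

```python
from collections import defaultdict


def find_overridden_keys(ini_text):
    """Return (section, key, values) for keys set more than once in a section."""
    section = None
    seen = defaultdict(list)  # (section, key) -> values in file order
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # blank line or full-line comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if "=" in line:
            key, value = line.split("=", 1)
            # Naive assumption: a ";" after the value starts a trailing comment.
            value = value.split(";", 1)[0].strip()
            seen[(section, key.strip())].append(value)
    return [(s, k, v) for (s, k), v in seen.items() if len(v) > 1]


sample = """\
[Parameters]
NumberOfBuffers = 18280208
MaxDirtyBuffers = 13416667
;; defaults that were left uncommented
NumberOfBuffers = 10000
MaxDirtyBuffers = 6000
"""

for section, key, values in find_overridden_keys(sample):
    print(f"[{section}] {key}: {values[:-1]} overridden by {values[-1]}")
```

Run against the full virtuoso.ini posted above, a check like this would also report the repeated `PrefixResultNames` and `IndexTreeMaps` entries in `[Parameters]`.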

ebremer commented 6 years ago

I'm an idiot. I took the time to calculate the 2/3 rule for those two parameters and I forgot to comment out the defaults. Thanks Ted, I'm re-running the upload now.
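For anyone repeating the calculation, the 2/3 rule can be sketched in a few lines of Python. The constants here are assumptions rather than values taken from this thread: roughly 8700 bytes of memory per buffer (one 8 KB page plus overhead) and `MaxDirtyBuffers` set to about 3/4 of `NumberOfBuffers`. The function name `suggest_buffers` is made up, and the result will not necessarily match the numbers used above.

```python
def suggest_buffers(ram_bytes, fraction=2 / 3, bytes_per_buffer=8700, dirty_ratio=0.75):
    """Estimate NumberOfBuffers and MaxDirtyBuffers (hypothetical helper).

    bytes_per_buffer and dirty_ratio are assumed defaults, not values
    confirmed anywhere in this thread.
    """
    number_of_buffers = int(ram_bytes * fraction / bytes_per_buffer)
    max_dirty_buffers = int(number_of_buffers * dirty_ratio)
    return number_of_buffers, max_dirty_buffers


# Example: a machine with 64 GiB of memory available to Virtuoso.
ram = 64 * 1024**3
number_of_buffers, max_dirty_buffers = suggest_buffers(ram)
print(f"NumberOfBuffers = {number_of_buffers}")
print(f"MaxDirtyBuffers = {max_dirty_buffers}")
```

Whatever values come out, they only take effect if no later assignment of the same keys overrides them in the ini file.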

ebremer commented 6 years ago

That worked. 6,204,086,892 triples loaded. There were still some of these in the console log:

16:56:15 Monitor: High disk read (2)
16:56:15 Monitor: CPU% is low while there are large numbers of runnable threads

but it looks like it is working much better. Thanks!