openlink / virtuoso-opensource

Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
https://vos.openlinksw.com

Erratic behavior: Multiple repeated rows for single object #1013

Closed davidshumway closed 2 years ago

davidshumway commented 2 years ago

Version: 07.20.3233 Build: Jun 22 2021 (Via docker)

I initially loaded the data with the bulk loader, following the bulk loading guide (http://vos.openlinksw.com/owiki/wiki/VOS/VirtBulkRDFLoader). After seeing this issue, I noticed that the same problem also appears when uploading through the Conductor's Quad Store Upload.

The first three objects inserted show up correctly in a SPARQL SELECT query, but each of the following objects (a million or so) appears to be repeated when queried the same way.

In total, the imported data contains roughly 1,050,000 objects.

Counting the objects:

prefix sosa: <http://www.w3.org/ns/sosa/>
select (count(*) as ?s) where {
    ?obs a sosa:Observation ;
      sosa:hasFeatureOfInterest ?foi .
} group by ?obs
order by asc(?s)

Result (using pandas to generate the .value_counts()):

results per observation | observations with this many results
1                       | 690,006
2                       | 358,570

So instead of returning a single unique entity, the query often returns two identical results, as if two inserts had been made to the database. I would expect each observation to be returned exactly once; that happens in about 2/3 of the cases, while in about 1/3 two identical entities are returned instead.
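The distribution above can be reproduced from per-observation counts with a few lines of Python. This sketch uses the standard library's `collections.Counter` as a stand-in for pandas' `Series.value_counts()`; the input list is hypothetical, rebuilt from the two totals in the table. It also shows that the two rows sum to 1,048,576 observations and that the split is roughly 66% / 34%.

```python
from collections import Counter

# Hypothetical input: one entry per observation, holding the ?s value that
# the GROUP BY ?obs query returned for it (rebuilt from the totals above).
per_observation = [1] * 690_006 + [2] * 358_570

# Counter plays the role of pandas' Series.value_counts():
# it maps "results per observation" -> "number of observations".
distribution = Counter(per_observation)

total = sum(distribution.values())
for results, observations in sorted(distribution.items()):
    share = observations / total
    print(f"{results}\t{observations:,}\t({share:.1%})")
```

With pandas the equivalent would be `pd.Series(per_observation).value_counts()`; `Counter` is used here only to keep the sketch dependency-free.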

Querying for more properties, the results vary even further:

prefix sosa: <http://www.w3.org/ns/sosa/>
prefix qudt-1-1: <http://qudt.org/1.1/schema/qudt#>
select (count(*) as ?s) where {
    ?obs a sosa:Observation ;
      sosa:hasFeatureOfInterest ?foi ;
      sosa:resultTime ?ddate ;
      sosa:observedProperty ?prop ;
      sosa:hasResult [
        qudt-1-1:numericValue ?value ;
        qudt-1-1:unit ?unit ] .
} group by ?obs
order by asc(?s)

results per observation | observations with this many results
16                      | 1,008,078
8                       | 21,040
4                       | 19,454
1                       | 4
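A general property of SPARQL is useful when reading these numbers: for a single `?obs`, query branches that share only `?obs` are joined as a cross product, so the number of rows that observation contributes is the product of the number of distinct bindings each branch finds. The Python sketch below illustrates this arithmetic; the branch names and the `with_doubled` helper are hypothetical, and nothing here establishes which triples are actually duplicated in this dataset.

```python
from math import prod
from typing import Mapping

def rows_for_observation(bindings_per_branch: Mapping[str, int]) -> int:
    """Number of result rows one ?obs contributes.

    Each value is the number of distinct bindings one branch of the query
    finds for that observation. Branches that share only ?obs are joined
    as a cross product, so the counts multiply.
    """
    return prod(bindings_per_branch.values())

# Hypothetical branch names mirroring the second query above.
BRANCHES = [
    "sosa:hasFeatureOfInterest",
    "sosa:resultTime",
    "sosa:observedProperty",
    "sosa:hasResult [numericValue, unit]",
]

def with_doubled(k: int) -> dict:
    """Hypothetical case: the first k branches each find two bindings."""
    return {b: 2 if i < k else 1 for i, b in enumerate(BRANCHES)}

for k in range(len(BRANCHES) + 1):
    print(f"{k} doubled branch(es) -> {rows_for_observation(with_doubled(k))} rows")
```

Doubling two, three or four independent branches yields 4, 8 or 16 rows, the same values that appear in the table; this is only arithmetic consistent with the counts, not evidence about their cause.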

For example, ex:obs9 a sosa:Observation appears once in the raw N3 data but shows up 16 times after import. Sample .n3 data:

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#> 
prefix sosa: <http://www.w3.org/ns/sosa/> 
prefix ex: <http://www.example.org/> 
prefix qudt-1-1: <http://qudt.org/1.1/schema/qudt#>
prefix qudt-unit-1-1: <http://qudt.org/1.1/vocab/unit#>

ex:obs9 a sosa:Observation ;
sosa:hasFeatureOfInterest ex:foi1 ;
sosa:resultTime "2020-01-01T00:00:00Z"^^xsd:datetime ;
sosa:observedProperty "Air Temperature"^^xsd:string ;
sosa:hasResult [ 
  qudt-1-1:numericValue "20.0"^^xsd:double ;
  qudt-1-1:unit "C"^^xsd:string ] .
Example result (in most cases repeated more than once):

obs:   http://www.example.org/obs9
foi:   http://www.example.org/foi1
date:  "2020-01-01T00:00:00Z"^^http://www.w3.org/2001/XMLSchema#datetime
prop:  "Air Temperature"^^http://www.w3.org/2001/XMLSchema#string
value: 20.0
unit:  "C"^^http://www.w3.org/2001/XMLSchema#string
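The sample uses a blank node (`[ ... ]`) as the object of `sosa:hasResult`. In RDF, blank node labels are scoped to the document they appear in, so parsing the same file a second time generally mints new blank nodes, while triples built only from IRIs and literals are identical and collapse in a store that treats a graph as a set. The Python sketch below simulates that general mechanism with a graph modelled as a Python `set`; `load_sample`, `object_of`, `results_of` and the `_:bN` ids are hypothetical, and this is not a model of Virtuoso's loader.

```python
from itertools import count

# Hypothetical counter that mints a fresh blank node id every time the
# sample is parsed, mirroring the document scope of blank node labels.
_fresh_ids = count(1)

def load_sample(graph: set) -> None:
    """Add the ex:obs9 sample triples to a graph modelled as a set."""
    result = f"_:b{next(_fresh_ids)}"  # the [ ... ] after sosa:hasResult
    graph.update({
        ("ex:obs9", "rdf:type", "sosa:Observation"),
        ("ex:obs9", "sosa:hasFeatureOfInterest", "ex:foi1"),
        ("ex:obs9", "sosa:hasResult", result),
        (result, "qudt-1-1:numericValue", '"20.0"^^xsd:double'),
        (result, "qudt-1-1:unit", '"C"^^xsd:string'),
    })

def object_of(graph: set, subject: str, predicate: str) -> str:
    """First object found for (subject, predicate); assumes one exists."""
    return next(o for (s, p, o) in graph if s == subject and p == predicate)

def results_of(graph: set, subject: str) -> list:
    """(value, unit) rows reachable from subject through sosa:hasResult."""
    return [
        (object_of(graph, node, "qudt-1-1:numericValue"),
         object_of(graph, node, "qudt-1-1:unit"))
        for (s, p, node) in graph
        if s == subject and p == "sosa:hasResult"
    ]

graph = set()
load_sample(graph)
load_sample(graph)  # the same file loaded a second time

print(len(graph))                    # 8 triples: 2 shared + 3 per load
print(results_of(graph, "ex:obs9"))  # two identical (value, unit) rows
```

The two IRI-only triples are stored once, but each load contributes its own result node, so a query through `sosa:hasResult` returns visually identical rows.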

What's also odd is that with the Conductor Quad Store Upload, sometimes exactly one new object is added, while at other times a varying number of rows are added.

Any ideas?

HughWilliams commented 2 years ago

Can you provide the output of the select * from load_list; SQL query, run from "isql", so we can see whether the bulk load operations encountered any issues?

What list of graphs, and what triple count for each, does the following query return?

SPARQL SELECT ?graph (count(?s) as ?count) WHERE { GRAPH ?graph { ?s ?p ?o } } GROUP BY ?graph ORDER BY DESC(?count);

Please also provide a copy of your virtuoso.ini and virtuoso.log files for review.
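The graph-count query can also be sent over HTTP instead of isql. This Python sketch assumes the SPARQL endpoint is at http://localhost:8890/sparql (8890 is the HTTPServer port in the virtuoso.ini posted later in this thread) and uses only the standard SPARQL Protocol `query` parameter plus an `Accept` header; it builds the request without sending it. The `ENDPOINT`, `GRAPH_COUNTS` and `build_request` names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: local endpoint on the HTTP port from the posted virtuoso.ini.
ENDPOINT = "http://localhost:8890/sparql"

# Same query as above, minus the isql-only "SPARQL" prefix and trailing ";".
GRAPH_COUNTS = (
    "SELECT ?graph (COUNT(?s) AS ?count) "
    "WHERE { GRAPH ?graph { ?s ?p ?o } } "
    "GROUP BY ?graph ORDER BY DESC(?count)"
)

def build_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """GET request per the SPARQL 1.1 Protocol, asking for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})

req = build_request(GRAPH_COUNTS)
print(req.full_url)

# To actually run it against a live server:
#   import json, urllib.request
#   with urllib.request.urlopen(req) as resp:
#       bindings = json.load(resp)["results"]["bindings"]
```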

davidshumway commented 2 years ago
SQL> select * from load_list;
ll_file                                                                           ll_graph                                                                          ll_state    ll_started           ll_done              ll_host     ll_work_time  ll_error
VARCHAR NOT NULL                                                                  VARCHAR                                                                           INTEGER     TIMESTAMP            TIMESTAMP            INTEGER     INTEGER     VARCHAR
_______________________________________________________________________________

/database/B00-foi.n3                                                           http://www.example.org                                                            2           2022.3.18 4:48.5 102402000  2022.3.18 4:48.5 223977000  0           NULL        NULL
/database/B00-geom.n3                                                          http://www.example.org                                                            2           2022.3.18 4:48.5 102402000  2022.3.18 4:48.5 337611000  0           NULL        NULL
/database/B00-obs.n3                                                           http://www.example.org                                                            2           2022.3.18 4:48.5 102402000  2022.3.18 4:49.6 840831000  0           NULL        NULL
/database/E-foi.n3                                                             http://www.example.org                                                            2           2022.3.18 4:49.6 841163000  2022.3.18 4:49.24 303053000  0           NULL        NULL
/database/E-geom.n3                                                            http://www.example.org                                                            2           2022.3.18 4:49.24 303307000  2022.3.18 4:49.40 972979000  0           NULL        NULL
/database/E-obs.n3                                                             http://www.example.org                                                            2           2022.3.18 4:49.40 973431000  2022.3.18 4:50.4 362837000  0           NULL        NULL

6 Rows. -- 2 msec.
SQL> SPARQL SELECT ?graph (count(?s) as ?count) WHERE { GRAPH ?graph { ?s ?p ?o } } GROUP BY ?graph ORDER BY DESC(?count);
graph                                                                             count
LONG VARCHAR                                                                      LONG VARCHAR
_______________________________________________________________________________

http://www.example.org                                                            14526177
http://localhost:8890/DAV                                                         8627487
http://localhost:8890/DAV/                                                        2983
http://www.openlinksw.com/schemas/virtrdf#                                        2479
http://www.w3.org/2002/07/owl#                                                    160
http://example.org                                                                23
http://localhost:8890/sparql                                                      14
http://www.w3.org/ns/ldp#                                                         3
urn:activitystreams-owl:map                                                       2

9 Rows. -- 332 msec.
SQL> 
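The load_list rows can be checked mechanically: in the bulk loader, ll_state = 2 marks a file as done, and ll_error stays NULL when no error was recorded. The Python sketch below checks that for the six rows above and turns the isql timestamps into a total load time. It assumes isql prints ll_started/ll_done as "Y.M.D H:M.S nanoseconds" (so 4:49.24 means 04:49:24); that reading is an inference from this output, and it agrees with the loader run from 04:48:05 to 04:50:04 in the virtuoso.log posted below. `parse_isql_timestamp` is a hypothetical helper.

```python
from datetime import datetime, timedelta

# Rows copied from the load_list output above:
# (ll_file, ll_state, ll_started, ll_done, ll_error)
ROWS = [
    ("/database/B00-foi.n3", 2, "2022.3.18 4:48.5 102402000", "2022.3.18 4:48.5 223977000", None),
    ("/database/B00-geom.n3", 2, "2022.3.18 4:48.5 102402000", "2022.3.18 4:48.5 337611000", None),
    ("/database/B00-obs.n3", 2, "2022.3.18 4:48.5 102402000", "2022.3.18 4:49.6 840831000", None),
    ("/database/E-foi.n3", 2, "2022.3.18 4:49.6 841163000", "2022.3.18 4:49.24 303053000", None),
    ("/database/E-geom.n3", 2, "2022.3.18 4:49.24 303307000", "2022.3.18 4:49.40 972979000", None),
    ("/database/E-obs.n3", 2, "2022.3.18 4:49.40 973431000", "2022.3.18 4:50.4 362837000", None),
]

def parse_isql_timestamp(text: str) -> datetime:
    """Parse '2022.3.18 4:49.24 303053000' (assumed: date, H:M.S, nanoseconds)."""
    date_part, time_part, nanos = text.split()
    base = datetime.strptime(f"{date_part} {time_part}", "%Y.%m.%d %H:%M.%S")
    return base + timedelta(microseconds=int(nanos) // 1000)

# A file is suspicious if it never reached state 2 or has an error message.
problems = [f for (f, state, _s, _d, err) in ROWS if state != 2 or err is not None]
start = min(parse_isql_timestamp(r[2]) for r in ROWS)
end = max(parse_isql_timestamp(r[3]) for r in ROWS)

print("files with problems:", problems or "none")
print("total load time:", end - start)
```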
$ cat virtuoso.ini
;
;  virtuoso.ini
;
;  Configuration file for the OpenLink Virtuoso VDBMS Server
;
;  To learn more about this product, or any other product in our
;  portfolio, please check out our web site at:
;
;      http://virtuoso.openlinksw.com/
;
;  or contact us at:
;
;      general.information@openlinksw.com
;
;  If you have any technical questions, please contact our support
;  staff at:
;
;      technical.support@openlinksw.com
;
;
;  Database setup
;
[Database]
DatabaseFile       = virtuoso.db
ErrorLogFile       = virtuoso.log
LockFile           = virtuoso.lck
TransactionFile    = virtuoso.trx
xa_persistent_file = virtuoso.pxa
ErrorLogLevel      = 7
FileExtend         = 200
MaxCheckpointRemap = 2000
Striping           = 0
TempStorage        = TempDatabase

[TempDatabase]
DatabaseFile       = virtuoso-temp.db
TransactionFile    = virtuoso-temp.trx
MaxCheckpointRemap = 2000
Striping           = 0

;
;  Server parameters
;
[Parameters]
ServerPort               = 1111
LiteMode                 = 0
DisableUnixSocket        = 1
DisableTcpSocket         = 0
;SSLServerPort          = 2111
;SSLCertificate         = cert.pem
;SSLPrivateKey          = pk.pem
;X509ClientVerify       = 0
;X509ClientVerifyDepth      = 0
;X509ClientVerifyCAFile     = ca.pem
MaxClientConnections     = 10
CheckpointInterval       = 60
O_DIRECT                 = 0
CaseMode                 = 2
MaxStaticCursorRows      = 5000
CheckpointAuditTrail     = 0
AllowOSCalls             = 0
SchedulerInterval        = 10
DirsAllowed              = ., ../vad, /usr/share/proj, /database
ThreadCleanupInterval    = 0
ThreadThreshold          = 10
ResourcesCleanupInterval = 0
FreeTextBatchSize        = 100000
SingleCPU                = 0
VADInstallDir            = ../vad/
PrefixResultNames        = 0
RdfFreeTextRulesSize     = 100
IndexTreeMaps            = 256
MaxMemPoolSize           = 200000000
PrefixResultNames        = 0
MacSpotlight             = 0
IndexTreeMaps            = 64
MaxQueryMem              = 2G   ; memory allocated to query processor
VectorSize               = 1000 ; initial parallel query vector (array of query operations) size
MaxVectorSize            = 1000000  ; query vector size threshold.
AdjustVectorSize         = 0
ThreadsPerQuery          = 4
AsyncQueueMaxThreads     = 10
;;
;; When running with large data sets, one should configure the Virtuoso
;; process to use between 2/3 to 3/5 of free system memory and to stripe
;; storage on all available disks.
;;
;; Uncomment next two lines if there is 2 GB system memory free
;NumberOfBuffers          = 170000
;MaxDirtyBuffers          = 130000
;; Uncomment next two lines if there is 4 GB system memory free
;NumberOfBuffers          = 340000
; MaxDirtyBuffers          = 250000
;; Uncomment next two lines if there is 8 GB system memory free
;NumberOfBuffers          = 680000
;MaxDirtyBuffers          = 500000
;; Uncomment next two lines if there is 16 GB system memory free
;NumberOfBuffers          = 1360000
;MaxDirtyBuffers          = 1000000
;; Uncomment next two lines if there is 32 GB system memory free
;NumberOfBuffers          = 2720000
;MaxDirtyBuffers          = 2000000
;; Uncomment next two lines if there is 48 GB system memory free
;NumberOfBuffers          = 4000000
;MaxDirtyBuffers          = 3000000
;; Uncomment next two lines if there is 64 GB system memory free
;NumberOfBuffers          = 5450000
;MaxDirtyBuffers          = 4000000
;;
;; Note the default settings will take very little memory
;; but will not result in very good performance
;;
NumberOfBuffers          = 10000
MaxDirtyBuffers          = 6000

[HTTPServer]
ServerPort                  = 8890
ServerRoot                  = ../vsp
MaxClientConnections        = 10
DavRoot                     = DAV
EnabledDavVSP               = 0
HTTPProxyEnabled            = 0
TempASPXDir                 = 0
DefaultMailServer           = localhost:25
ServerThreads               = 10
MaxKeepAlives               = 10
KeepAliveTimeout            = 10
MaxCachedProxyConnections   = 10
ProxyConnectionCacheTimeout = 15
HTTPThreadSize              = 280000
HttpPrintWarningsInOutput   = 0
Charset                     = UTF-8
;HTTPLogFile                = logs/http.log
MaintenancePage             = atomic.html
EnabledGzipContent          = 1

[AutoRepair]
BadParentLinks = 0

[Client]
SQL_PREFETCH_ROWS  = 100
SQL_PREFETCH_BYTES = 16000
SQL_QUERY_TIMEOUT  = 0
SQL_TXN_TIMEOUT    = 0
;SQL_NO_CHAR_C_ESCAPE       = 1
;SQL_UTF8_EXECS         = 0
;SQL_NO_SYSTEM_TABLES       = 0
;SQL_BINARY_TIMESTAMP       = 1
;SQL_ENCRYPTION_ON_PASSWORD = -1

[VDB]
ArrayOptimization           = 0
NumArrayParameters          = 10
VDBDisconnectTimeout        = 1000
KeepConnectionOnFixedThread = 0

[Replication]
ServerName   = db-CENTOS5-PORT
ServerEnable = 1
QueueMax     = 50000

;
;  Striping setup
;
;  These parameters have only effect when Striping is set to 1 in the
;  [Database] section, in which case the DatabaseFile parameter is ignored.
;
;  With striping, the database is spawned across multiple segments
;  where each segment can have multiple stripes.
;
;  Format of the lines below:
;    Segment<number> = <size>, <stripe file name> [, <stripe file name> .. ]
;
;  <number> must be ordered from 1 up.
;
;  The <size> is the total size of the segment which is equally divided
;  across all stripes forming  the segment. Its specification can be in
;  gigabytes (g), megabytes (m), kilobytes (k) or in database blocks
;  (b, the default)
;
;  Note that the segment size must be a multiple of the database page size
;  which is currently 8k. Also, the segment size must be divisible by the
;  number of stripe files forming  the segment.
;
;  The example below creates a 200 meg database striped on two segments
;  with two stripes of 50 meg and one of 100 meg.
;
;  You can always add more segments to the configuration, but once
;  added, do not change the setup.
;
[Striping]
Segment1 = 100M, db-seg1-1.db, db-seg1-2.db
Segment2 = 100M, db-seg2-1.db
;...
;[TempStriping]
;Segment1           = 100M, db-seg1-1.db, db-seg1-2.db
;Segment2           = 100M, db-seg2-1.db
;...
;[Ucms]
;UcmPath            = <path>
;Ucm1               = <file>
;Ucm2               = <file>
;...

[Zero Config]
ServerName = virtuoso (CENTOS5-PORT)
;ServerDSN          = ZDSN
;SSLServerName          =
;SSLServerDSN           =

[Mono]
;MONO_TRACE         = Off
;MONO_PATH          = <path_here>
;MONO_ROOT          = <path_here>
;MONO_CFG_DIR           = <path_here>
;virtclr.dll            =

[URIQA]
DynamicLocal = 0
DefaultHost  = localhost:8890

[SPARQL]
;ExternalQuerySource        = 1
;ExternalXsltSource         = 1
;DefaultGraph           = http://localhost:8890/dataspace
;ImmutableGraphs            = http://localhost:8890/dataspace
ResultSetMaxRows           = 1000000000000
MaxQueryCostEstimationTime = 4000   ; in seconds
MaxQueryExecutionTime      = 600    ; in seconds
DefaultQuery               = select distinct ?Concept where {[] a ?Concept} LIMIT 100
DeferInferenceRulesInit    = 0  ; controls inference rules loading
;PingService            = http://rpc.pingthesemanticweb.com/

[Plugins]
LoadPath = ../hosting
Load1    = plain, geos
Load2    = plain, proj4
Load3    = plain, shapefileio
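The comments in the [Parameters] section above pair amounts of free system memory with NumberOfBuffers / MaxDirtyBuffers settings, and note that the active defaults (10000 / 6000) take very little memory but will not result in very good performance. This Python sketch picks the largest commented-out pair that fits a given amount of free memory; the table is copied from those comments, and `choose_buffers` is a hypothetical helper, not part of Virtuoso.

```python
# (free memory in GB, NumberOfBuffers, MaxDirtyBuffers), copied from the
# commented-out suggestions in the virtuoso.ini above, in ascending order.
BUFFER_TABLE = [
    (2, 170_000, 130_000),
    (4, 340_000, 250_000),
    (8, 680_000, 500_000),
    (16, 1_360_000, 1_000_000),
    (32, 2_720_000, 2_000_000),
    (48, 4_000_000, 3_000_000),
    (64, 5_450_000, 4_000_000),
]

# Values currently active in the posted file.
DEFAULTS = (10_000, 6_000)

def choose_buffers(free_gb: float) -> tuple:
    """Return (NumberOfBuffers, MaxDirtyBuffers) for the largest row <= free_gb."""
    chosen = DEFAULTS
    for gb, buffers, dirty in BUFFER_TABLE:
        if gb <= free_gb:
            chosen = (buffers, dirty)  # table is ascending, so keep the last fit
    return chosen

buffers, dirty = choose_buffers(8)
print(f"NumberOfBuffers = {buffers}")
print(f"MaxDirtyBuffers = {dirty}")
```

Below 2 GB the sketch falls back to the file's current defaults, since the comments offer no smaller suggestion.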
$ cat virtuoso.log

        Wed Mar 16 2022
21:34:58 { Loading plugin 1: Type `plain', file `geos' in `../hosting'
21:34:58   plain version 1.2.3233 from OpenLink Software
21:34:58   GEOS plugin based on Geometry Engine Open Source library from Open Source Geospatial Foundation
21:34:58   SUCCESS plugin 1: loaded from ../hosting/geos.so }
21:34:58 { Loading plugin 2: Type `plain', file `proj4' in `../hosting'
21:34:58   plain version 1.1.3233 from OpenLink Software
21:34:58   Cartographic Projections support based on Frank Warmerdam's proj4 library
21:34:58   SUCCESS plugin 2: loaded from ../hosting/proj4.so }
21:34:58 { Loading plugin 3: Type `plain', file `shapefileio' in `../hosting'
21:34:58   ShapefileIO version 0.1virt71 from OpenLink Software
21:34:58   Shapefile support based on Frank Warmerdam's Shapelib
21:34:58   SUCCESS plugin 3: loaded from ../hosting/shapefileio.so }
21:34:58 OpenLink Virtuoso Universal Server
21:34:58 Version 07.20.3233-pthreads for Linux as of Jun 22 2021
21:34:58 uses OpenSSL 1.0.2u  20 Dec 2019
21:34:58 uses parts of PCRE, Html Tidy
21:34:58 SQL Optimizer enabled (max 1000 layouts)
21:34:59 Compiler unit is timed at 0.000472 msec
21:35:02 Checkpoint started
21:35:02 Roll forward started
21:35:02 Roll forward complete
21:35:02 Checkpoint started
21:35:02 Checkpoint finished, log reused
21:35:02 Checkpoint started
21:35:03 Checkpoint finished, log reused
21:35:03 Checkpoint started
21:35:03 Checkpoint finished, log reused
21:35:04 Checkpoint started
21:35:04 Checkpoint finished, log reused
21:35:04 Checkpoint started
21:35:05 Checkpoint finished, log reused
21:35:05 PL LOG: Installing Virtuoso Conductor version 1.00.8823 (DAV)
21:35:05 PL LOG: Installing with dependencies Virtuoso Conductor version 1.00.8823/2021-06-22 12:38 (DAV)
21:35:05 Checkpoint started
21:35:05 Checkpoint finished, log reused
21:35:08 Checkpoint started
21:35:08 Checkpoint finished, log reused
21:35:08 PL LOG: Installation with dependencies complete
21:35:08 PL LOG: Initializing DB.DBA.SYS_PROJ4_SRIDS
21:35:08 PL LOG: ... checking for data files in "/usr/share/proj"
21:35:09 PL LOG: DB.DBA.SYS_PROJ4_SRIDS now contains 8650 spatial reference systems
21:35:09 Checkpoint started
21:35:09 Checkpoint finished, log reused
21:35:09 Server exiting

        Wed Mar 16 2022
21:35:09 { Loading plugin 1: Type `plain', file `geos' in `../hosting'
21:35:09   plain version 1.2.3233 from OpenLink Software
21:35:09   GEOS plugin based on Geometry Engine Open Source library from Open Source Geospatial Foundation
21:35:09   SUCCESS plugin 1: loaded from ../hosting/geos.so }
21:35:09 { Loading plugin 2: Type `plain', file `proj4' in `../hosting'
21:35:09   plain version 1.1.3233 from OpenLink Software
21:35:09   Cartographic Projections support based on Frank Warmerdam's proj4 library
21:35:09   SUCCESS plugin 2: loaded from ../hosting/proj4.so }
21:35:09 { Loading plugin 3: Type `plain', file `shapefileio' in `../hosting'
21:35:09   ShapefileIO version 0.1virt71 from OpenLink Software
21:35:09   Shapefile support based on Frank Warmerdam's Shapelib
21:35:09   SUCCESS plugin 3: loaded from ../hosting/shapefileio.so }
21:35:09 OpenLink Virtuoso Universal Server
21:35:09 Version 07.20.3233-pthreads for Linux as of Jun 22 2021
21:35:09 uses OpenSSL 1.0.2u  20 Dec 2019
21:35:09 uses parts of PCRE, Html Tidy
21:35:09 Starting for DBA password change.
21:35:09 Database version 3126
21:35:09 SQL Optimizer enabled (max 1000 layouts)
21:35:11 Compiler unit is timed at 0.000165 msec
21:35:11 Roll forward started
21:35:11 Roll forward complete
21:35:11 The DBA password is changed.
21:35:11 The DAV password is changed.
21:35:12 Checkpoint started
21:35:12 Checkpoint finished, log reused
21:35:12 Server exiting

        Wed Mar 16 2022
21:35:12 { Loading plugin 1: Type `plain', file `geos' in `../hosting'
21:35:12   plain version 1.2.3233 from OpenLink Software
21:35:12   GEOS plugin based on Geometry Engine Open Source library from Open Source Geospatial Foundation
21:35:12   SUCCESS plugin 1: loaded from ../hosting/geos.so }
21:35:12 { Loading plugin 2: Type `plain', file `proj4' in `../hosting'
21:35:12   plain version 1.1.3233 from OpenLink Software
21:35:12   Cartographic Projections support based on Frank Warmerdam's proj4 library
21:35:12   SUCCESS plugin 2: loaded from ../hosting/proj4.so }
21:35:12 { Loading plugin 3: Type `plain', file `shapefileio' in `../hosting'
21:35:12   ShapefileIO version 0.1virt71 from OpenLink Software
21:35:12   Shapefile support based on Frank Warmerdam's Shapelib
21:35:12   SUCCESS plugin 3: loaded from ../hosting/shapefileio.so }
21:35:12 OpenLink Virtuoso Universal Server
21:35:12 Version 07.20.3233-pthreads for Linux as of Jun 22 2021
21:35:12 uses OpenSSL 1.0.2u  20 Dec 2019
21:35:12 uses parts of PCRE, Html Tidy
21:35:12 Database version 3126
21:35:12 SQL Optimizer enabled (max 1000 layouts)
21:35:13 Compiler unit is timed at 0.000172 msec
21:35:14 Roll forward started
21:35:14 Roll forward complete
21:35:15 Checkpoint started
21:35:15 Checkpoint finished, log reused
21:35:17 HTTP/WebDAV server online at 8890
21:35:17 Server online at 1111 (pid 1)
21:35:17 Incorrect login for dba from IP [127.0.0.1]
21:45:05 Incorrect login for dba from IP [127.0.0.1]
21:45:20 Server received signal 2
21:45:23 Initiating normal shutdown
21:45:23 Checkpoint started
21:45:23 Checkpoint finished, log reused
21:45:23 Server shutdown complete

        Wed Mar 16 2022
21:46:43 { Loading plugin 1: Type `plain', file `geos' in `../hosting'
21:46:43   plain version 1.2.3233 from OpenLink Software
21:46:43   GEOS plugin based on Geometry Engine Open Source library from Open Source Geospatial Foundation
21:46:43   SUCCESS plugin 1: loaded from ../hosting/geos.so }
21:46:43 { Loading plugin 2: Type `plain', file `proj4' in `../hosting'
21:46:43   plain version 1.1.3233 from OpenLink Software
21:46:43   Cartographic Projections support based on Frank Warmerdam's proj4 library
21:46:43   SUCCESS plugin 2: loaded from ../hosting/proj4.so }
21:46:43 { Loading plugin 3: Type `plain', file `shapefileio' in `../hosting'
21:46:43   ShapefileIO version 0.1virt71 from OpenLink Software
21:46:43   Shapefile support based on Frank Warmerdam's Shapelib
21:46:43   SUCCESS plugin 3: loaded from ../hosting/shapefileio.so }
21:46:43 OpenLink Virtuoso Universal Server
21:46:43 Version 07.20.3233-pthreads for Linux as of Jun 22 2021
21:46:43 uses OpenSSL 1.0.2u  20 Dec 2019
21:46:43 uses parts of PCRE, Html Tidy
21:46:43 Database version 3126
21:46:43 SQL Optimizer enabled (max 1000 layouts)
21:46:44 Compiler unit is timed at 0.000193 msec
21:46:45 Roll forward started
21:46:45 Roll forward complete
21:46:46 Checkpoint started
21:46:46 Checkpoint finished, log reused
21:46:48 HTTP/WebDAV server online at 8890
21:46:48 Server online at 1111 (pid 1)
21:46:48 Incorrect login for dba from IP [127.0.0.1]
21:52:16 PL LOG: Loader started
21:53:57 PL LOG: No more files to load. Loader has finished,
21:54:25 Checkpoint started
21:54:25 Checkpoint finished, log reused

        Thu Mar 17 2022
00:55:58 Server received signal 2
00:55:58 Initiating normal shutdown
00:55:58 Checkpoint started
00:55:59 Checkpoint finished, log reused
00:55:59 Server shutdown complete

        Thu Mar 17 2022
00:56:04 { Loading plugin 1: Type `plain', file `geos' in `../hosting'
00:56:04   plain version 1.2.3233 from OpenLink Software
00:56:04   GEOS plugin based on Geometry Engine Open Source library from Open Source Geospatial Foundation
00:56:04   SUCCESS plugin 1: loaded from ../hosting/geos.so }
00:56:04 { Loading plugin 2: Type `plain', file `proj4' in `../hosting'
00:56:04   plain version 1.1.3233 from OpenLink Software
00:56:04   Cartographic Projections support based on Frank Warmerdam's proj4 library
00:56:04   SUCCESS plugin 2: loaded from ../hosting/proj4.so }
00:56:04 { Loading plugin 3: Type `plain', file `shapefileio' in `../hosting'
00:56:04   ShapefileIO version 0.1virt71 from OpenLink Software
00:56:04   Shapefile support based on Frank Warmerdam's Shapelib
00:56:04   SUCCESS plugin 3: loaded from ../hosting/shapefileio.so }
00:56:04 OpenLink Virtuoso Universal Server
00:56:04 Version 07.20.3233-pthreads for Linux as of Jun 22 2021
00:56:04 uses OpenSSL 1.0.2u  20 Dec 2019
00:56:04 uses parts of PCRE, Html Tidy
00:56:04 Database version 3126
00:56:04 SQL Optimizer enabled (max 1000 layouts)
00:56:05 Compiler unit is timed at 0.000166 msec
00:56:07 Roll forward started
00:56:07 Roll forward complete
00:56:08 Checkpoint started
00:56:08 Checkpoint finished, log reused
00:56:11 HTTP/WebDAV server online at 8890
00:56:11 Server online at 1111 (pid 1)
01:17:30 Server received signal 2
01:17:30 Initiating normal shutdown
01:17:30 Checkpoint started
01:17:30 Checkpoint finished, log reused
01:17:30 Server shutdown complete

        Thu Mar 17 2022
01:17:31 { Loading plugin 1: Type `plain', file `geos' in `../hosting'
01:17:31   plain version 1.2.3233 from OpenLink Software
01:17:31   GEOS plugin based on Geometry Engine Open Source library from Open Source Geospatial Foundation
01:17:31   SUCCESS plugin 1: loaded from ../hosting/geos.so }
01:17:31 { Loading plugin 2: Type `plain', file `proj4' in `../hosting'
01:17:31   plain version 1.1.3233 from OpenLink Software
01:17:31   Cartographic Projections support based on Frank Warmerdam's proj4 library
01:17:31   SUCCESS plugin 2: loaded from ../hosting/proj4.so }
01:17:31 { Loading plugin 3: Type `plain', file `shapefileio' in `../hosting'
01:17:31   ShapefileIO version 0.1virt71 from OpenLink Software
01:17:31   Shapefile support based on Frank Warmerdam's Shapelib
01:17:31   SUCCESS plugin 3: loaded from ../hosting/shapefileio.so }
01:17:31 OpenLink Virtuoso Universal Server
01:17:31 Version 07.20.3233-pthreads for Linux as of Jun 22 2021
01:17:31 uses OpenSSL 1.0.2u  20 Dec 2019
01:17:31 uses parts of PCRE, Html Tidy
01:17:31 Database version 3126
01:17:31 SQL Optimizer enabled (max 1000 layouts)
01:17:32 Compiler unit is timed at 0.000170 msec
01:17:34 Roll forward started
01:17:34 Roll forward complete
01:17:35 Checkpoint started
01:17:36 Checkpoint finished, log reused
01:17:38 HTTP/WebDAV server online at 8890
01:17:38 Server online at 1111 (pid 1)
01:28:20 Incorrect login for dba from IP [127.0.0.1]
01:31:25 Incorrect login for dba from IP [127.0.0.1]
01:31:58 Incorrect login for dba from IP [127.0.0.1]
01:32:15 PL LOG: Loader started
01:33:58 PL LOG: No more files to load. Loader has finished,
01:34:06 Checkpoint started
01:34:08 Checkpoint finished, log reused
01:38:45 Server received signal 2
01:38:45 Initiating normal shutdown
01:38:45 Checkpoint started
01:38:45 Checkpoint finished, log reused
01:38:45 Server shutdown complete

        Thu Mar 17 2022
01:38:51 { Loading plugin 1: Type `plain', file `geos' in `../hosting'
01:38:51   plain version 1.2.3233 from OpenLink Software
01:38:51   GEOS plugin based on Geometry Engine Open Source library from Open Source Geospatial Foundation
01:38:51   SUCCESS plugin 1: loaded from ../hosting/geos.so }
01:38:51 { Loading plugin 2: Type `plain', file `proj4' in `../hosting'
01:38:51   plain version 1.1.3233 from OpenLink Software
01:38:51   Cartographic Projections support based on Frank Warmerdam's proj4 library
01:38:51   SUCCESS plugin 2: loaded from ../hosting/proj4.so }
01:38:51 { Loading plugin 3: Type `plain', file `shapefileio' in `../hosting'
01:38:51   ShapefileIO version 0.1virt71 from OpenLink Software
01:38:51   Shapefile support based on Frank Warmerdam's Shapelib
01:38:51   SUCCESS plugin 3: loaded from ../hosting/shapefileio.so }
01:38:51 OpenLink Virtuoso Universal Server
01:38:51 Version 07.20.3233-pthreads for Linux as of Jun 22 2021
01:38:51 uses OpenSSL 1.0.2u  20 Dec 2019
01:38:51 uses parts of PCRE, Html Tidy
01:38:51 Database version 3126
01:38:51 SQL Optimizer enabled (max 1000 layouts)
01:38:52 Compiler unit is timed at 0.000219 msec
01:38:54 Roll forward started
01:38:54 Roll forward complete
01:38:56 Checkpoint started
01:38:57 Checkpoint finished, log reused
01:38:59 HTTP/WebDAV server online at 8890
01:38:59 Server online at 1111 (pid 1)
02:39:00 Checkpoint started
02:39:01 Checkpoint finished, log reused
03:39:01 Checkpoint started
03:39:01 Checkpoint finished, log reused
04:39:02 Checkpoint started
04:39:02 Checkpoint finished, log reused
05:39:03 Checkpoint started
05:39:03 Checkpoint finished, log reused
06:39:04 Checkpoint started
06:39:04 Checkpoint finished, log reused
07:39:05 Checkpoint started
07:39:06 Checkpoint finished, log reused
08:39:06 Checkpoint started
08:39:07 Checkpoint finished, log reused
09:39:08 Checkpoint started
09:39:08 Checkpoint finished, log reused
10:39:09 Checkpoint started
10:39:09 Checkpoint finished, log reused
11:39:10 Checkpoint started
11:39:10 Checkpoint finished, log reused
12:39:11 Checkpoint started
12:39:11 Checkpoint finished, log reused
13:39:12 Checkpoint started
13:39:12 Checkpoint finished, log reused
14:39:13 Checkpoint started
14:39:13 Checkpoint finished, log reused
15:39:14 Checkpoint started
15:39:14 Checkpoint finished, log reused
16:39:15 Checkpoint started
16:39:15 Checkpoint finished, log reused
17:39:16 Checkpoint started
17:39:17 Checkpoint finished, log reused
18:39:17 Checkpoint started
18:39:18 Checkpoint finished, log reused
19:39:18 Checkpoint started
19:39:19 Checkpoint finished, log reused
20:39:20 Checkpoint started
20:39:20 Checkpoint finished, log reused
21:39:21 Checkpoint started
21:39:21 Checkpoint finished, log reused
22:39:22 Checkpoint started
22:39:22 Checkpoint finished, log reused
22:46:08 * Monitor: Locks are held for a long time
23:06:35 Incorrect login for dba from IP [127.0.0.1]
23:39:28 Checkpoint started
23:39:31 Checkpoint finished, log reused

        Fri Mar 18 2022
00:39:33 Checkpoint started
00:39:33 Checkpoint finished, log reused
01:39:34 Checkpoint started
01:39:34 Checkpoint finished, log reused
02:39:35 Checkpoint started
02:39:35 Checkpoint finished, log reused
02:59:44 Server received signal 2
02:59:44 Initiating normal shutdown
02:59:46 Checkpoint started
02:59:49 Checkpoint finished, log reused
02:59:49 Server shutdown complete

        Fri Mar 18 2022
02:59:50 { Loading plugin 1: Type `plain', file `geos' in `../hosting'
02:59:50   plain version 1.2.3233 from OpenLink Software
02:59:50   GEOS plugin based on Geometry Engine Open Source library from Open Source Geospatial Foundation
02:59:50   SUCCESS plugin 1: loaded from ../hosting/geos.so }
02:59:50 { Loading plugin 2: Type `plain', file `proj4' in `../hosting'
02:59:50   plain version 1.1.3233 from OpenLink Software
02:59:50   Cartographic Projections support based on Frank Warmerdam's proj4 library
02:59:50   SUCCESS plugin 2: loaded from ../hosting/proj4.so }
02:59:50 { Loading plugin 3: Type `plain', file `shapefileio' in `../hosting'
02:59:50   ShapefileIO version 0.1virt71 from OpenLink Software
02:59:50   Shapefile support based on Frank Warmerdam's Shapelib
02:59:50   SUCCESS plugin 3: loaded from ../hosting/shapefileio.so }
02:59:50 OpenLink Virtuoso Universal Server
02:59:50 Version 07.20.3233-pthreads for Linux as of Jun 22 2021
02:59:50 uses OpenSSL 1.0.2u  20 Dec 2019
02:59:50 uses parts of PCRE, Html Tidy
02:59:50 Database version 3126
02:59:50 SQL Optimizer enabled (max 1000 layouts)
02:59:51 Compiler unit is timed at 0.000168 msec
02:59:52 Roll forward started
02:59:52 Roll forward complete
02:59:54 Checkpoint started
02:59:54 Checkpoint finished, log reused
02:59:56 HTTP/WebDAV server online at 8890
02:59:56 Server online at 1111 (pid 1)
03:02:22 Incorrect login for dba from IP [127.0.0.1]
03:03:49 PL LOG: Loader started
03:05:55 PL LOG: No more files to load. Loader has finished,
03:06:29 Checkpoint started
03:06:33 Checkpoint finished, log reused
03:41:33 Incorrect login for dba from IP [127.0.0.1]
04:04:45 * Monitor: Locks are held for a long time
04:12:06 * Monitor: Should read for update because lock escalation from shared to exclusive fails frequently (1)
04:12:06 * Monitor: Should read for update because lock escalation from shared to exclusive fails frequently (2)
04:12:16 * Monitor: Locks are held for a long time
04:30:01 PL LOG: Loader started
04:30:01 PL LOG: No more files to load. Loader has finished,
04:30:13 Checkpoint started
04:30:16 Checkpoint finished, log reused
04:32:43 Incorrect login for dba from IP [127.0.0.1]
04:33:05 PL LOG: Loader started
04:33:05 PL LOG: No more files to load. Loader has finished,
04:33:33 Server received signal 2
04:33:33 Initiating normal shutdown
04:33:33 Checkpoint started
04:33:34 Checkpoint finished, log reused
04:33:34 Server shutdown complete

        Fri Mar 18 2022
04:33:35 { Loading plugin 1: Type `plain', file `geos' in `../hosting'
04:33:35   plain version 1.2.3233 from OpenLink Software
04:33:35   GEOS plugin based on Geometry Engine Open Source library from Open Source Geospatial Foundation
04:33:35   SUCCESS plugin 1: loaded from ../hosting/geos.so }
04:33:35 { Loading plugin 2: Type `plain', file `proj4' in `../hosting'
04:33:35   plain version 1.1.3233 from OpenLink Software
04:33:35   Cartographic Projections support based on Frank Warmerdam's proj4 library
04:33:35   SUCCESS plugin 2: loaded from ../hosting/proj4.so }
04:33:35 { Loading plugin 3: Type `plain', file `shapefileio' in `../hosting'
04:33:35   ShapefileIO version 0.1virt71 from OpenLink Software
04:33:35   Shapefile support based on Frank Warmerdam's Shapelib
04:33:35   SUCCESS plugin 3: loaded from ../hosting/shapefileio.so }
04:33:35 OpenLink Virtuoso Universal Server
04:33:35 Version 07.20.3233-pthreads for Linux as of Jun 22 2021
04:33:35 uses OpenSSL 1.0.2u  20 Dec 2019
04:33:35 uses parts of PCRE, Html Tidy
04:33:35 Database version 3126
04:33:35 SQL Optimizer enabled (max 1000 layouts)
04:33:36 Compiler unit is timed at 0.000167 msec
04:33:38 Roll forward started
04:33:38 Roll forward complete
04:33:40 Checkpoint started
04:33:42 Checkpoint finished, log reused
04:33:44 HTTP/WebDAV server online at 8890
04:33:44 Server online at 1111 (pid 1)
04:33:56 Incorrect login for dba from IP [127.0.0.1]
04:34:19 PL LOG: Loader started
04:34:19 PL LOG: No more files to load. Loader has finished,
04:44:48 PL LOG: Loader started
04:44:48 PL LOG: No more files to load. Loader has finished,
04:44:57 PL LOG: Loader started
04:44:57 PL LOG: No more files to load. Loader has finished,
04:45:14 Checkpoint started
04:45:14 Checkpoint finished, log reused
04:45:30 PL LOG: Loader started
04:45:30 PL LOG: No more files to load. Loader has finished,
04:45:35 PL LOG: Loader started
04:45:35 PL LOG: No more files to load. Loader has finished,
04:45:37 PL LOG: Loader started
04:45:37 PL LOG: No more files to load. Loader has finished,
04:45:56 PL LOG: Loader started
04:45:56 PL LOG: No more files to load. Loader has finished,
04:45:59 Checkpoint started
04:45:59 Checkpoint finished, log reused
04:46:38 PL LOG: Loader started
04:46:38 PL LOG: No more files to load. Loader has finished,
04:46:41 Checkpoint started
04:46:41 Checkpoint finished, log reused
04:46:51 PL LOG: Loader started
04:46:51 PL LOG: No more files to load. Loader has finished,
04:46:53 Checkpoint started
04:46:54 Checkpoint finished, log reused
04:48:05 PL LOG: Loader started
04:50:04 PL LOG: No more files to load. Loader has finished,
04:51:57 Checkpoint started
04:52:04 Checkpoint finished, log reused

        Fri Mar 18 2022
15:41:16 { Loading plugin 1: Type `plain', file `geos' in `../hosting'
15:41:16   plain version 1.2.3233 from OpenLink Software
15:41:16   GEOS plugin based on Geometry Engine Open Source library from Open Source Geospatial Foundation
15:41:16   SUCCESS plugin 1: loaded from ../hosting/geos.so }
15:41:16 { Loading plugin 2: Type `plain', file `proj4' in `../hosting'
15:41:16   plain version 1.1.3233 from OpenLink Software
15:41:16   Cartographic Projections support based on Frank Warmerdam's proj4 library
15:41:16   SUCCESS plugin 2: loaded from ../hosting/proj4.so }
15:41:16 { Loading plugin 3: Type `plain', file `shapefileio' in `../hosting'
15:41:16   ShapefileIO version 0.1virt71 from OpenLink Software
15:41:16   Shapefile support based on Frank Warmerdam's Shapelib
15:41:16   SUCCESS plugin 3: loaded from ../hosting/shapefileio.so }
15:41:16 OpenLink Virtuoso Universal Server
15:41:16 Version 07.20.3233-pthreads for Linux as of Jun 22 2021
15:41:16 uses OpenSSL 1.0.2u  20 Dec 2019
15:41:16 uses parts of PCRE, Html Tidy
15:41:16 Database version 3126
15:41:16 Unlinked the temp db file virtuoso-temp.db as its size (1154MB) was greater than TempDBSize INI (10MB)
15:41:16 SQL Optimizer enabled (max 1000 layouts)
15:41:17 Compiler unit is timed at 0.000165 msec
15:41:19 Roll forward started
15:41:19     44 transactions, 21139 bytes replayed (100 %)
15:41:19 Roll forward complete
15:41:23 Checkpoint started
15:41:25 Checkpoint finished, log reused
15:41:27 HTTP/WebDAV server online at 8890
15:41:27 Server online at 1111 (pid 1)
15:41:27 Incorrect login for dba from IP [127.0.0.1]
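The log shows many loader invocations. Only two of them ran for about two minutes: 03:03:49 to 03:05:55, and 04:48:05 to 04:50:04. All the others finished within the same second with "No more files to load". If the same files were registered for both long runs, that would fit the duplicate-load explanation reached at the end of this thread, though the log alone does not prove it.

Below is a minimal Python sketch for extracting loader runs and their durations from a virtuoso.log. It assumes the `HH:MM:SS PL LOG:` line format seen above and that no run crosses midnight. The helper names and the 10-second threshold are hypothetical.

```python
import re
from datetime import datetime

# A few lines copied from the virtuoso.log excerpt above (time of day only).
SAMPLE_LOG = """\
03:03:49 PL LOG: Loader started
03:05:55 PL LOG: No more files to load. Loader has finished,
04:30:01 PL LOG: Loader started
04:30:01 PL LOG: No more files to load. Loader has finished,
04:48:05 PL LOG: Loader started
04:50:04 PL LOG: No more files to load. Loader has finished,
"""

LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}) PL LOG: (Loader started|No more files to load)")


def loader_runs(log_text):
    """Pair each 'Loader started' line with the next 'No more files' line.

    Returns (start, end, seconds) tuples. Assumes a run never crosses
    midnight, because the log prints only the time of day.
    """
    runs = []
    start = None
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip checkpoints, logins, plugin messages, etc.
        stamp, event = m.groups()
        t = datetime.strptime(stamp, "%H:%M:%S")
        if event == "Loader started":
            start = (stamp, t)
        elif start is not None:
            runs.append((start[0], stamp, int((t - start[1]).total_seconds())))
            start = None
    return runs


def substantial_runs(runs, min_seconds=10):
    """Keep only runs long enough to have actually loaded files (hypothetical threshold)."""
    return [r for r in runs if r[2] >= min_seconds]


if __name__ == "__main__":
    for start, end, secs in substantial_runs(loader_runs(SAMPLE_LOG)):
        print(f"{start} -> {end}: {secs} s")
```

To cover the whole file, pass its full text instead of `SAMPLE_LOG`. The date header lines (e.g. `Fri Mar 18 2022`) simply fail the regex and are skipped.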
kidehen commented 2 years ago
select count(*) as ?s where {
    ?obs a sosa:Observation ;
      sosa:hasFeatureOfInterest ?foi ;
      sosa:resultTime ?ddate ;
      sosa:observedProperty ?prop ;
      sosa:hasResult [
        qudt-1-1:numericValue ?value ;
        qudt-1-1:unit ?unit ] .
} group by ?obs
order by asc(?s)

What does the following produce?

select ?g count(*) as ?s where { graph ?g {
    ?obs a sosa:Observation ;
      sosa:hasFeatureOfInterest ?foi ;
      sosa:resultTime ?ddate ;
      sosa:observedProperty ?prop ;
      sosa:hasResult [
        qudt-1-1:numericValue ?value ;
        qudt-1-1:unit ?unit ] . }
} group by ?g ?obs
order by asc(?s)

The goal is to determine whether Virtuoso's quad-centric data organization is skewing your expectations of the query solution.
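To make the concern concrete, here is a toy Python model. It is not Virtuoso code. In the model, one observation is stored in two named graphs. When the pattern is matched across all graphs, that observation's count is inflated. When the query groups by `?g ?obs` inside `GRAPH ?g`, it instead shows one row per graph, each with a count of 1.

The graph names `g1`/`g2` and the data are hypothetical. The model also assumes that a query without a GRAPH clause matches each quad separately, which is the union-of-graphs behavior kidehen's question is probing.

```python
from collections import Counter
from itertools import product

# Hypothetical toy quads (graph, subject, predicate, object). obs1 is stored
# once; obs2 was stored in two named graphs.
QUADS = [
    ("g1", "obs1", "rdf:type", "sosa:Observation"),
    ("g1", "obs1", "sosa:hasFeatureOfInterest", "foi1"),
    ("g1", "obs2", "rdf:type", "sosa:Observation"),
    ("g1", "obs2", "sosa:hasFeatureOfInterest", "foi2"),
    ("g2", "obs2", "rdf:type", "sosa:Observation"),
    ("g2", "obs2", "sosa:hasFeatureOfInterest", "foi2"),
]


def solutions(quads, same_graph):
    """Join `?obs a sosa:Observation ; sosa:hasFeatureOfInterest ?foi`.

    same_graph=False models a query with no GRAPH clause where every quad is
    matched on its own (assumed union behavior). same_graph=True models
    `GRAPH ?g { ... }`, where both triples must come from one graph.
    Yields (g, obs, foi); g is None for the union case.
    """
    typed = [q for q in quads if q[2] == "rdf:type" and q[3] == "sosa:Observation"]
    fois = [q for q in quads if q[2] == "sosa:hasFeatureOfInterest"]
    for t, f in product(typed, fois):
        if t[1] != f[1]:
            continue  # different ?obs, so no join
        if same_graph and t[0] != f[0]:
            continue  # GRAPH ?g requires a single graph for the whole pattern
        yield (t[0] if same_graph else None, t[1], f[3])


# Like `group by ?obs` with no GRAPH clause.
union_counts = Counter(obs for _, obs, _ in solutions(QUADS, same_graph=False))
# Like `group by ?g ?obs` inside GRAPH ?g.
per_graph_counts = Counter((g, obs) for g, obs, _ in solutions(QUADS, same_graph=True))

print(dict(union_counts))      # {'obs1': 1, 'obs2': 4}
print(dict(per_graph_counts))  # {('g1', 'obs1'): 1, ('g1', 'obs2'): 1, ('g2', 'obs2'): 1}
```

In the union case, obs2's two type triples join with its two feature-of-interest triples, giving 2 × 2 = 4 solutions. Scoping the pattern to one graph gives one solution per graph.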

davidshumway commented 2 years ago
select ?g count(*) as ?s where { graph ?g {
    ?obs a sosa:Observation ;
      sosa:hasFeatureOfInterest ?foi ;
      sosa:resultTime ?ddate ;
      sosa:observedProperty ?prop ;
      sosa:hasResult [
        qudt-1-1:numericValue ?value ;
        qudt-1-1:unit ?unit ] . }
} group by ?g ?obs
order by asc(?s)
g s
http://www.example.org 1
http://www.example.org 1
http://www.example.org 1
http://www.example.org 1
http://www.example.org 1
... ...
http://www.example.org 1
http://www.example.org 1
http://www.example.org 1
http://www.example.org 1
http://www.example.org 1

(1048576 rows × 2 columns)
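Eyeballing the head and tail of a million-row result is not enough to confirm it. Instead, the `s` column can be tallied the way the original report did with pandas `.value_counts()`.

The following is a minimal standard-library sketch. `count_histogram` and `all_ones` are hypothetical helper names, and the sample values are made up.

```python
from collections import Counter


def count_histogram(s_column):
    """Map each per-observation count to how many observations have it,
    like pandas' Series.value_counts() on the ?s column, sorted by count."""
    return dict(sorted(Counter(s_column).items()))


def all_ones(s_column):
    """True when every observation matched exactly once (no duplicate solutions)."""
    return all(s == 1 for s in s_column)


if __name__ == "__main__":
    s_column = [1, 1, 2, 1, 2, 1]  # hypothetical sample of the ?s column
    print(count_histogram(s_column))  # {1: 4, 2: 2}
    print(all_ones(s_column))         # False
```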

kidehen commented 2 years ago

Okay, that rules out quad storage as a factor in this issue.

In retrospect, the following would also have been fine: if its result differs from your original query's, that would indicate effects of the data being organized as quads.

select  count(*) as ?s where { graph ?g {
    ?obs a sosa:Observation ;
      sosa:hasFeatureOfInterest ?foi ;
      sosa:resultTime ?ddate ;
      sosa:observedProperty ?prop ;
      sosa:hasResult [
        qudt-1-1:numericValue ?value ;
        qudt-1-1:unit ?unit ] . }
} group by ?obs
order by asc(?s)
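The check kidehen describes amounts to comparing two per-observation count maps. One comes from the original query and one from the `GRAPH ?g` version. Any observation whose counts differ would point at quad effects.

This small sketch assumes each result has already been loaded into a Python dict keyed by the `?obs` IRI. The helper name and the sample data are hypothetical.

```python
def diff_counts(original, graph_scoped):
    """Return {obs: (original_count, graph_scoped_count)} for every observation
    whose per-observation count differs between the two queries.
    An observation missing from one result is treated as a count of 0."""
    keys = original.keys() | graph_scoped.keys()
    return {
        k: (original.get(k, 0), graph_scoped.get(k, 0))
        for k in sorted(keys)
        if original.get(k, 0) != graph_scoped.get(k, 0)
    }


if __name__ == "__main__":
    original = {"obs1": 1, "obs2": 2, "obs3": 1}      # hypothetical
    graph_scoped = {"obs1": 1, "obs2": 1, "obs3": 1}  # hypothetical
    print(diff_counts(original, graph_scoped))  # {'obs2': (2, 1)}
```

An empty result means the two queries agree for every observation, which is the outcome reported in the next comment.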
davidshumway commented 2 years ago
select  count(*) as ?s where { graph ?g {
    ?obs a sosa:Observation ;
      sosa:hasFeatureOfInterest ?foi ;
      sosa:resultTime ?ddate ;
      sosa:observedProperty ?prop ;
      sosa:hasResult [
        qudt-1-1:numericValue ?value ;
        qudt-1-1:unit ?unit ] . }
} group by ?obs
order by asc(?s)
s
1
1
1
1
1
...
1
1
1
1
1

(1048576 rows × 1 columns)

davidshumway commented 2 years ago

So is it just an issue with the SPARQL query, then?

davidshumway commented 2 years ago

It seems the issue was that the same triples were being loaded into the graph multiple times. Starting over from scratch with a new database and importing the triples appears to resolve the issue.
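A store that keeps quads as a set stores an identical quad only once. So why would loading the same triples twice produce duplicate solutions at all?

One plausible mechanism, offered as an assumption and not confirmed in this thread, is the blank node in `sosa:hasResult [ ... ]`. Blank-node labels are scoped to a single document, so each load mints new identifiers. The observation then ends up with two distinct results.

The toy Python model below illustrates this. The data, the `parse` helper, and the `qudt:numericValue` shorthand are hypothetical.

```python
import itertools

# Hypothetical toy model: the store is a Python set, so an identical quad
# added twice is kept once.
_bnode_ids = itertools.count()


def parse(doc):
    """Turn a document into concrete triples, minting fresh blank-node IDs.

    Labels starting with '_:' are scoped to one parse, so every load of the
    same document gets new identifiers for them.
    """
    mapping = {}

    def term(t):
        if t.startswith("_:"):
            if t not in mapping:
                mapping[t] = f"_:b{next(_bnode_ids)}"
            return mapping[t]
        return t

    return [(term(s), p, term(o)) for s, p, o in doc]


DOC = [
    ("obs1", "rdf:type", "sosa:Observation"),
    ("obs1", "sosa:hasResult", "_:r"),
    ("_:r", "qudt:numericValue", "4.2"),
]

store = set()
for _ in range(2):  # load the same file twice into the same graph
    store.update(("http://www.example.org",) + t for t in parse(DOC))

results = [q for q in store if q[1] == "obs1" and q[2] == "sosa:hasResult"]
print(len(results))  # 2: one hasResult per load
print(len(store))    # 5: the rdf:type quad was deduplicated, the blank-node quads were not
```

Under this model, starting from an empty database and loading each file once would yield a single result per observation, which matches what was observed.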