openlink / virtuoso-opensource

Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
https://vos.openlinksw.com

Error loading geographical data with version 7.10.3211 #295

Open ma-garcia opened 9 years ago

ma-garcia commented 9 years ago

Hi. I'm running two Virtuoso instances, version 7.00.3203 and version 7.10.3211, on a Linux machine.

I use the Virtuoso JDBC 3 driver to load geographical data like this:

<http://www.zaragoza.es/api/recurso/geometry/WGS84/41.635338_-0.911101>
      a       <http://www.opengis.net/ont/sf#Point> , <http://www.w3.org/2003/01/geo/wgs84_pos#Point> ;
      <http://www.opengis.net/ont/geosparql#asWKT>
              "<http://www.opengis.net/def/crs/OGC/1.3/CRS84>Point(-0.911101 41.635338)"^^<http://www.opengis.net/ont/geosparql#wktLiteral> ;
      <http://www.w3.org/2003/01/geo/wgs84_pos#lat>
              "41.635338"^^<http://www.w3.org/2001/XMLSchema#double> ;
      <http://www.w3.org/2003/01/geo/wgs84_pos#long>
              "-0.911101"^^<http://www.w3.org/2001/XMLSchema#double> .

If I remove the asWKT property, leaving just the lat and long properties, and load it into version 7.10.3211, the data loads properly.

I have checked the data syntax and it is correct according to GeoSPARQL, so I don't know what the problem is.

This issue is related to issue 274.

HughWilliams commented 9 years ago

I have been able to recreate this, even when loading from isql using the ttlp() function, so it is not JDBC-related:

SQL> ttlp ('<http://www.zaragoza.es/api/recurso/geometry/WGS84/41.635338_-0.911101>       a       <http://www.opengis.net/ont/sf#Point> , <http://www.w3.org/2003/01/geo/wgs84_pos#Point> ;       <http://www.opengis.net/ont/geosparql#asWKT>               "<http://www.opengis.net/def/crs/OGC/1.3/CRS84>Point(-0.911101 41.635338)"^^<http://www.opengis.net/ont/geosparql#wktLiteral> ;       <http://www.w3.org/2003/01/geo/wgs84_pos#lat>               "41.635338"^^<http://www.w3.org/2001/XMLSchema#double> ;       <http://www.w3.org/2003/01/geo/wgs84_pos#long>               "-0.911101"^^<http://www.w3.org/2001/XMLSchema#double> .','','http://geo',0);

*** Error 42000: VD [Virtuoso Server]RDFGE: RDF box with a geometry RDF type and a non-geometry content
in
rdf_box:(BIF),
DB.DBA.TTLP_RL_TRIPLE_L([executable]/ttlpv.sql:255),
rdf_load_turtle:(BIF),
DB.DBA.TTLP_V([executable]/ttlpv.sql:554),
DB.DBA.TTLP([executable]/sparql.sql:2888),
<Top Level>
at line 17 of Top-Level:
ttlp ('<http://www.zaragoza.es/api/recurso/geometry/WGS84/41.635338_-0.911101>       a       <http://www.opengis.net/ont/sf#Point> , <http://www.w3.org/2003/01/geo/wgs84_pos#Point> ;       <http://www.opengis.net/ont/geosparql#asWKT>               "<http://www.opengis.net/def/crs/OGC/1.3/CRS84>Point(-0.911101 41.635338)"^^<http://www.opengis.net/ont/geosparql#wktLiteral> ;       <http://www.w3.org/2003/01/geo/wgs84_pos#lat>               "41.635338"^^<http://www.w3.org/2001/XMLSchema#double> ;       <http://www.w3.org/2003/01/geo/wgs84_pos#long>               "-0.911101"^^<http://www.w3.org/2001/XMLSchema#double> .','','http://geo',0)
SQL>

This issue has been reported to development to look into ...

yonyonson commented 9 years ago

Any progress on this one? I have the following SPARQL failing (with a CRS other than WGS84):

INSERT INTO GRAPH <http://test.delete.me/> {
    <http://test.delete.me/geo> <http://www.opengis.net/ont/geosparql#geometry> "<http://www.opengis.net/def/crs/EPSG/0/25833>POLYGON ((361895.00009999983 7315465.0001000017, 365966.00009999983 7317083.0001000017, 366027.00009999983 7317152.0001000017, 365784.00009999983 7318672.0001000017, 365741.00009999983 7318698.0001000017, 365662.00009999983 7318707.0001000017, 362737.00009999983 7319795.0001000017, 362607.00009999983 7319787.0001000017, 357663.00009999983 7320780.0001000017))"^^<http://www.opengis.net/ont/geosparql#wktLiteral>
}

Error is:

Virtuoso 42000 Error RDFGE: RDF box with a geometry RDF type and a non-geometry content

SPARQL query:
define sql:big-data-const 0 
#output-format:text/html
define sql:signal-void-variables 1 INSERT INTO GRAPH <http://test.delete.me/> {
    <http://test.delete.me/geo> <http://www.opengis.net/ont/geosparql#geometry> "<http://www.opengis.net/def/crs/EPSG/0/25833>POLYGON ((361895.00009999983 7315465.0001000017, 365966.00009999983 7317083.0001000017, 366027.00009999983 7317152.0001000017, 365784.00009999983 7318672.0001000017, 365741.00009999983 7318698.0001000017, 365662.00009999983 7318707.0001000017, 362737.00009999983 7319795.0001000017, 362607.00009999983 7319787.0001000017, 357663.00009999983 7320780.0001000017))"^^<http://www.opengis.net/ont/geosparql#wktLiteral>
}

I am running 07.20.3212 on Ubuntu 14.04 LTS.

HughWilliams commented 9 years ago

@yonyonson: This issue is still to be resolved. I have added your occurrence to the bug report so it can also be checked ...

zazi commented 8 years ago

Any news on this issue? Currently, we are trying to load the RDF dumps from DNB into Virtuoso (stable/7 (Docker container) and develop/7 (self-compiled)) without any success so far. We always get the error above (Virtuoso 42000 Error RDFGE: RDF box with a geometry RDF type and a non-geometry content). According to some DNB representatives, an example that includes geo data looks like this (note the reply is in German). Thanks a lot in advance for any help.

ma-garcia commented 8 years ago

Hi everyone,

I've upgraded my Virtuoso instance to 07.20.3217 and I still have the same problem. But I've found a way to load the data. Instead of

<http://www.zaragoza.es/api/recurso/geometry/WGS84/41.635338_-0.911101>
      a       <http://www.opengis.net/ont/sf#Point> , <http://www.w3.org/2003/01/geo/wgs84_pos#Point> ;
      <http://www.opengis.net/ont/geosparql#asWKT>
              "<http://www.opengis.net/def/crs/OGC/1.3/CRS84>Point(-0.911101 41.635338)"^^<http://www.opengis.net/ont/geosparql#wktLiteral> ;
      <http://www.w3.org/2003/01/geo/wgs84_pos#lat>
              "41.635338"^^<http://www.w3.org/2001/XMLSchema#double> ;
      <http://www.w3.org/2003/01/geo/wgs84_pos#long>
              "-0.911101"^^<http://www.w3.org/2001/XMLSchema#double> .

I have used this other structure:

<http://www.zaragoza.es/api/recurso/geometry/WGS84/41.635338_-0.911101>
      a       <http://www.opengis.net/ont/sf#Point> , <http://www.w3.org/2003/01/geo/wgs84_pos#Point> ;
      <http://www.opengis.net/ont/geosparql#crs>
              <http://www.opengis.net/def/crs/OGC/1.3/CRS84>;
      <http://www.opengis.net/ont/geosparql#asWKT>
              "Point(-0.911101 41.635338)"^^<http://www.opengis.net/ont/geosparql#wktLiteral> ;
      <http://www.w3.org/2003/01/geo/wgs84_pos#lat>
              "41.635338"^^<http://www.w3.org/2001/XMLSchema#double> ;
      <http://www.w3.org/2003/01/geo/wgs84_pos#long>
              "-0.911101"^^<http://www.w3.org/2001/XMLSchema#double> .

My question is whether the previous structure will be supported in future versions.

jakubklimek commented 7 years ago

@HughWilliams I also have the same issue, with the current develop/7 c30a3c711993b2f60d8e807c4bab8d717a56f28b with the CRS specification in geo:wktLiteral, which is how GeoSPARQL specifies it. Can the Virtuoso processing be turned off somehow? Or can this be fixed?

HughWilliams commented 7 years ago

@jakubklimek: GeoSPARQL support is being scheduled for the next Virtuoso 8 release ...

p1d1d1 commented 7 years ago

My experience so far with Virtuoso OS 7.2.4 is that:

nandana commented 6 years ago

Hi,

I am using OpenLink Virtuoso Server VOS 07.20.3229 and ran into the same problem while loading Wikidata. Is there any workaround for this issue until GeoSPARQL support is implemented?

So based on what I read from @ma-garcia, if I convert

_:x <http://www.opengis.net/ont/geosparql#asWKT>
              "<http://www.opengis.net/def/crs/OGC/1.3/CRS84>Point(-0.911101 41.635338)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>  .

to

_:x <http://www.opengis.net/ont/geosparql#crs>
              <http://www.opengis.net/def/crs/OGC/1.3/CRS84>;
      <http://www.opengis.net/ont/geosparql#asWKT>
              "Point(-0.911101 41.635338)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>  .

I should be able to load the data without an issue to Virtuoso, right?

The other option, according to GH-773, is to use a new datatype (declared as an rdfs:subPropertyOf <http://www.opengis.net/ont/geosparql#wktLiteral>), which can prevent Virtuoso from complaining. Is there any better solution for this? It is kind of a showstopper for us!

Best Regards, Nandana

HughWilliams commented 6 years ago

Base GeoSPARQL support was added to the Virtuoso open source develop/7 branch last month, as indicated at --

-- and will soon be added to the stable/7 branch.

Thus I would suggest building Virtuoso and the required Geospatial plugin from the develop/7 branch as detailed in the readme file and test to see if it resolves the problem ...

p1d1d1 commented 6 years ago

_:x http://www.opengis.net/ont/geosparql#crs

@nandana where is this property coming from?

By the way, I'm on Virtuoso 07.20.3217 and can load geodata (e.g., https://github.com/p1d1d1/p1d1d1.github.io/blob/master/triples/cantons84.nt). Data type is changed from wktLiteral to virtrdf#Geometry.

nandana commented 6 years ago

Hi @p1d1d1 @HughWilliams

I just took that from the previous example. I have now looked at the actual data I was loading from Wikidata. Most of the statements look like the following, and they are parsed without any error.

@prefix wds: <http://www.wikidata.org/entity/statement/> .
@prefix ps: <http://www.wikidata.org/prop/statement/> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .

wds:Q31138473-FE72609E-8273-4F0C-819E-713F2C4B4C46 
        ps:P625 "Point(8.61946 59.43613)"^^geo:wktLiteral .

But there are a few other statements (related to the coordinates of Mars, I guess) like the following:

@prefix wds: <http://www.wikidata.org/entity/statement/> .
@prefix wikibase: <http://wikiba.se/ontology#> .
@prefix ps: <http://www.wikidata.org/prop/statement/> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix psv: <http://www.wikidata.org/prop/statement/value/> .
@prefix wdv: <http://www.wikidata.org/value/> .

wds:Q56632442-fbfdcfd3-4b72-0026-1889-4cd40279f4fd wikibase:rank wikibase:NormalRank ;
    ps:P625 "<http://www.wikidata.org/entity/Q111> Point(1.13 3.83)"^^geo:wktLiteral ;
    psv:P625 wdv:a5d4d0a028e00370b6b216ad3b9b197e .

They are not parsed correctly and they lead to the following error.

SQL> select * from DB.DBA.load_list;
Connected to OpenLink Virtuoso
Driver: 07.20.3229 OpenLink Virtuoso ODBC Driver
ll_file                                                                           ll_graph                                                                          ll_state    ll_started           ll_done              ll_host     ll_work_time  ll_error
VARCHAR NOT NULL                                                                  VARCHAR                                                                           INTEGER     TIMESTAMP            TIMESTAMP            INTEGER     INTEGER     VARCHAR
_______________________________________________________________________________
/staging/get_test2.ttl                                         https://www.example.org/test2                                                      2           2018.10.1 17:6.7 366785000  2018.10.1 17:6.7 369714000  0           NULL        42000 RDFGE: RDF box with a geometry RDF type and a non-geometry content

I have not read the GeoSPARQL spec in detail, but from a quick look at 8.5 Requirements for WKT Serialization (serialization=WKT) (page 34) and other pages, the above representation seems valid, doesn't it? If not, I will have to go back to the Wikidata devs :)

In either case, I can also try with the develop/7 branch.

TallTed commented 6 years ago

@nandana - The described error is expected with VOS prior to the 7.2.6 update, or without the new plugin. Please let us know how things go with the latest develop/7!

nandana commented 6 years ago

Thanks @TallTed !

I've installed the latest version from develop/7 and followed the guide to set up GeoSPARQL.

At startup, it seems that the plugins are loaded correctly.

Tue Oct 02 2018
19:41:44 { Loading plugin 1: Type `plain', file `wikiv' in `/wikidata/virtuoso/lib/virtuoso/hosting'
19:41:44   WikiV version 0.6 from OpenLink Software
19:41:44   Support functions for WikiV collaboration tool
19:41:44   SUCCESS plugin 1: loaded from /wikidata/virtuoso/lib/virtuoso/hosting/wikiv.so }
19:41:44 { Loading plugin 2: Type `plain', file `mediawiki' in `/wikidata/virtuoso/lib/virtuoso/hosting'
19:41:44   MediaWiki version 0.1 from OpenLink Software
19:41:44   Support functions for MediaWiki collaboration tool
19:41:44   SUCCESS plugin 2: loaded from /wikidata/virtuoso/lib/virtuoso/hosting/mediawiki.so }
19:41:44 { Loading plugin 3: Type `plain', file `creolewiki' in `/wikidata/virtuoso/lib/virtuoso/hosting'
19:41:44   CreoleWiki version 0.1 from OpenLink Software
19:41:44   Support functions for CreoleWiki collaboration tool
19:41:44   SUCCESS plugin 3: loaded from /wikidata/virtuoso/lib/virtuoso/hosting/creolewiki.so }
19:41:44 { Loading plugin 8: Type `plain', file `proj4' in `/wikidata/virtuoso/lib/virtuoso/hosting'
19:41:44   plain version 3230 from OpenLink Software
19:41:44   Cartographic Projections support based on Frank Warmerdam's proj4 library
19:41:44   SUCCESS plugin 8: loaded from /wikidata/virtuoso/lib/virtuoso/hosting/proj4.so }
19:41:44 { Loading plugin 9: Type `plain', file `geos' in `/wikidata/virtuoso/lib/virtuoso/hosting'
19:41:44   plain version 3230 from OpenLink Software
19:41:44   GEOS plugin based on Geometry Engine Open Source library from Open Source Geospatial Foundation
19:41:44   SUCCESS plugin 9: loaded from /wikidata/virtuoso/lib/virtuoso/hosting/geos.so }
19:41:44 { Loading plugin 10: Type `plain', file `shapefileio' in `/wikidata/virtuoso/lib/virtuoso/hosting'
19:41:44   ShapefileIO version 0.1virt71 from OpenLink Software
19:41:44   Shapefile support based on Frank Warmerdam's Shapelib
19:41:44   SUCCESS plugin 10: loaded from /wikidata/virtuoso/lib/virtuoso/hosting/shapefileio.so }
19:41:44 OpenLink Virtuoso Universal Server
19:41:44 Version 07.20.3230-pthreads for Linux as of Oct  2 2018

but I still get the same error.

SQL> select * from DB.DBA.load_list;
Connected to OpenLink Virtuoso
Driver: 07.20.3230 OpenLink Virtuoso ODBC Driver
ll_file                                                                           ll_graph                                                                          ll_state    ll_started           ll_done              ll_host     ll_work_time  ll_error
VARCHAR NOT NULL                                                                  VARCHAR                                                                           INTEGER     TIMESTAMP            TIMESTAMP            INTEGER     INTEGER     VARCHAR
_______________________________________________________________________________

/wikidata/data/geo-test.ttl                                                https://www.example.org/test                                                      2           2018.10.2 19:42.15 16983000  2018.10.2 19:42.15 18424000  0           NULL        42000 RDFGE: RDF box with a geometry RDF type and a non-geometry content

1 Rows. -- 0 msec.

Here's the content I am loading.

@prefix wds: <http://www.wikidata.org/entity/statement/> .
@prefix wikibase: <http://wikiba.se/ontology#> .
@prefix ps: <http://www.wikidata.org/prop/statement/> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix psv: <http://www.wikidata.org/prop/statement/value/> .
@prefix wdv: <http://www.wikidata.org/value/> .

wds:Q56632442-fbfdcfd3-4b72-0026-1889-4cd40279f4fd a wikibase:BestRank ;
        wikibase:rank wikibase:NormalRank ;
        ps:P625 "<http://www.wikidata.org/entity/Q111> Point(1.13 3.83)"^^geo:wktLiteral ;
        psv:P625 wdv:a5d4d0a028e00370b6b216ad3b9b197e .

Do you see any reason for the error?

TallTed commented 6 years ago

@nandana - It is often helpful to columnize your Turtle, as this can reveal oddness in the data that isn't so easy to see when the text is more tightly spaced.

wds:Q56632442-fbfdcfd3-4b72-0026-1889-4cd40279f4fd 
    a               wikibase:BestRank ;
    wikibase:rank   wikibase:NormalRank ;
    ps:P625         "<http://www.wikidata.org/entity/Q111> Point(1.13 3.83)"^^geo:wktLiteral ;
    psv:P625        wdv:a5d4d0a028e00370b6b216ad3b9b197e .

This appears to me to not be a valid geo:wktLiteral --

"<http://www.wikidata.org/entity/Q111> Point(1.13 3.83)"^^geo:wktLiteral

I think that's meant to be --

"Point(1.13 3.83)"^^geo:wktLiteral

I cannot tell where <http://www.wikidata.org/entity/Q111> belongs here, but it's certainly not within the geo:wktLiteral.

nandana commented 6 years ago

It looks odd to me too, but aren't such URIs allowed in the WKT representation of a geometry? For example, here it says

A notable feature is that the CRS URI is concatenated with the WKT string in the literal.

or the following example from the GeoSPARQL spec Sec 8.5

A second example below encodes the same point using <http://www.opengis.net/def/crs/EPSG/0/4326>: a WGS 84 geodetic latitude-longitude spatial reference system (note that this spatial reference system defines a different axis order):

"<http://www.opengis.net/def/crs/EPSG/0/4326>Point(33.95 -83.38)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>

This was taken directly from the Wikidata dump. If someone with GeoSPARQL expertise can confirm indeed this is not a valid value for geo:wktLiteral datatype, I can raise the issue in Wikidata.

TallTed commented 6 years ago

@nandana -

I see. You're correct, the GeoSPARQL spec does permit any valid URI to be used in this position (which seems silly, as it could lead to all sorts of nonsense data). I'm not sure why this data is not being accepted.

@pkleef, @IvanMikhailov -- Any comment?


That said, I note that <http://www.wikidata.org/entity/Q111> isn't a URI for a CRS (Coordinate Reference System); it's a URI for the planet Mars, as you'll see if you dereference it yourself — and there are multiple CRS in use for Martian features/locations.

Wikidata don't seem to have figured out how to address this, and their current kludge means only one CRS per celestial body (possibly including Earth; I didn't read deeply enough to be sure about this) in Wikidata, although multiple CRS are already used for locations on Mars, the Moon, and others (though perhaps not on Wikidata)...

This is clearly an evolving space (pardon the pun).

nandana commented 6 years ago

Thanks @TallTed !

One quick question related to Virtuoso + GeoSPARQL plugins. When VOS complains about a certain error, is there a way to get an idea about which line it is about (similar to the info given for syntax errors)?

E.g.

06:10:55 PL LOG: File /wikidata/data/1_99/wikidump-000000004.ttl error 42000 RDFGE: RDF box with a geometry RDF type and a non-geometry content

These are files with ~24 million lines each and it is impossible to detect where the errors are using manual inspections. I could check for the previous type of errors using grep but this is something else.
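
Until the loader reports positions itself, one stopgap is to pre-scan the file before loading. The sketch below is a heuristic and not Virtuoso's own validation: it assumes each literal sits on a single line (as in the Wikidata excerpts above), and the name `find_suspect_wkt` is made up for illustration. It reports the line number of every geo:wktLiteral whose text starts with an IRI in angle brackets.

```python
import re
from typing import Iterable, Iterator, Tuple

# A quoted literal that starts with <IRI> and is typed as a wktLiteral,
# written either with the geo: prefix or with the full datatype IRI.
CRS_WKT = re.compile(
    r'"<([^<>"\s]+)>\s*[^"]*"\^\^'
    r'(?:geo:wktLiteral|<http://www\.opengis\.net/ont/geosparql#wktLiteral>)'
)

def find_suspect_wkt(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, leading_iri) for each wktLiteral with a leading IRI."""
    for number, line in enumerate(lines, start=1):
        match = CRS_WKT.search(line)
        if match:
            yield number, match.group(1)

# Hypothetical usage on a dump file (the path is an example only):
#   with open("wikidump-000000004.ttl", encoding="utf-8") as handle:
#       for number, iri in find_suspect_wkt(handle):
#           print(number, iri)
```

Because the file is read line by line, memory use stays flat even for files with tens of millions of lines.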

TallTed commented 6 years ago

@nandana - I see your point about the error message, and have raised it internally to development. You may want to create an issue specific to that, so we can be sure to notify you when that enhancement is implemented.

nandana commented 6 years ago

Thanks @TallTed ! I just created an issue for this.

p1d1d1 commented 6 years ago

That Wikidata triple is IMHO not correct GeoSPARQL, since the URI is not a URI for a CRS. The error in Virtuoso is related, by the way, to the presence of this URI. So far, Virtuoso doesn't support WKT serializations with the CRS URI (PS: this URI is not mandatory).

p1d1d1 commented 6 years ago

... and I'd personally avoid having this URI in the geometry serialization. Most geo-libraries won't read this as valid wkt

pkleef commented 5 years ago

When you try to load the following data using the latest version of virtuoso from develop/7:

@prefix wds: <http://www.wikidata.org/entity/statement/> .
@prefix wikibase: <http://wikiba.se/ontology#> .
@prefix ps: <http://www.wikidata.org/prop/statement/> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix psv: <http://www.wikidata.org/prop/statement/value/> .
@prefix wdv: <http://www.wikidata.org/value/> .

wds:Q56632442-fbfdcfd3-4b72-0026-1889-4cd40279f4fd wikibase:rank wikibase:NormalRank ;
    ps:P625 "<http://www.wikidata.org/entity/Q111> Point(1.13 3.83)"^^geo:wktLiteral ;
    psv:P625 wdv:a5d4d0a028e00370b6b216ad3b9b197e .

you receive the following error:

Error 22023: VD [Virtuoso Server]TURTLE RDF loader, line 1: GEO11: IRI is not known, consider
registering it via DB.DBA.SYS_V7PROJ4_SR_IRIS (near row 1 col 39 of
'<http://www.wikidata.org/entity/Q111> Point(1.13 3.83)')

The reason, as correctly explained by @p1d1d1, is that this IRI does not identify a Coordinate Reference System (CRS).

Without a well defined coordinate system, these points are as useful as an amount without specifying the currency.

p1d1d1 commented 5 years ago

@pkleef very nice implementation

nandana commented 5 years ago

@pkleef thanks for the patch! I will try it out.

@p1d1d1 I certainly agree with you. I will try to raise the issue in the Wikidata side too!

TallTed commented 5 years ago

@pkleef - It appears to me that 9cb9e2bbd298bb7313163c1e1861ad7e50538bd9 closes #789, not #295.

TallTed commented 5 years ago

@pkleef, @p1d1d1 -

My reading of GeoSPARQL is that any URI may be used as a CRS URI within a wktLiteral -- i.e., there is no reference to a canonical list/registry of acceptable/valid CRS URIs.

This makes some sense, as all possible CRS do not yet exist, so future CRS can be supported without needing to update GeoSPARQL. I would think it better if there were a registry of CRS (similar to IANA MIME registry) and/or relationships between them (such that geodata expressed in "Mars 1979" can be related to "Mars 2000"), but this should not be necessary to load data in either/both/any CRS to an RDF store.

The specific CRS URI should be evaluated only when queries need to compare geodata -- at which point such operations should lead to errors indicating that, for instance, "CRS <http://example.com/mars2000> is not known", or "CRS <http://example.com/mars2000> has no known relation to CRS <http://example.com/mars1979>", or "CRS <http://example.com/mars2000> has no known relation to CRS <http://www.opengis.net/def/crs/OGC/1.3/CRS84>", or the like.
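
The load-time versus query-time distinction argued for above can be illustrated with a small Python sketch. This is not how Virtuoso works internally; the class and function names are hypothetical, and "related" is reduced here to "identical IRI" purely to keep the example short.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Geometry:
    crs: str   # any IRI; stored as given, never validated at load time
    wkt: str

class UnknownCrsRelation(Exception):
    """Raised at query time when two CRS cannot be related."""

def require_comparable(a: Geometry, b: Geometry) -> None:
    """Raise unless the two geometries share a CRS (simplified relation)."""
    if a.crs != b.crs:
        raise UnknownCrsRelation(
            f"CRS <{a.crs}> has no known relation to CRS <{b.crs}>"
        )
```

Constructing a `Geometry` never fails on the CRS, so data in any CRS can be stored; only an operation that compares two geometries has to decide whether their CRS can be related.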

(Yes, using a URI that does not refer to a CRS can lead to nonsense interpretation, just like "15 quatloos" cannot be usefully compared to "USD 15", but "15 quatloos" can be usefully compared to "30 quatloos" whenever that new cryptocurrency is created/defined.)

p1d1d1 commented 5 years ago

@TallTed you're right, according to GeoSPARQL

Valid geo:wktLiterals are formed by concatenating a valid, absolute URI as defined in [RFC 2396], one or more spaces (Unicode U+0020 character) as a separator, and a WKT string as defined in Simple Features

But then it also says:

For geo:wktLiterals, the beginning URI identifies the spatial reference system for the geometry. The OGC maintains a set of CRS URIs under the http://www.opengis.net/def/crs/ namespace

So a URI identifying Mars is not OK for me. This leads, as you say, to nonsense data. I personally don't 100% agree with the OGC approach of putting the URI in the WKT serialization, since such strings are not read as valid WKT by desktop GIS and, I guess, also not by web-mapping libraries (to be tested). I'd have preferred an additional property hasCRS. But this is another story.
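
The lexical structure quoted above (an optional URI followed by the WKT text) can be sketched in Python as follows. The names are illustrative only. Two assumptions are built in: the separator is treated as zero or more whitespace characters, because the Sec 8.5 example quoted earlier in this thread has no space at all, and a literal without a URI falls back to CRS84, which is the default GeoSPARQL specifies. The URI is split off but deliberately not checked against any list of known CRS.

```python
import re
from typing import NamedTuple

# Default CRS that GeoSPARQL assumes when a wktLiteral carries no URI.
DEFAULT_CRS = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"

class WktLiteral(NamedTuple):
    crs: str
    wkt: str

# Optional <URI>, optional whitespace, then the WKT text itself.
_LEXICAL = re.compile(r'^\s*(?:<([^<>\s]+)>)?\s*(.+?)\s*$', re.DOTALL)

def parse_wkt_literal(lexical: str) -> WktLiteral:
    """Split the lexical form of a geo:wktLiteral into (crs, wkt)."""
    match = _LEXICAL.match(lexical)
    if match is None:
        raise ValueError(f"empty wktLiteral: {lexical!r}")
    crs, wkt = match.groups()
    return WktLiteral(crs or DEFAULT_CRS, wkt)
```

A desktop GIS tool that only understands bare WKT could be handed the `wkt` field, with `crs` carried alongside it.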

nandana commented 5 years ago

@p1d1d1 Yes, I agree this might not be the best design decision from GeoSPARQL on how to represent CRS information in RDF. But I am more inclined towards what @TallTed said. Though they are not optimal, I think they are legal according to the current spec. I don't read the second paragraph you cited as saying that the OGC maintains an exclusive set of CRS URIs and nothing else is accepted.

TallTed commented 5 years ago

@p1d1d1 - As @nandana says, the spec says that the "OGC maintains a set of CRS URIs", it does not say the "OGC maintains the set of CRS URIs" — i.e., there is nothing that says that only URIs from that OGC list are valid CRS URIs.

As things stand, the user is responsible for avoiding nonsense data, which may be painful for them, but this also means that they have the flexibility to adopt any appropriate CRS which may be developed in the future -- regardless of whether the OGC continues to maintain that list, or the OGC endorses/accepts the user's preferred new CRS, etc.

As to whether this is the optimal solution... I think it's a similar conundrum to langtagged string literals. The CRS is necessary to interpret the wktLiteral, just as a langtag is necessary to interpret a text literal string, so it must be an integral part of the coordinate literal string. Doing something new with the literal typing would be problematic, as it's already problematic to handle langtagging, so embedding the CRS within the wktLiteral makes a lot of sense to me.

Maybe there could be some syntactic sugar, such that "all wktLiteral in this serialized data file are based on CRS xyz", but that would lead to copy-and-paste issues when people mix small portions of data from multiple such files without the CRS declaring statements, and they'd be back in today's problematic state...

Desktop GIS and other tools which read this data as invalid are just old -- so they want an older version of the data, which assumed that only one CRS did or ever would exist. Such tools will be updated to understand the new data which recognizes the reality of multiple CRS in active use, and old datasets which are based on any CRS other than the now-declared-default will be updated to include the CRS they're based on, and all will be well again. (Really, all will be well for the first time, as there have long been multiple CRS in use, and data sets which did not declare the CRS in use therein were effectively meaningless, and worse when used in combination with other data sets with undeclared — and frequently different — CRS.)

nandana commented 5 years ago

@TallTed @p1d1d1 TL;DR - is there a way to (a) turn off this validation of geo:wktLiteral or (b) make Virtuoso continue parsing a file ignoring the erroneous triple?

Long version: I want to load the Wikidata dump to Virtuoso to do some performance benchmarks comparing it with other triplestores such as Blazegraph. This issue is blocking me from loading Wikidata into Virtuoso. As a quick hack (while ignoring all GeoSPARQL data for the moment), I have done a grep/sed to remove all CRS URIs from wktLiterals to see if I could load all data after that.

For example, converting the following

wd:Q2267142 wdt:P31 wd:Q1439394 ;
        wdt:P376 wd:Q3303 ;
        wdt:P2824 "2727" ;
        wdt:P625 "<http://www.wikidata.org/entity/Q3303> Point(-358.26 11.3)"^^geo:wktLiteral ;
        wdt:P2386 "+170"^^xsd:decimal ;
        p:P31 wds:Q2267142-B08676C6-584A-41A0-8B62-DB3F7CE635A1 .

to

wd:Q2267142 wdt:P31 wd:Q1439394 ;
        wdt:P376 wd:Q3303 ;
        wdt:P2824 "2727" ;
        wdt:P625 "Point(-358.26 11.3)"^^geo:wktLiteral ;
        wdt:P2386 "+170"^^xsd:decimal ;
        p:P31 wds:Q2267142-B08676C6-584A-41A0-8B62-DB3F7CE635A1 .

Nevertheless, this data is now interpreted against WGS84, and loading still fails because the coordinates are out of bounds (i.e., below -180). It seems there is no easy way out of this. Is there a way to load the rest of the data while losing the GeoSPARQL data with other CRSs, or any other alternative to load Wikidata?
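
For reference, the substitution described above could look like the following Python sketch. It is not the exact grep/sed that was run; the function names are illustrative, only the prefixed `geo:wktLiteral` form is handled, and the range check covers 2D Point literals only, assuming CRS84 axis order (longitude, then latitude).

```python
import re

# Leading <IRI> plus optional spaces right after the opening quote of a
# literal typed geo:wktLiteral (prefixed form only, as in the excerpt above).
_LEADING_CRS = re.compile(r'"<[^<>"\s]+>\s*([^"]*"\^\^geo:wktLiteral)')

def strip_crs(line: str) -> str:
    """Remove the leading IRI from every geo:wktLiteral on the line."""
    return _LEADING_CRS.sub(r'"\1', line)

_POINT = re.compile(r'^\s*Point\s*\(\s*(\S+)\s+(\S+)\s*\)\s*$', re.IGNORECASE)

def point_in_crs84_range(wkt: str) -> bool:
    """True if a 2D Point has lon in [-180, 180] and lat in [-90, 90]."""
    match = _POINT.match(wkt)
    if match is None:
        raise ValueError(f"not a 2D Point: {wkt!r}")
    lon, lat = float(match.group(1)), float(match.group(2))
    return -180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0
```

Running the range check before loading shows which stripped literals, such as `Point(-358.26 11.3)`, would still be rejected once they are read as CRS84 coordinates.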

TallTed commented 5 years ago

@nandana — I am not immediately aware of a way to switch off these data checks; @IvanMikhailov or @pkleef may have a suggestion.

That said — assuming that all the data you have to load is similar to the above, and that the CRS-including geo:wktLiteral values are predicate+value lines which end with semicolons, you could run a slightly different grep/sed to simply remove those entire lines, leaving only whitespace. If some of these lines are period-ended, or if they might be the first line of an entity's description — so the line starts with the entity's URI — the grep/sed gets more complex ... but still, I think, reasonably doable.
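
A minimal Python sketch of the simpler case, where every affected predicate+value pair sits on its own indented line and ends with a semicolon. The function name is illustrative. Lines that end with a period, or that begin with the entity's URI, are deliberately left alone and only counted, since removing them would break the surrounding Turtle.

```python
import re
from typing import Iterable, List, Tuple

# A whole line of the form:  <indent> predicate "<IRI> WKT"^^geo:wktLiteral ;
_DROPPABLE = re.compile(
    r'^\s+\S+\s+"<[^<>"\s]+>[^"]*"\^\^geo:wktLiteral\s*;\s*$'
)
# Any IRI-carrying wktLiteral, wherever it appears on the line.
_ANY_CRS_WKT = re.compile(r'"<[^<>"\s]+>[^"]*"\^\^geo:wktLiteral')

def drop_crs_wkt_lines(lines: Iterable[str]) -> Tuple[List[str], int]:
    """Blank out droppable lines; return (new_lines, count_left_untouched)."""
    kept: List[str] = []
    untouched = 0
    for line in lines:
        if _DROPPABLE.match(line):
            kept.append("\n")      # leave only whitespace behind
        else:
            if _ANY_CRS_WKT.search(line):
                untouched += 1     # needs the more complex handling
            kept.append(line)
    return kept, untouched
```

A non-zero `untouched` count signals that the file still contains literals the loader is likely to reject.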

Thinking further, I might suggest you consider using some different dataset(s) and/or queries for your benchmarking, such as those developed by the LDBC, and written about in the old Virtuoso Blog among other places.

pfps commented 5 years ago

I have run into this problem as well, and patched VOS as described in https://community.openlinksw.com/t/non-terrestrial-geo-literals/359. This patch appears to be working fine, but I'm not doing anything with the geo-literals, so there may be hidden problems.

TallTed commented 5 years ago

@pkleef, @IvanMikhailov, @kidehen, @openlink -- By not properly fixing this issue such that Virtuoso supports any URI as a valid CRS URI (as the spec requires, as discussed in detail above, particularly at 1 and 2), possibly among other fixes, we are leading/causing people like @asanchez75 to disable a significant chunk of data validation code, and @pfps to comment out a smaller selection, which may cause significant problems down the line -- not only by preferring non-OpenLink-branches of VOS.

Virtuoso is a DBMS. Virtuoso should be managing the data people want to manage, not "protecting" people by refusing to manage data we don't like (i.e., geodata based on non-terrestrial or nonsensical or simply unknown CRS systems).

pfps commented 4 years ago

I've just run into this again. Is there a plan for finally fixing this bug in Virtuoso?

TallTed commented 3 years ago

@pkleef, @IvanMikhailov, @kidehen, @openlink, @HughWilliams --

Any update on this issue?

katarinarak commented 5 months ago

Hi, I just ran into this issue and wonder why it is still not fixed.