opengeospatial / geosemantics-dwg


geojson-ld, jena and converting to RDF from QGIS Desktop #101

Open pvgenuchten opened 3 years ago

pvgenuchten commented 3 years ago

This issue covers some of the discussion in the gitter channel of the sprint.

The typical use case raised by the Jena team is the ability to convert spatial data to RDF using a tool like QGIS or GeoServer and push it to Fuseki. They currently use an approach based on CSV files containing WKT.
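To make the CSV-with-WKT approach concrete, here is a minimal sketch of how such a file could be turned into GeoSPARQL-style Turtle before loading it into Fuseki. It is not the Jena team's actual tooling: the column names `id` and `wkt`, the base IRI and the function name are all assumptions made for illustration.

```python
import csv
import io

GEO = "http://www.opengis.net/ont/geosparql#"


def csv_wkt_to_turtle(csv_text, base_iri, id_col="id", wkt_col="wkt"):
    """Convert CSV rows with a WKT column into GeoSPARQL-style Turtle.

    Column names and the IRI pattern are hypothetical; adapt them to the data.
    """
    lines = [f"@prefix geo: <{GEO}> .", ""]
    for row in csv.DictReader(io.StringIO(csv_text)):
        feature = f"<{base_iri}{row[id_col]}>"
        geometry = f"<{base_iri}{row[id_col]}/geom>"
        # Escape backslashes and quotes so the WKT is a valid Turtle string.
        wkt = row[wkt_col].replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f"{feature} a geo:Feature ;")
        lines.append(f"    geo:hasGeometry {geometry} .")
        lines.append(f"{geometry} a geo:Geometry ;")
        lines.append(f'    geo:asWKT "{wkt}"^^geo:wktLiteral .')
        lines.append("")
    return "\n".join(lines)


sample = 'id,name,wkt\n1,Lake A,"POINT (4.9 52.37)"\n'
turtle = csv_wkt_to_turtle(sample, "http://example.org/feature/")
print(turtle)
```

Only the geometry is mapped here; attribute columns such as `name` would need a vocabulary choice of their own, which is exactly where the use-case-specific assumptions come in.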

An alternative would be the use of GeoJSON-LD. GeoJSON-LD has long struggled with a limitation in JSON-LD around nested arrays. This limitation has been resolved in the JSON-LD 1.1 spec; however, from the documentation it is not clear whether current Jena supports the 1.0 or the 1.1 JSON-LD specification. My impression is that 1.1 is not supported.

GeoJSON-LD is a very likely encoding for OGC API Features & Records output. Some implementations are already available, for example at https://demo.pygeoapi.io/master/collections/lakes/items?f=jsonld

An alternative approach here is to use SPARQL to query an OGC API Features endpoint (provided it supports JSON-LD). You can use the ARQ tool, which is included in the Jena release, to query a JSON-LD document (and all documents linked from that document). This case brings in some suggestions for a JSON-LD output encoding of OGC APIs: it would benefit from bidirectional links between collections and items.

neumarcx commented 3 years ago

I am going to look at our JSON-LD / GeoJSON-LD in Fuseki. I would like to discuss how the pipeline works. I presume there is no spatial evaluation happening on the OGC Features endpoint; you just pull the features you need directly?

pvgenuchten commented 3 years ago

I am going to look at our JSON-ld / GeoJSON-ld in fuseki

Thanx!

no spatial evaluation happening on the OGC Features endpoint

What spatial evaluation do you hope/aim for?

you just pull the features you need directly

As a client I can filter the items (features/records) I need, for example by a bounding box or an attribute filter, from the selected collection (table) via the server application, but I'm not sure if this is what you mean. Maybe a bit of background:

As a big warning, traditional geo data is not triple oriented, so any approach to create triples from traditional geo data is likely to have caveats or assumptions for specific use cases.

In geo we have a long history of server products such as GeoServer, MapServer, QGIS Server and pygeoapi providing web services based on OGC specs on configurable backends such as Postgres, Oracle, Elastic, SQLite, etc. The data exchanged is, in 95% of cases, flat tables with a geometry column.

These services are then consumed by browser libraries such as OpenLayers and Leaflet, desktop applications such as QGIS and ArcGIS Desktop, or processing, BI or ETL applications.

Previous OGC standards were quite isolated from the rest of the web, but the new generation of OGC APIs is closer to web standards, which makes items discoverable by search engines and their URIs queryable with SPARQL. This is a driver for the introduction of RDF encodings and the adoption of common ontologies.

Since the OGC APIs are currently quite JSON-oriented, and JSON-LD is relatively easy to add to pre-existing JSON responses, JSON-LD seems a good match for the OGC APIs.
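As an illustration of the client-side filtering described above, the following sketch builds an items request with a bounding box and a limit. `bbox` and `limit` are query parameters defined by OGC API Features; the `f` parameter is a common convention rather than part of the core standard, and the helper name and the bounding box values are made up for the example.

```python
from urllib.parse import urlencode


def items_url(base, collection, bbox=None, limit=None, fmt=None, **filters):
    """Build an OGC API Features items URL (helper name is hypothetical)."""
    params = {}
    if bbox is not None:
        # bbox is minx,miny,maxx,maxy in the default CRS
        params["bbox"] = ",".join(str(v) for v in bbox)
    if limit is not None:
        params["limit"] = limit
    if fmt is not None:
        params["f"] = fmt  # server-specific format override, e.g. "jsonld"
    params.update(filters)  # simple attribute filters, e.g. name="..."
    query = urlencode(params, safe=",")
    url = f"{base.rstrip('/')}/collections/{collection}/items"
    return f"{url}?{query}" if query else url


url = items_url(
    "https://demo.pygeoapi.io/master",
    "lakes",
    bbox=(-10, 35, 30, 60),
    limit=10,
    fmt="jsonld",
)
print(url)
```

The server does the spatial selection; the client only receives the matching items and can then hand them to an RDF toolchain.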

neumarcx commented 3 years ago

What spatial evaluation do you hope/aim for?

Not looking for any, but is there a QL for GeoServer or just filters?

I will take that into account in my pipeline. I presume we will preferably have to process GeoJSON-LD in Jena.

Can QGIS directly write features into GeoServer?

cportele commented 3 years ago

@neumarcx - If you want to play with data from an existing OGC API deployment, here is another experimental API that implements OGC API Features and returns GeoJSON with JSON-LD annotations (referencing the GeoJSON and SOSA vocabularies/ontologies):

A sample request: https://t16.ldproxy.net/ghcnd/collections/observation/items?f=json&locationName=KOLN-BONN&limit=100

This is the normal GeoJSON response of an API implementing OGC API Features, but with added context / annotations for those that are interested in it. For those that do not care, it is valid GeoJSON and they will ignore the JSON-LD bits.

Sorry, also no (Geo)SPARQL...

neumarcx commented 3 years ago

OK, yes, I see: no GeoJSON-LD support yet in GeoServer. So it looks like I have to look at the export module in QGIS for this sprint.

But in the future this pipeline might be attractive: QGIS > GeoServer > Fuseki

pvgenuchten commented 3 years ago

is there a QL for geoserver or just filters?

CQL is an extension to the current OGC API, but it is also quite common in WFS implementations. WFS itself implements a fairly extensive dedicated query language.

Can qgis directly write features into a geoserver?

Yes, it can via WFS-Transactional, but I'm not sure this is optimal. It may be better to let QGIS write records to a Postgres or GeoPackage (= SQLite) table and let GeoServer (or QGIS Server) expose that table via OGC API.

Or alternatively, export a dataset from QGIS as GeoJSON, prepend the GeoJSON-LD context and ingest it as GeoJSON-LD.
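The "prepend the context" step could look roughly like the sketch below. The context URL is an assumption (it is the location where the GeoJSON-LD vocabulary has been published) and the function name is hypothetical. Note that geometries with nested coordinate arrays, such as polygons, rely on the JSON-LD 1.1 behaviour discussed earlier in this thread.

```python
import json

# Assumed location of the GeoJSON-LD context; verify before relying on it.
GEOJSON_LD_CONTEXT = "https://geojson.org/geojson-ld/geojson-context.jsonld"


def add_geojson_ld_context(geojson, context=GEOJSON_LD_CONTEXT):
    """Return a copy of a GeoJSON document with "@context" as its first key."""
    doc = {"@context": context}
    # Copy everything else over, dropping any pre-existing context.
    doc.update({k: v for k, v in geojson.items() if k != "@context"})
    return doc


exported = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": "1",
            "properties": {"name": "Lake A"},
            "geometry": {"type": "Point", "coordinates": [4.9, 52.37]},
        }
    ],
}
geojson_ld = add_geojson_ld_context(exported)
print(json.dumps(geojson_ld, indent=2))
```

The result is still valid GeoJSON for clients that ignore the `@context` key, which matches the behaviour described for the ldproxy example above.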

pvgenuchten commented 3 years ago

no geojson-ld support yet in geoserver

There is a community module in GeoServer which adds GeoJSON-LD support.

pygeoapi also has GeoJSON-LD support. A sample implementation is here: https://demo.pygeoapi.io/master/collections/obs/items?id=238&f=jsonld

neumarcx commented 3 years ago

f=jsonld

Ah, I see; the above threw me off.

pvgenuchten commented 3 years ago

Please advise on this. We would usually use content negotiation with Accept: application/ld+json, but if you want to override that and explicitly request JSON-LD in your browser, we use f=jsonld. Do you have an alternative suggestion, e.g. &mime=application/ld+json? (The plus character is really awkward in URLs.)
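The two ways of asking for JSON-LD can be compared in a short sketch. It only builds the requests and does not send them; the helper name is hypothetical. The last lines show why a `+` in a query value is awkward: it has to be percent-encoded, because a literal `+` in a query string is commonly decoded as a space.

```python
from urllib.parse import quote
from urllib.request import Request


def jsonld_request(url, use_query_param=False):
    """Build a request for JSON-LD via the Accept header or via f=jsonld."""
    if use_query_param:
        sep = "&" if "?" in url else "?"
        return Request(f"{url}{sep}f=jsonld")
    return Request(url, headers={"Accept": "application/ld+json"})


items = "https://demo.pygeoapi.io/master/collections/obs/items"
negotiated = jsonld_request(items)
overridden = jsonld_request(items, use_query_param=True)

print(negotiated.get_header("Accept"))
print(overridden.full_url)

# A media type used as a query value needs its "+" escaped.
encoded_mime = quote("application/ld+json")
print(encoded_mime)
```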

neumarcx commented 3 years ago

Since there is a difference in syntax, I was expecting GeoJSON-LD in the URL encoding. But sure, the MIME type is the same.

On another note, I did the test with the most recent Fuseki release and the GeoJSON-LD loader, and it seems to work fine. So we now have a working pipeline (QGIS > GeoServer > Fuseki), in theory. I wonder if we should elaborate more on this setup in the coming days. I definitely would like to hear a little more about the GeoServer project and the OGC API. But I would also like to have a look at plugin development for QGIS for direct exports, or at a new GeoServer community module, to give the user an option to select .ttl or .jsonld for exports.
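For the last hop of that pipeline, a document could be pushed to Fuseki over the SPARQL Graph Store Protocol. The sketch below only constructs the request. It assumes a local Fuseki with a dataset named `ds` whose graph store endpoint is `/ds/data`; both the dataset name and the helper name are hypothetical, and whether a given Fuseki version accepts the payload depends on its JSON-LD support.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def fuseki_upload_request(doc, dataset_url="http://localhost:3030/ds", graph=None):
    """Build a Graph Store Protocol POST carrying a JSON-LD document."""
    target = f"{dataset_url.rstrip('/')}/data"
    # Target either a named graph or the default graph.
    target += f"?graph={quote(graph, safe='')}" if graph else "?default"
    body = json.dumps(doc).encode("utf-8")
    return Request(
        target,
        data=body,
        method="POST",
        headers={"Content-Type": "application/ld+json"},
    )


doc = {"@context": {}, "type": "FeatureCollection", "features": []}
req = fuseki_upload_request(doc)
named = fuseki_upload_request(doc, graph="http://example.org/g")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send it; that step is left out so the sketch runs without a server.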

neumarcx commented 3 years ago

@pvgenuchten looking at the QGIS to GeoServer bridge: has GeoCatBridge replaced the GeoServer Explorer plugin? Is GeoCatBridge now the recommended way to publish QGIS layers with GeoServer?

pvgenuchten commented 3 years ago

Correct, but consider that Bridge handles a quite specific use case: publishing data + layer + metadata, mostly outside the organisation. It may be a bit overkill for this pipeline. If you manage GeoServer and QGIS on the same machine/domain, it may make more sense to connect both systems to the same database, let QGIS store its data in that database and then let GeoServer expose it as JSON-LD.

neumarcx commented 3 years ago

to connect both systems to the same database, let qgis store its data on that database and then let geoserver expose it as json-ld.

I got the following error during installation of the geocatbridge plugin on a Windows x64 machine:

Python error: Couldn't load plugin 'geocatbridge' due to an error when calling its classFactory() method See message log (Python Error) for more details.

Couldn't load plugin 'geocatbridge' due to an error when calling its classFactory() method ModuleNotFoundError: No module named 'lxml.etree'

I believe this is related to

https://github.com/GeoCat/qgis-bridge-plugin/issues/1

Has this been fixed? Is there documentation on how to install the library on Windows x64?
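A quick way to narrow this down is to check, from the Python console inside QGIS, whether the interpreter QGIS uses can import the module at all. This is a generic diagnostic sketch, not something from the plugin's documentation; the helper name is made up.

```python
import importlib
import sys


def can_import(name):
    """Return True if the named module can be imported by this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return False
    return True


# Shows which Python installation is actually running (QGIS bundles its own).
print(sys.executable)
print("lxml.etree importable:", can_import("lxml.etree"))
```

If this reports that the module is not importable, the library is missing from the Python environment bundled with QGIS rather than from the system Python.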

pvgenuchten commented 3 years ago

Please check https://issues.qgis.org/issues/11536

pvgenuchten commented 3 years ago

Curious to hear from QGIS folks about this: QGIS has a database migration plugin, "DB Manager". I wonder if that could be extended to include support for Fuseki as a target platform.


DB Manager is very relational-database oriented, so it may be challenging.

situx commented 3 years ago

Not sure if this is related to what you are planning, but it reads like our SPARQL plugin for QGIS, https://github.com/sparqlunicorn/sparqlunicornGoesGIS, is attempting something similar. You can issue a SPARQL query in QGIS to an arbitrary SPARQL endpoint, and the result is converted to a QGIS vector layer. You may also convert a QGIS vector layer to TTL encoded in the GeoSPARQL vocabulary, or create a more elaborate interlinking to other vocabularies. So you could query a QGIS layer with SPARQL and export a TTL file that you could upload to Fuseki. In the latest release, you may even upload a converted dataset to a SPARQL endpoint directly. Whether or not this fits your needs, we are happy to receive change requests.

neumarcx commented 3 years ago

This looks very interesting, @situx, and we might be able to use this in our pipeline.

It's quite a long description. I will look at it over the weekend in more detail.

situx commented 3 years ago

Thanks, @neumarcx. Feel free to open any issues you might have with our tool on our GitHub. The main input we need is user feedback and ideas to make the user interface friendlier for beginners and also for advanced users.