[FeatureRequest] Add GeoJSON format for selected outputs

opengeospatial / sensorthings

The official web site of the OGC SensorThings API standard specification.


[FeatureRequest] Add GeoJSON format for selected outputs #70

Open rduivenvoorde opened 5 years ago

rduivenvoorde commented 5 years ago

While the geometries in the json output are valid geojson parts, it is not possible to use the geojson as is in GIS applications (like QGIS which can handle/update geojson as layers) or a site like http://geojson.io/#map=2/20.0/0.0 where you can just paste geojson and see it on a map.

A very practical example would be to make it possible to retrieve observations as GeoJSON, so that you receive the latest locations PLUS value and time properties as valid GeoJSON. This would make it a breeze to view the latest observations as a layer in QGIS (viewing the values as labels and styling the points based on values) or in a Leaflet or OpenLayers application.

I think a true mapping of the SensorThings JSON to GeoJSON would not be easy, as GeoJSON is a flat (classic GIS) table-like structure. So maybe only map the appropriate parts (Observations or FeaturesOfInterest)?

See http://geojson.org/ and https://tools.ietf.org/html/rfc7946

KathiSchleidt commented 5 years ago

Do I understand you correctly that what you're proposing is switching the order of elements within the FoI, so that the feature part becomes the core (GeoJSON) object and the other elements become properties of this object? This could be a useful view on the FoI. As for adding Observations, these would then be included via expand, added to the properties of the GeoJSON object. However, I do not recall a requirement for the contents of the GeoJSON properties to be flat. Yes, the properties are a flat list of key/value pairs, but is there a constraint I've missed in GeoJSON that the value cannot in turn be an object (i.e. the Observation)?

taniakhalafbeigi commented 5 years ago

I personally agree that having Observations together with the FeatureOfInterest in GeoJSON format could be very valuable. But we need to discuss it further in the SWG, as it has its own complexity. For example, if you want to use it to show a layer on a map, the GeoJSON properties should either have the latest Observation for each FOI, or they should basically store the time series, which can be long. On the other hand, multiple Sensors might be observing the same FOI, which means we will have more than one "latest Observation for FOI", and we also need to somehow distinguish between them. One way to overcome this issue is to attach the ObservedProperty and/or Sensor to that Observation in the GeoJSON properties. The important thing here is to make sure that what we agree upon works and makes sense for most IoT use cases. I personally think this could be a valuable discussion: if, with the help of SWG members, we can find a way to have something like dataArray, namely $format=GeoJSON, that can be used as a GeoJSON map layer, that would be very valuable.

rduivenvoorde commented 5 years ago

@KathiSchleidt you are right, see https://tools.ietf.org/html/rfc7946#section-3.2: property values of a Feature in GeoJSON can be Objects (though in current GIS implementations you would never be able to 'see' these, unless you add some kind of 'interpretation' on top of this). In all implementations/uses of GeoJSON I know, values are strings or numbers. The spec is broader than the use...

I'm pretty new in the ST-scene, and to me it's not always clear on how to model reality into ST... For example: I'm talking here about a car driving around with sensors on it. What I actually want is periodically create a (time)stamp to create an Observation in Time AND(!) Place. But if I create observations (measurements) with a FOI-object, they do not end up as Locations for the Thing (car). What I want to see on the map is the exact locations of the measurements with the timestamp.

So... I think(?) that at least to me there is no necessity to do a full ST-json -> GeoJSON mapping. As it is already JSON. Mapping the Observations+FOI's to GeoJSON features would be enough.... Mmm, on second thought, and trying to create an example feature here... @taniakhalafbeigi you are right, it is complex, but let us try:

{ "type":"FeatureCollection",
  "name": "vehicle1",
  "crs":{"type":"name","properties":{"name":"urn:ogc:def:crs:OGC:1.3:CRS84"} },
  "link":"http_with_query_to_retrieve_this_resultset",  // Not GeoJSON ?
  "features": [
   {"type":"Feature",
    "properties":{
          "value":53,          // from Observation
          "uom": "mSv/h",  // coming from DataStream  ? or leave it out, or the uri?
          "link": "http_url_to_FOI_or_Observation ?",
          "phenomenonTime": "2019-02-06T19:52:20.732Z",   // Observation
          "resultTime": "2019-02-06T19:52:20.732Z"  // Observation
     },
     "geometry":   // from Observation, either the FOI or the Location?
              {"type":"Point","coordinates":[3.73950798498,51.4383757488] } }
 ]
}

So it looks like GeoJSON features would be some mix of DataStream, Observation and FOI?
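The mix of Datastream, Observation and FOI described here can be sketched in a few lines of Python. This is only an illustration under assumptions, not an agreed mapping: it assumes the Observations were fetched with the FeatureOfInterest and Datastream expanded, that FeatureOfInterest.feature holds a plain GeoJSON geometry, and the property names "uom" and "link" simply follow the example above. The function name is made up.

```python
import json


def observations_to_feature_collection(observations):
    """Map expanded SensorThings Observations to a GeoJSON FeatureCollection."""
    features = []
    for obs in observations:
        foi = obs.get("FeatureOfInterest") or {}
        unit = (obs.get("Datastream") or {}).get("unitOfMeasurement") or {}
        features.append({
            "type": "Feature",
            # A missing geometry becomes JSON null, which GeoJSON permits.
            "geometry": foi.get("feature"),
            "properties": {
                "result": obs.get("result"),
                "uom": unit.get("symbol"),
                "phenomenonTime": obs.get("phenomenonTime"),
                "resultTime": obs.get("resultTime"),
                "link": obs.get("@iot.selfLink"),
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Hand-written sample data, shaped like an Observation with expanded entities.
sample = [{
    "@iot.selfLink": "http://example.com/v1.0/Observations(1)",
    "result": 53,
    "phenomenonTime": "2019-02-06T19:52:20.732Z",
    "resultTime": "2019-02-06T19:52:20.732Z",
    "FeatureOfInterest": {
        "feature": {"type": "Point", "coordinates": [3.73950798498, 51.4383757488]},
    },
    "Datastream": {"unitOfMeasurement": {"symbol": "mSv/h"}},
}]

collection = observations_to_feature_collection(sample)
print(json.dumps(collection, indent=2))
```

Every property value stays a string or number, so the result should show up as ordinary attribute columns in a GIS client.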

So a time series could be the time series of one Datastream or of several Datastreams (Sensors)? Every timestamp/location would have a feature (and yes, that could be long, but that's GeoJSON :-) )

And the 'Latest Observations' could also be the latest Observations of all Datastreams mapped to GeoJSON, just like what you now receive from v1.0/Observations?

Anyway, I'm very interested :-) Thanks for discussing this.

hylkevds commented 5 years ago

For example: I'm talking here about a car driving around with sensors on it. What I actually want is periodically create a (time)stamp to create an Observation in Time AND(!) Place. But if I create observations (measurements) with a FOI-object, they do not end up as Locations for the Thing (car).

That is because the location of the Thing does not have to be the location of the Observation. To change the location of the Car, you have to update the Locations property of the Car Thing. If you first update the location of the Car, and then insert a new Observation in a Datastream of the Car, without specifying a FoI, a new FoI will be generated pointing to the current location of the Car. The history of the Locations of the car is logged in HistoricalLocations.
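The order of operations described above (first update the Thing's Location, then insert an Observation without a FoI) can be written down as a list of requests. The sketch below only builds the requests and sends nothing. The helper name, the Location name/description values, and the v1.0 paths and encodingType are assumptions for illustration; check them against the server you are using.

```python
def moving_thing_requests(thing_id, datastream_id, lon, lat, result, time):
    """Return (method, path, body) tuples for one position-plus-measurement step."""
    new_location = {
        "name": "current position",           # made-up label
        "description": "position at " + time,  # made-up description
        "encodingType": "application/vnd.geo+json",
        "location": {"type": "Point", "coordinates": [lon, lat]},
    }
    # No FeatureOfInterest is given, so the server is expected to generate
    # one from the Thing's current Location.
    new_observation = {"result": result, "phenomenonTime": time}
    return [
        ("POST", f"/v1.0/Things({thing_id})/Locations", new_location),
        ("POST", f"/v1.0/Datastreams({datastream_id})/Observations", new_observation),
    ]


requests_to_send = moving_thing_requests(
    3, 1, 3.73950798498, 51.4383757488, 53, "2019-02-06T19:52:20.732Z"
)
for method, path, body in requests_to_send:
    print(method, path, body)
```

The Location request has to come first; if the Observation is posted first, the generated FoI would point at the previous position of the car.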

Examples are great; they make the discussion a lot easier.

"name": "vehicle1"

Does that come from the Thing?

The mapping would have to be very flexible, since there are many use cases that would want to see different things in the GeoJSON. Just some use cases I can quickly come up with:

The latest Observations for all datastreams of a Thing
The latest Observations for all ObservedProperties of a (STA) FoI
The latest Observations for an ObservedProperty (at multiple locations: PointCollection?)
The latest Observations for all ObservedProperties
The path of a moving Thing
The path of a moving Thing, with observations at each location (if any)

Coming up with a query mechanism that would cover all likely use cases is going to be... challenging :)

taniakhalafbeigi commented 5 years ago

@rduivenvoorde, about working with the Location of the car: Hylke is right, you need to update the Location separately. But do you even need the Location? For your use case, keeping the position information in the FOI should be enough, especially because it seems that you are not interested in pure location data and want to see the measurement Observations as well. For that, you can expand the FOI for Observations and use the result and phenomenonTime together with the location information from the Observation's FOI.

I think, since the purpose of this GeoJSON is to show as a layer on the map, we may be able to come up with some solution. I personally think the most important challenge is different ObservedProperties. I think what we can do is this:

The GeoJSON will have all the FOIs in the system, with their latest recorded Observations as feature properties. If multiple ObservedProperties are observed for a FOI, we will have the latest recorded Observation for each of those ObservedProperties.

This way, we can show the latest state of the server data on the map. If people want to see only one ObservedProperty, then the client can handle it. To show moving objects, the map needs to retrieve the layer by periodically sending the request to the system. It does not cover showing the historical path of a moving Thing, though. We can always embed the Thing info/id in the properties for each latest Observation as well, so that clients have the option to filter on that (client-side). To me, we can choose the "latest state of the server data" use case, which is very common, and try to design on top of that. But of course, IoT domain experts can help us validate this use case.

Here is the example:

{ 
  "type":"FeatureCollection",
  "features": [{
    "type":"Feature",
    "properties":{
      "values":[{
        "result":53,
        "phenomenonTime": "2019-02-06T19:52:20.732Z",
        "resultTime": "2019-02-06T19:52:20.732Z",
        "ObservedProperty@iot.selfLink" : "http://example.com/v1.0/ObservedProperties(1)",
        "Sensor@iot.selfLink" : "http://example.com/v1.0/Sensors(2)",
        "Thing@iot.selfLink": "http://example.com/v1.0/Things(3)"
       },{
        //Latest Observation of another ObservedProperty for that FOI
        "result":10,
        "phenomenonTime": "2019-02-06T19:52:20.500Z",
        "resultTime": "2019-02-06T19:52:20.500Z",
        "ObservedProperty@iot.selfLink" : "http://example.com/v1.0/ObservedProperties(4)",
        "Sensor@iot.selfLink" : "http://example.com/v1.0/Sensors(5)",
        "Thing@iot.selfLink": "http://example.com/v1.0/Things(3)"
       }]
     },
     "geometry":
     {
       "type":"Point",
       "coordinates": [3.73950798498 , 51.4383757488]
     } 
    }
  ]
}

For this example I think we can also add expanded information for Thing, Sensor, and ObservedProperty, so that the client would not need further requests to the server. But it makes the message size bigger and there is a trade-off for that.

What I can say is that a Datastream basically groups Observations with the same Sensor and ObservedProperty for a given Thing, while this GeoJSON groups Observations with the same Sensor and ObservedProperty that are observing the same FOI. And since this FOI view is mostly useful for showing on the map, we format it as GeoJSON so that it can be consumed by the map.
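The grouping described above (one Feature per FOI, holding the latest Observation per ObservedProperty and Sensor) could be prototyped client-side roughly as follows. It is a sketch under assumptions: the flat record layout with keys such as foi_id, observed_property and sensor is invented for the example, and the output mirrors the "values" array from the example above rather than any agreed format.

```python
from datetime import datetime


def parse_time(text):
    # Older Python versions of fromisoformat() reject a trailing "Z".
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def latest_values_per_foi(records):
    """Build one Feature per FOI with the latest record per (ObservedProperty, Sensor)."""
    by_foi = {}
    for rec in records:
        entry = by_foi.setdefault(
            rec["foi_id"], {"geometry": rec["geometry"], "latest": {}}
        )
        key = (rec["observed_property"], rec["sensor"])
        current = entry["latest"].get(key)
        if current is None or (
            parse_time(rec["phenomenonTime"]) > parse_time(current["phenomenonTime"])
        ):
            entry["latest"][key] = rec

    features = []
    for foi_id, entry in by_foi.items():
        values = [
            {
                "result": rec["result"],
                "phenomenonTime": rec["phenomenonTime"],
                "ObservedProperty@iot.selfLink": rec["observed_property"],
                "Sensor@iot.selfLink": rec["sensor"],
            }
            for rec in entry["latest"].values()
        ]
        features.append({
            "type": "Feature",
            "id": foi_id,
            "geometry": entry["geometry"],
            "properties": {"values": values},
        })
    return {"type": "FeatureCollection", "features": features}


point = {"type": "Point", "coordinates": [3.73950798498, 51.4383757488]}
op1, s2 = "http://example.com/v1.0/ObservedProperties(1)", "http://example.com/v1.0/Sensors(2)"
op4, s5 = "http://example.com/v1.0/ObservedProperties(4)", "http://example.com/v1.0/Sensors(5)"
records = [
    {"foi_id": 7, "geometry": point, "observed_property": op1, "sensor": s2,
     "result": 50, "phenomenonTime": "2019-02-06T19:50:00.000Z"},
    {"foi_id": 7, "geometry": point, "observed_property": op1, "sensor": s2,
     "result": 53, "phenomenonTime": "2019-02-06T19:52:20.732Z"},
    {"foi_id": 7, "geometry": point, "observed_property": op4, "sensor": s5,
     "result": 10, "phenomenonTime": "2019-02-06T19:52:20.500Z"},
]
layer = latest_values_per_foi(records)
print(layer["features"][0]["properties"]["values"])
```

Doing this on the client means downloading every Observation first, so a real implementation would more likely push the "latest per group" step to the server.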

What do you guys think about this example? And also further narrowing the use case for this?

hylkevds commented 5 years ago

Interesting line of thought.

Specifying that the server should return "everything" is problematic. There would have to be some way to filter. The dataset we're currently working on, in the surface water quality domain, has 5205 Observed Properties, and 1382018 FeaturesOfInterest. And that is without any moving Things.

Just to focus on the single use-case of querying for FoIs, with the latest Observation for each ObservedProperty, since that is the hardest one. Filtering the FoI's is trivial, since we're querying the FoIs, so we can use a geospatial filter there. But can we add a nextLink in the FeatureCollection Object, or would that violate the GeoJSON spec? Alternatively, we could make the nextLink a HTTP header.

If, for a given FoI, there are too many ObservedProperties, we can add a nextLink there, but if the user is only interested in a single ObservedProperty, returning thousands and expecting the user to do the filtering is not going to make happy users...

That means we'd need some way to

filter which ObservedProperties the user is interested in and
a way for the server to say that the result only contains a subset of the ObservedProperties & FoIs, because there are too many (server-sided pagination, with a nextLink of some kind)

Still, getting the latest Observation for each ObservedProperty, for a given FeatureOfInterest can be very computationally expensive with the current data model. One could check which Datastreams have data for a given FeatureOfInterest, but such a Datastream may have data for multiple FeaturesOfInterest. There may also be multiple Datastreams with the same ObservedProperty (with different Sensors).

Of course, getting the latest Observation for each ObservedProperty, for a given FeatureOfInterest is something that is not just interesting for a GeoJSON output. It's also a very interesting query for the normal SensorThings JSON output!

rduivenvoorde commented 5 years ago

@taniakhalafbeigi you are probably right, I have that insight now: I'm actually not interested in the position of the car, but solely in the position of the Observation (so the FOI).

@hylkevds whether sending the 'latest observation' of every Datastream on a server is a problem depends on your use case. The one I have is a national (about 200 nodes) or European (about 6000 nodes) measuring network. So both would be OK in terms of size (unless the GeoJSON really gets too big). My other practical use case is indeed the car measuring during a ride (so one or just a few Datastreams from one Thing), but there I would like to see all Observations, because showing the values on a map would give a valuable 'overview' of where the 'source' of a phenomenon is...

I was also hoping that by using the usual 'select' clauses we would be able to restrict what we would retrieve (in size) (I'd prefer not to page personally)

Talking about sizes of data/GeoJSON: today I came across the so-called 'geojsonseq' specification (https://www.gdal.org/drv_geojsonseq.html and https://www.interline.io/blog/geojsonl-extracts/ etc.), in which you do not create a FeatureCollection, but 'stream' only individual Features separated by newlines. Could be a nice output format too!!!
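To make the newline-separated idea concrete, here is a small sketch that writes one Feature per line and reads them back one at a time. It covers only the plain newline-delimited variant; the function names are made up, and this is not how any particular server produces such output.

```python
import io
import json


def write_feature_lines(features, stream):
    """Write each Feature as one compact JSON document per line."""
    for feature in features:
        stream.write(json.dumps(feature, separators=(",", ":")) + "\n")


def read_feature_lines(stream):
    """Yield Features one by one, without loading the whole file."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


features = [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [3.7395, 51.4384]},
     "properties": {"result": 53}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [3.7401, 51.4390]},
     "properties": {"result": 55}},
]

buffer = io.StringIO()
write_feature_lines(features, buffer)
buffer.seek(0)
round_tripped = list(read_feature_lines(buffer))
print(len(round_tripped), "features read back")
```

Because every line is a complete Feature, a client can start drawing points before the response has finished, which a single FeatureCollection does not allow.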

hylkevds commented 4 years ago

Since it became relevant for the API4INSPIRE project, I've implemented and typed up a prototype GeoJSON extension for FROST: https://fraunhoferiosb.github.io/FROST-Server/extensions/GeoJSON-ResultFormat.html

Since it is a true resultFormat (it only reformats the result of existing queries) you can make the query and result as complex as you like: https://airquality-frost.docker01.ilt-dmz.iosb.fraunhofer.de/v1.1/Things?$orderby=id%20asc&$expand=Locations($select=name,id,location),Datastreams($select=name,id;$expand=ObservedProperty($select=name),Observations($select=phenomenonTime,result;$orderby=phenomenonTime%20desc;$top=5))&$top=100&$resultFormat=geojson

Of course there is no nextLink, so paging has to be done "manually"

Please try it out, I'd love to get some feedback!

KathiSchleidt commented 4 years ago

I'm still chewing a bit on the array indexes slipped into the attribute names, but I do believe they help more than they hurt (if you do simple cases limited by $top=1, then you just have to cope with an extraneous /0 in your path). I VERY MUCH do like the fact that we now have a clean GeoJSON format to offer (and if you $top it down, you can even display it via a trivial viewer :) )

Stupid question - why doesn't count and nextLink work? $skip seems to work, what am I missing?

sgrellet commented 4 years ago

Happy to see that discussions in API4INSPIRE lead to improvements to ST API :)

Feedback based on this example: https://airquality-frost.docker01.ilt-dmz.iosb.fraunhofer.de/v1.1/Things(1)?$expand=Datastreams
With the GeoJSON output we then have
"Datastreams/0/id" being https://airquality-frost.docker01.ilt-dmz.iosb.fraunhofer.de/v1.1/Datastreams(162)
"Datastreams/1/id" being https://airquality-frost.docker01.ilt-dmz.iosb.fraunhofer.de/v1.1/Datastreams(593)
Also keep in mind that this rationale might propagate to the CSV resultFormat (#51).

Having the position of the index in the array is a bit disturbing at first (~30 seconds, and we are accustomed to STAPI). From an enterprise architecture point of view, those elements (Datastream, Observation) may have an external id (URI). We already live with the fact that the ST API uses its own internal index (here 162, 593), which is fine (the historical differentiation between external identifier and internal id). But here we add another one, which may be confusing even for devs and other users (especially if this propagates to CSV). Would it be too costly at runtime to use the ST API "@iot.id"?
Adding some links (at least selfLinks): otherwise there is no way to traverse back to the API, and we lose the HATEOAS. Adding the selfLinks of the referred instances won't break the serialization (GeoJSON -> JSON) if we force $resultFormat=geojson in the links.
It would be great to formalize it the way we did for the CSV output.

hylkevds commented 4 years ago

The array index is not an ID, it is an array index. It is essentially exactly the same as the normal JSON output of STA. The first one is always 0, and they are always an uninterrupted sequence in each Feature. The reason to use these instead of the ID is that with an array index one doesn't need fancy parsing of the property names to be able to read the properties. If you use $top=1, you don't even need to use code to generate the property names you want to read, since they are fixed and always the same. If we'd used the @iot.id, it would be impossible to access "the first Datastream", since you wouldn't know which property name to use. Also, object properties in JSON are not ordered (!) so you would lose any ordering you did in the query. This can not propagate to CSV, since it is impossible to have expands with multiple cardinality in the CSV result format. That would cause each row to have a different number of columns. It would also explode the column count.
selfLinks should be no problem, those can be added. Even selfLinks that point to geoJson, though they are very likely to return Features with no geometry, so it's debatable what is better. Selflinks with the GeoJSON resultFormat, or without.
Count and nextLinks for the top collection would go into the FeatureCollection, but a FeatureCollection has no properties... Do we extend GeoJSON and add properties to the FeatureCollection type? Count and nextLinks for expanded collections could be added, but what should these nextLinks point to? When following STA convention, the resulting GeoJSON probably has no geometry. It would also turn the Datastreams in your example above into the Features, whereas in the original document they are flattened in the properties of each Thing Feature
Formalisation is happening here: https://airquality-frost.docker01.ilt-dmz.iosb.fraunhofer.de/v1.1/Things(1)?$expand=Datastreams
I just noticed that GeoJSON Features can have an "id" field, so the @iot.id of each Feature should go there, not in the properties.
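The array-index naming discussed in this reply can be illustrated with a generic flattening function. This is not the FROST implementation, just a sketch of the naming idea ("Datastreams/0/id"); the input is hand-written and the Thing and Datastream names are invented.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts/lists into slash-separated property names."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}/{key}" if prefix else key))
    elif isinstance(value, list):
        # The index is the position in the result array, not an entity id,
        # so it always starts at 0 and preserves the query's ordering.
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}/{index}" if prefix else str(index)))
    else:
        flat[prefix] = value
    return flat


thing = {
    "name": "Station A",
    "Datastreams": [
        {"id": 162, "name": "NO2"},
        {"id": 593, "name": "PM10"},
    ],
}
flat = flatten(thing)
for name, value in flat.items():
    print(name, "=", value)
```

With $top=1 on the expand, the only generated names are of the form "Datastreams/0/...", so a client can hard-code them; keying on the entity id instead would make the property names differ per Feature.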

rduivenvoorde commented 3 years ago

Supercool this work. Thanks! Working in QGIS!!

Have to find the right 'query' to retrieve individual 'features' (POI?) for every time step (so every feature would have one time and one value property, and the geometry would be either an identical point (in case of a static post) or a unique point (in case of a driving Thing)). The QGIS Temporal filtering controller expects a set of features that all have the same key/value pairs AND one geom, in one layer/set. That way you can filter a layer and only show values for a certain frame in a map. This is in contrast with the example above: https://airquality-frost.k8s.ilt-dmz.iosb.fraunhofer.de/v1.1/Things?$orderby=id%20asc&$expand=Locations($select=name,id,location),Datastreams($select=name,id;$expand=ObservedProperty($select=name),Observations($select=phenomenonTime,result;$orderby=phenomenonTime%20desc;$top=5))&$top=100&$resultFormat=geojson which is more suited to collecting a set of measurements for one sensor, to show a line graph or so.

About the count and next links: I read https://datatracker.ietf.org/doc/html/rfc7946#section-6.1 as saying that it is OK to add a nextLink or iot_nextLink member to a FeatureCollection. Maybe that would make it easier to write clients that page through a dataset?
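If a FeatureCollection did carry such a member, a paging client could look roughly like the sketch below. The member name "nextLink" is an assumption (nothing is standardised in this thread), and the fetch function is injected so that the example runs against in-memory pages instead of a real server.

```python
def iter_features(first_url, fetch, next_key="nextLink"):
    """Yield Features from all pages, following the next-link member."""
    url = first_url
    while url:
        page = fetch(url)  # expected to return a parsed FeatureCollection
        yield from page.get("features", [])
        url = page.get(next_key)  # a missing member ends the loop


# Two fake pages standing in for server responses.
pages = {
    "page1": {
        "type": "FeatureCollection",
        "features": [{"type": "Feature", "id": 1, "geometry": None, "properties": {}}],
        "nextLink": "page2",
    },
    "page2": {
        "type": "FeatureCollection",
        "features": [{"type": "Feature", "id": 2, "geometry": None, "properties": {}}],
    },
}

all_ids = [feature["id"] for feature in iter_features("page1", pages.__getitem__)]
print(all_ids)
```

A client that does not know about the extra member simply ignores it and sees the first page as an ordinary FeatureCollection.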