opengeospatial / EDR-API-Sprint

Planning, work and final report of a virtual Hackathon/Sprint to progress EDR-API

Streaming of EDR API response media types #18

Closed cportele closed 4 years ago

cportele commented 4 years ago

I had a look at the CovJSON spec and how we would implement support for it in ldproxy. I was wondering one thing: How well can it be streamed? Or was that not a design consideration, and CovJSON documents are always meant to be small?

Originally posted by @cportele in https://github.com/opengeospatial/EDR-API-Sprint/issues/15#issuecomment-600782395

@m-burgoyne:

@cportele I don't think there is any support for streaming, but it does have the concept of tiling N-dimensional data: https://covjson.org/spec/#tiledndarray-objects

@cportele:

@m-burgoyne - Thanks for the clarification. Since tiling will only really work for results that can be cached, and you are returning CovJSON that is created on-the-fly in your prototype (I think), is my conclusion correct that CovJSON is only meant for smallish payloads? I am asking because I was expecting that EDR API responses could also sometimes become "big" (megabytes).

@m-burgoyne:

@cportele, yes, CoverageJSON should only be used for small payloads (as is true of any text-based format). The EDR candidate specification does allow the publisher to advertise other output formats and has a specific HTTP error code to inform users that the payload for their request is too large to process. One of the aims of the EDR format is to encourage users to ask for just the data they need, although the Cube (and many Polygon) queries will often not be able to use CoverageJSON due to size limitations. The candidate specification also has an approach to asynchronous requests, so there is also a mechanism to tell users they will need to wait while the tiles are generated.

@chris-little:

@cportele And the definition of 'small' for CoverageJSON is that it can be downloaded into a browser for local, client-side, processing, such as innovative visualisation.

@cportele:

@chris-little - Even in that use case, streaming is useful for the user experience.

@chris-little:

@cportele - well not really. Streaming usually implies some chunks, such as images or sound bites, can be dropped and only the latest in the stream matters. In general, this is not true for a lot of environmental data - it has to be lossless and complete, otherwise the science is screwed.

@dblodgett-usgs:

I think the streaming @cportele is referring to is in the sense of a database query -- can the format be streamed out before the query has completely returned, such that the client-server connection can open nearly synchronously? Or does the format require complete serialization on the server before starting the stream? E.g. XML with metadata a la TimeseriesML vs. a zipped package.

@cportele:

@dblodgett-usgs - exactly, thanks.

@chris-little - I think most implementations of OGC API Features use streaming when they return features, typically a feature per chunk. Streaming is not just video, etc. That way the client can start to do something with the incoming features, before the query on the server is finished. This is not really important when you request ten features, but that changes if you request a few thousand or more.
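To illustrate the "feature per chunk" idea, here is a minimal Python sketch, not taken from any particular OGC API Features implementation. It assumes the server can iterate over query results lazily; `fake_query` is a hypothetical stand-in for a database cursor, and `stream_feature_collection` is an illustrative name. The point is only that a GeoJSON FeatureCollection can be written out piece by piece, so the first bytes can leave the server before the query has finished.

```python
import json
from typing import Iterable, Iterator


def stream_feature_collection(features: Iterable[dict]) -> Iterator[str]:
    """Yield a GeoJSON FeatureCollection as text chunks, one feature per chunk."""
    # Opening of the document can be sent before any feature is available.
    yield '{"type":"FeatureCollection","features":['
    first = True
    for feature in features:
        # A comma goes before every feature except the first.
        prefix = "" if first else ","
        first = False
        yield prefix + json.dumps(feature)
    # Closing of the document is sent once the query is exhausted.
    yield "]}"


def fake_query(n: int) -> Iterator[dict]:
    """Hypothetical stand-in for a database cursor that returns rows lazily."""
    for i in range(n):
        yield {
            "type": "Feature",
            "id": i,
            "geometry": {"type": "Point", "coordinates": [float(i), 0.0]},
            "properties": {"value": i * 10},
        }


# Collect the chunks here only to show that they add up to valid GeoJSON;
# a web framework would instead write each chunk to the response as it arrives.
chunks = list(stream_feature_collection(fake_query(3)))
document = json.loads("".join(chunks))
```

A client with an incremental JSON parser could start processing each feature as soon as its chunk arrives, which is where the benefit shows up for a few thousand features or more.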

@m-burgoyne:

In theory you could use the https://covjson.org/spec/#tiledndarray-objects approach to achieve a similar result.

@cportele:

@m-burgoyne - But that would require that the tiles could be all in the same document/response/JSON object, which currently doesn't seem to be the case?

@m-burgoyne:

@cportele, yes, it would require client-side logic, as the parent document has to be loaded to identify the available tiles (each of which is a separate document).
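A rough sketch of that client-side logic in Python, assuming the parent document follows the TiledNdArray layout as I read it in the CovJSON spec (`axisNames`, `shape`, and `tileSets` entries with `tileShape` and `urlTemplate`). The field names should be checked against the spec; the example URL and the function name `tile_urls` are hypothetical.

```python
from itertools import product
from math import ceil
from typing import List


def tile_urls(tiled: dict, tile_set_index: int = 0) -> List[str]:
    """List the tile document URLs advertised by one tile set of a parent document."""
    axis_names = tiled["axisNames"]
    shape = tiled["shape"]
    tile_set = tiled["tileSets"][tile_set_index]

    # Number of tiles along each axis; a null (None) tile size is taken to
    # mean that the axis is not split into tiles.
    counts = []
    for size, tile_size in zip(shape, tile_set["tileShape"]):
        counts.append(1 if tile_size is None else ceil(size / tile_size))

    urls = []
    for indices in product(*(range(c) for c in counts)):
        url = tile_set["urlTemplate"]
        # Substitute each axis variable, e.g. {t}, with its tile index.
        for name, index in zip(axis_names, indices):
            url = url.replace("{" + name + "}", str(index))
        urls.append(url)
    return urls


# Hypothetical parent document: 4 time steps, one tile per time step.
parent = {
    "type": "TiledNdArray",
    "dataType": "float",
    "axisNames": ["t", "y", "x"],
    "shape": [4, 10, 20],
    "tileSets": [
        {
            "tileShape": [1, None, None],
            "urlTemplate": "https://example.org/tiles/{t}.covjson",
        }
    ],
}

urls = tile_urls(parent)
```

Each URL would then be fetched as its own request, which is why this gives a similar effect to streaming only with extra round trips and client logic.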

@chris-little:

Streaming: can we say that the first version of EDR API will not do streaming?

@cportele:

I wouldn't say that in the spec.

In general it is a decision of an implementation, if it uses streaming or not. No implementation is required to support streaming.

In addition, it will depend on the encoding. If the EDR API returns a media type that can be streamed, then such a statement would be wrong, too.

And looking at the draft spec, it lists a GeoJSON conformance class. After a quick search, it is not really clear to me where GeoJSON is returned, but wherever it is used, streaming is possible.

chris-little commented 4 years ago

Issue well explored and useful recommendations adopted in Environmental Data Retrieval API repo