Open SiBell opened 4 years ago
I like the idea of using the term Collections so that it's in keeping with an ObservationCollection.
I also like the idea of having a list of collections available from the entry point.
With regard to pagination, here's my crack at an example; we can tweak or dismiss it as required.
So the user goes to the entry point: https://api.urbanobservatory.com/
and is presented with the following JSON response:
{
  "@context": {
    "@base": "https://api.urbanobservatory.com/",
    "uo": "https://urbanobservatory.github.io/standards/vocabulary/latest/",
    "sosa": "http://www.w3.org/ns/sosa/",
    "title": "http://purl.org/dc/terms/title",
    "collections": {
      "@id": "uo:EntrypointCollections",
      "@container": "@id"
    }
  },
  "collections": {
    "/sensors": {
      "@type": ["@id", "uo:Collection", "uo:SensorCollection"],
      "title": "All sensors available in Newcastle upon Tyne"
    },
    "/observations": {
      "@type": ["@id", "uo:Collection", "sosa:ObservationCollection"],
      "title": "All the observations collected by the urban observatory"
    }
  }
}
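A client could turn the relative keys in this entry point into absolute URLs by resolving them against @base. The following is only a minimal Python sketch, assuming the response has already been parsed into a dict. The function name list_collections is hypothetical, and plain RFC 3986 resolution via urllib.parse.urljoin stands in for a full JSON-LD processor.

```python
from urllib.parse import urljoin

# Trimmed copy of the entry point response shown above.
entry_point = {
    "@context": {"@base": "https://api.urbanobservatory.com/"},
    "collections": {
        "/sensors": {"title": "All sensors available in Newcastle upon Tyne"},
        "/observations": {"title": "All the observations collected by the urban observatory"},
    },
}


def list_collections(doc):
    """Return {absolute URL: title} for each collection in the entry point."""
    base = doc["@context"]["@base"]
    return {
        urljoin(base, key): value["title"]
        for key, value in doc["collections"].items()
    }


for url, title in list_collections(entry_point).items():
    print(url, "-", title)
```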
The user then follows the link to the observations collection and is presented with the following:
{
  "@context": {
    "@base": "https://api.urbanobservatory.com/",
    "uo": "https://urbanobservatory.github.io/standards/vocabulary/latest/",
    "sosa": "http://www.w3.org/ns/sosa/",
    "totalItems": "https://www.hydra-cg.com/spec/latest/core/#hydra:totalItems",
    "member": "https://www.hydra-cg.com/spec/latest/core/#hydra:member",
    "view": "https://www.hydra-cg.com/spec/latest/core/#hydra:view"
  },
  "@id": "https://api.urbanobservatory.com/observations?offset=0&limit=100&sortBy=resultTime",
  "@type": ["@id", "uo:Collection", "sosa:ObservationCollection"],
  "totalItems": "4980",
  "member": [
    {
      "madeBySensor": "thermistor-37f3kd",
      "resultTime": "2020-01-27T14:28:18.393Z",
      "hasResult": {
        "value": "22.9"
      }
    },
    {etc, etc}
  ],
  "view": {
    "@id": "https://api.urbanobservatory.com/observations?offset=0&limit=100&sortBy=resultTime",
    "@type": "PartialCollectionView",
    "next": "/observations?offset=100&limit=100&sortBy=resultTime"
  }
}
And then if you follow the next link you'll end up with:
{
  "@context": {
    "@base": "https://api.urbanobservatory.com/",
    "uo": "https://urbanobservatory.github.io/standards/vocabulary/latest/",
    "sosa": "http://www.w3.org/ns/sosa/",
    "totalItems": "https://www.hydra-cg.com/spec/latest/core/#hydra:totalItems",
    "member": "https://www.hydra-cg.com/spec/latest/core/#hydra:member",
    "view": "https://www.hydra-cg.com/spec/latest/core/#hydra:view"
  },
  "@id": "https://api.urbanobservatory.com/observations?offset=0&limit=100&sortBy=resultTime",
  "@type": ["@id", "uo:Collection", "sosa:ObservationCollection"],
  "totalItems": "4980",
  "member": [
    {
      "madeBySensor": "hygrometer-234fs",
      "resultTime": "2020-01-27T15:28:18.393Z",
      "hasResult": {
        "value": "82.3"
      }
    },
    {etc, etc}
  ],
  "view": {
    "@id": "https://api.urbanobservatory.com/observations?offset=100&limit=100&sortBy=resultTime",
    "@type": "PartialCollectionView",
    "previous": "/observations?offset=0&limit=100&sortBy=resultTime",
    "next": "/observations?offset=200&limit=100&sortBy=resultTime"
  }
}
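From the client's side, walking the collection then amounts to following view.next until it is absent. Here is a rough Python sketch of that loop, under a few assumptions: the name iter_members is hypothetical, fetch is any callable that returns the parsed JSON for a URL, and the two canned pages stand in for real HTTP responses (the second one omits next so that the loop terminates).

```python
from urllib.parse import urljoin

BASE = "https://api.urbanobservatory.com/"


def iter_members(first_url, fetch):
    """Yield every member, following view.next until it is absent."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page["member"]
        next_link = page.get("view", {}).get("next")
        # Relative links such as "/observations?offset=100..." resolve against BASE.
        url = urljoin(BASE, next_link) if next_link else None


# Stand-in for an HTTP client: two canned pages keyed by absolute URL.
PAGES = {
    BASE + "observations?offset=0&limit=100&sortBy=resultTime": {
        "member": [{"madeBySensor": "thermistor-37f3kd"}],
        "view": {"next": "/observations?offset=100&limit=100&sortBy=resultTime"},
    },
    BASE + "observations?offset=100&limit=100&sortBy=resultTime": {
        "member": [{"madeBySensor": "hygrometer-234fs"}],
        "view": {"previous": "/observations?offset=0&limit=100&sortBy=resultTime"},
    },
}

first_page = BASE + "observations?offset=0&limit=100&sortBy=resultTime"
sensors = [m["madeBySensor"] for m in iter_members(first_page, PAGES.__getitem__)]
print(sensors)
```

Because the client only ever follows the link it is given, an observatory could swap offset and limit for ?page=2 or a cursor without this loop changing.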
I personally prefer the term links, as used by JSON:API, for holding the next and previous links, but view is ok if we want to stick with hydra's terminology.
I've shown examples here with offset, limit and sortBy, e.g. ?offset=0&limit=100&sortBy=resultTime, but individual observatories may wish to paginate in a slightly different way if it's more performant for them, e.g. ?page=2.
Is it potentially a pain for end-users if we only show partial URIs, e.g. /observations rather than https://api.urbanobservatory.com/observations? I'm guessing some browsers will let the user click on complete links and go straight to them.
Guessing we don't need to have any special HTTP headers, e.g. as described here, if we're handling the next and prev links in the JSON response?
I also wonder if there's a way of preventing common shared properties from being repeated. For example, if all members of the collection share exactly the same madeBySensor or inDeployment property, is there a way of only including it once? I was hoping the ObservationCollection docs would give an example, but they don't.
My strong preference for pagination is to avoid using JSON-LD for next/prev links. The problem with JSON-LD links is that it's unclear how you would describe jumping to a specific page, or searching the collection.
This is what I believe JSON Schema should be used for, because it has more flexibility, like defining validation on query parameters.
Based IRIs shouldn't be an issue if the elements are expanded in code first, using the JSON-LD algorithms. This is something the library I've been working on does automatically.
So we have a meta object instead? As in your example here.
And the user can look at the schema for more details on the pagination properties? E.g. what the maximum value for the limit can be.
Yeah, it doesn't have to be a meta object, it could be anything really, but the schema would reference an element in the document using a JSON pointer, #/meta/current for example. The templatePointers in this bit are an example.
I admit I don't know much about JSON:API though, so there might be another way. The one other thing in JSON Schema's favour though is that it is now fully aligned with OpenAPI (as of a few weeks back).
Great to hear they're aligned.
I'm struggling a little to see how we'll code this up in practice. Are we nearing a point where we could create a really basic Node.js application that serves some dummy observatory data using the approaches discussed?
Guessing it will have the following:
This blog post introduces a few libraries that may help.
Probably worth ensuring that any solution we decide upon can also handle a cursor-based approach rather than just an offset-based approach. Comparison of the two approaches here.
OK, what do we think of this as an approach? A user makes the following request for observations:
GET https://api.urbanobservatory.ac.uk/observations?madeBySensor=thermometer-6A7
To which they get the following back:
{
  "@context": [
    "https://api.urbanobservatory.ac.uk/context/collection.jsonld",
    "https://api.urbanobservatory.ac.uk/context/observation.jsonld"
  ],
  "@id": "https://api.urbanobservatory.ac.uk/observations?madeBySensor=thermometer-6A7",
  "@type": [
    "Collection"
  ],
  "member": [
    {"@id": "observation-1002500", "etc": "etc"},
    {"@id": "observation-1002499", "etc": "etc"}
    .
    .
    {"@id": "observation-1002401", "etc": "etc"}
  ],
  "meta": {
    "current": {
      "@id": "https://api.urbanobservatory.ac.uk/observations?madeBySensor=thermometer-6A7&sortBy=resultTime&sortOrder=desc&resultTime__lte=2020-03-20T16:42:55.033Z&offset=0&limit=100",
      "madeBySensor": "thermometer-6A7",
      "sortBy": "resultTime",
      "sortOrder": "desc",
      "resultTime": {
        "lte": "2020-03-20T16:42:55.033Z"
      },
      "offset": 0,
      "limit": 100
    },
    "next": {
      "@id": "https://api.urbanobservatory.ac.uk/observations?madeBySensor=thermometer-6A7&sortBy=resultTime&sortOrder=desc&resultTime__lte=2020-03-20T16:42:55.033Z&offset=100&limit=100",
      "madeBySensor": "thermometer-6A7",
      "sortBy": "resultTime",
      "sortOrder": "desc",
      "resultTime": {
        "lte": "2020-03-20T16:42:55.033Z"
      },
      "offset": 100,
      "limit": 100
    },
    "count": 100,
    "total": 18456
  }
}
Key points
- current and next not only contain the links, but also detail the parameters used to construct each link. Having these parameters easily accessible can be useful to frontend applications. For example, if a user clicks a next button on the webpage, the parameters may be added to the end of the URL in the browser's address bar.
- The resultTime filter is a nested object rather than "resultTime__lte": "2020-03-20T16:42:55.033Z", as it would be difficult to define what the key resultTime__lte means, whereas it's far easier to define what resultTime and lte mean.
- A page other than the first would also have a previous object. We might also want to allow last and first objects.
- The count and total properties detail how many items are in this collection, and how many items in total are available on the server-side, respectively.
This seems like a nice solution to me, although I wonder if I'm essentially replicating what JSON Schema/Hyper-Schema is supposed to achieve.
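To make the shape of the meta object concrete, here is a rough Python sketch of how a server might assemble it from the request parameters. It is only an illustration under stated assumptions: the helper names flatten, link and build_meta are hypothetical, and the rule that nested filters flatten to key__operator in the query string is inferred from the example URLs above.

```python
from urllib.parse import urlencode

ENDPOINT = "https://api.urbanobservatory.ac.uk/observations"


def flatten(params):
    """Turn nested filters such as {"resultTime": {"lte": x}} into resultTime__lte=x."""
    flat = {}
    for key, value in params.items():
        if isinstance(value, dict):
            for operator, operand in value.items():
                flat[f"{key}__{operator}"] = operand
        else:
            flat[key] = value
    return flat


def link(params):
    """Return the parameters plus an @id built from them."""
    query = urlencode(flatten(params), safe=":")  # keep ':' readable in timestamps
    return {"@id": f"{ENDPOINT}?{query}", **params}


def build_meta(params, count, total):
    """Assemble current, plus next/previous where those pages exist."""
    meta = {"current": link(params), "count": count, "total": total}
    offset, limit = params["offset"], params["limit"]
    if offset + limit < total:
        meta["next"] = link({**params, "offset": offset + limit})
    if offset > 0:
        meta["previous"] = link({**params, "offset": max(0, offset - limit)})
    return meta


params = {
    "madeBySensor": "thermometer-6A7",
    "sortBy": "resultTime",
    "sortOrder": "desc",
    "resultTime": {"lte": "2020-03-20T16:42:55.033Z"},
    "offset": 0,
    "limit": 100,
}
meta = build_meta(params, count=100, total=18456)
print(meta["next"]["@id"])
```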
As far as my experience goes, this looks like a nice solution (it's better than most data endpoints, anyway.) You've put your finger on my reservations here:
This seems like a nice solution to me, although I wonder if I'm essentially replicating what JSON Schema/Hyper-Schema is supposed to achieve.
Surely it's re-inventing the wheel to invent a homebrew pagination system?
Also, I don't understand where there is an offset and limit parameter here? I have in mind the blog post from Slack where they contrast offsets vs. cursors for iterating through large datasets.
Surely it's re-inventing the wheel to invent
Definitely worth avoiding this where possible. I'll raise this point on the technical call tomorrow and see what everyone thinks. Let us know if you wish to join @Joe-Heffer-Shef.
I don't understand where there is an offset and limit parameter here?
@lukessmith and I had a quick chat about this offline. Our conclusion was that there are use cases for both. If we do decide to adopt my approach above then there's no reason why we couldn't swap out the offset and limit properties for a cursor instead. However, we felt that when it came to requesting observations the offset, limit approach made more sense. Our worry with the cursor approach is that it could get rather complex to manage on the server/database side. The cursor approach relies on having a unique sequential column in your database table. Initially resultTime sounds like an obvious choice for this, but then we'd get into issues when multiple observations occur at the same time. You could use a sequential row index instead, but that becomes tricky if you want the observations returned in chronological order, or perhaps ordered by madeBySensor.
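For comparison, here is a small Python sketch of what a cursor over observations with tied timestamps might look like. It uses a technique not proposed in this thread: the cursor is the pair (resultTime, id), with id as an assumed unique tie-breaker. The data, the sort_key helper and the page_after function are all hypothetical, and an in-memory list stands in for the database.

```python
observations = [
    {"id": 1, "resultTime": "2020-03-20T16:00:00Z"},
    {"id": 2, "resultTime": "2020-03-20T16:00:00Z"},  # same time as id 1
    {"id": 3, "resultTime": "2020-03-20T16:00:00Z"},  # and again
    {"id": 4, "resultTime": "2020-03-20T16:05:00Z"},
]


def sort_key(obs):
    """Order by time first, then by id to break ties deterministically."""
    return (obs["resultTime"], obs["id"])


def page_after(cursor, limit):
    """Return up to `limit` observations sorting after `cursor`, plus the next cursor."""
    ordered = sorted(observations, key=sort_key)
    items = [o for o in ordered if cursor is None or sort_key(o) > cursor][:limit]
    next_cursor = sort_key(items[-1]) if items else None
    return items, next_cursor


first, cursor = page_after(None, 2)
second, cursor = page_after(cursor, 2)
print([o["id"] for o in first], [o["id"] for o in second])
```

This only covers chronological order. Ordering by madeBySensor would need a different composite key, which is the kind of server-side complexity described above.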
The obvious downside with the offset, limit approach is that we have streams of data coming in all the time, and thus the starting point for our offset could be changing all the time. However, the following lines in my example provide a nice solution for this:
"resultTime": {
"lte": "2020-03-20T16:42:55.033Z"
},
And we can always add a note in our docs telling users to be aware of duplicates when requesting paginated observations.
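A toy Python sketch of why the resultTime upper bound helps is shown here. It assumes an in-memory list in place of the real datastore, and the page function and sample data are made up for illustration. Each request filters on resultTime first and only then applies offset and limit, so an observation that arrives between two requests does not shift the pages.

```python
def page(observations, lte, offset, limit):
    """Newest-first page of the observations with resultTime <= lte."""
    frozen = [o for o in observations if o["resultTime"] <= lte]
    frozen.sort(key=lambda o: o["resultTime"], reverse=True)
    return frozen[offset:offset + limit]


store = [{"id": i, "resultTime": f"2020-03-20T16:{i:02d}:00.000Z"} for i in range(1, 5)]
snapshot = "2020-03-20T16:42:55.033Z"

page_one = page(store, snapshot, offset=0, limit=2)

# A new observation arrives between the two requests.
store.append({"id": 5, "resultTime": "2020-03-20T16:50:00.000Z"})

page_two = page(store, snapshot, offset=2, limit=2)
unbounded = page(store, "9999", offset=2, limit=2)  # no effective upper bound

print([o["id"] for o in page_one])   # ids on the first page
print([o["id"] for o in page_two])   # bounded second page: no overlap with the first
print([o["id"] for o in unbounded])  # unbounded second page repeats an id from the first
```

An observation that arrives late but carries an earlier resultTime would still shift the pages, which may be one reason a note about duplicates remains worthwhile.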
Yes, I'd like to attend the meeting tomorrow, please.
I can see that the difficulty solving this problem arises from the same sources as many other challenges in the observatories i.e. heterogeneous data sources and unknown/varied usage patterns.
Yep you've hit the nail on the head.
The call is at 11:00 tomorrow on Zoom. Could you send me a quick message via this contact form, so I can send you the Zoom details? Alternatively, drop Patricio Ortiz an email as he'll be on the call too (I assume you've met).
We need some agreement on how we manage Collections (if we call them that... this might for example be a collection of platforms, thus paginated, as Si) that aren't ObservationCollections. Examples of how we do that would be either hydra:Collection or rdf:Bag.
It's also entirely possible that you might have all your platforms in one API (a lamp post API, say) and all your sensors in another (an air quality API, say) and all your historic observations in another (an observation collection API, say) and they would all just link to each other.
We also need an entrypoint that directs clients to these collections as a starting point. In other words, when I hit https://api.example.com it gives me links to a collection of sensors, a collection of platforms, a collection of observations, etc. It wouldn't need to give me all of those necessarily, you might not have a collection of all observations from all sensors (which could be huge, but might be useful), you might only have collections of observations under each sensor.
In theory, this would/could look something like...
Originally posted by @lukessmith in https://github.com/urbanobservatory/standards/issues/18#issuecomment-578498701