Open SiBell opened 5 years ago
Ok here's my stab at this. I've done everything but Pagination and Response Format. Let me know what you think.
inDeployment=weather-stations
__ prefixes a modifier, e.g. dateTime__gt.
Modifiers include:
Used to filter the data temporally.
The dateTime must be in ISO8601 format.
Defaults to UTC unless specified otherwise.
None, 1 or 2 of the keys can be provided.
Returns error response if:
?dateTime__gte=2019-10-18
?dateTime__gt=2019-10-18T15:03:34.614Z
?dateTime__gt=2019-10-18T15:03:34.614+04
?dateTime__gte=2019-10-18&dateTime__lt=2019-10-24
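One practical wrinkle with the offset example above: a literal + in a query string is usually decoded as a space, so it needs percent-encoding before it goes on the wire. A minimal sketch using Python's standard urllib.parse; the base URL and the build_url helper are made up for illustration and are not part of the proposal.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://api.example.com/observations"

def build_url(filters):
    """Return BASE_URL with the filters percent-encoded as a query string."""
    return f"{BASE_URL}?{urlencode(filters)}"

url = build_url({
    "dateTime__gte": "2019-10-18T15:03:34.614+04",  # '+' becomes %2B
    "dateTime__lt": "2019-10-24",
})
print(url)
```

The server would then see the original +04 offset after decoding, rather than a space.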
Assumes UTC is being used.
hourOfDay uses 24H clock
Several of these can be used together
Check values fall within expected range, e.g.
?minuteOfHour=30
?hourOfDay=22
?dayOfWeek=2
(i.e. for Tuesday)
?dayOfMonth=2
?dayOfYear=301
?monthOfYear=11
?year=2019
Latitudes and longitudes are given in WGS84 datum.
Height is in metres above or below the WGS84 reference ellipsoid (same as GeoJSON).
None, 1 or 2 height keys can be used in a request.
None, 1 or 2 latitude keys can be used in a request.
None or 2 longitude keys can be used in a request.
?latitude__gt=52
?longitude__gt=-8.5&longitude__lte=2
?height__lt=10
Allows the user to find all resources (e.g. Platforms, observations, etc) within a given distance of a point.
The proximityCentre is the centre given in the form longitude,latitude,height. The height is optional. When height is given the filtered region turns from a circle into a sphere.
Longitude and latitude are in WGS84. Height is in metres.
proximityRadius is the distance from the proximityCentre in metres.
Return error response if:
?proximityCentre=-1.9,52.2&proximityRadius=1000
?proximityCentre=-1.9,52.2,10&proximityRadius=1000
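As a rough illustration of how a server might evaluate the proximity filter, here is a sketch that uses the haversine formula on a spherical Earth. The function names, the mean-radius constant and the way height is folded in are all assumptions for this example; a real implementation would more likely lean on a spatial database.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; a spherical approximation

def parse_centre(value):
    """Parse 'longitude,latitude[,height]' into a list of floats."""
    parts = [float(p) for p in value.split(",")]
    if len(parts) not in (2, 3):
        raise ValueError("proximityCentre must be longitude,latitude[,height]")
    return parts

def ground_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def within_proximity(point, centre, radius_m):
    """point and centre are [lon, lat] or [lon, lat, height]."""
    d = ground_distance_m(point[0], point[1], centre[0], centre[1])
    if len(centre) == 3 and len(point) == 3:
        # With heights on both sides the region is treated as a sphere:
        # combine ground distance and vertical difference.
        d = math.hypot(d, point[2] - centre[2])
    return d <= radius_m

centre = parse_centre("-1.9,52.2")
print(within_proximity([-1.9, 52.2], centre, 1000))
```

If the centre has a height but the resource does not, this sketch falls back to the 2D check; whether that is the right behaviour is an open question.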
Depends on the resource.
Certain resources will be filterable by certain attributes.
If, for example, you needed to find people whose hair colour is brown then your request might look like this:
https://api.example.com/people?hairColour=brown
More examples:
?inDeployment=weather-stations
?isHostedBy=lamppost-101
?age=18
Depends on the resource.
Applies modifiers to keys that are specific to the resource being queried.
/people?age__lt=18
/observations?value__gte=30.5&observedProperty=air-temperature
For endpoints returning a collection of resources this parameter will limit the number of resources returned.
Can be used in combination with the sortBy and sortOrder keys to get just the "last n" or "first n" resources in the collection.
?limit=100
?limit=1&sortOrder=asc&sortBy=age
For endpoints returning a collection of resources this parameter will sort the resources returned.
Sorts both numerical fields and also strings (i.e. alphabetically).
Can be used in combination with the limit key to get just the "last n" or "first n" resources in the collection.
sortOrder defaults to asc if the sortBy key is provided without sortOrder.
?sortBy=age&sortOrder=desc
?limit=1&sortOrder=asc&sortBy=age
Good work Simon. I agree with most of it. There are a couple of things I would add/specify.
Use MO, TU, WE, TH, FR, SA, SU to be explicit on the days, as some countries start the week with Sunday.
?dayOfWeek=MO,WE,FR
?dayOfWeek=MO-FR
Additionally, ranges and comma-separated lists should also apply to other numeric filters.
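A small sketch of how the list and range forms could be expanded on the server side. The parse_days helper is hypothetical, and it assumes a range may not wrap around the end of the week (so FR-MO is rejected); the thread does not settle that.

```python
DAYS = ["MO", "TU", "WE", "TH", "FR", "SA", "SU"]

def parse_days(value):
    """Expand 'MO,WE,FR' or 'MO-FR' (or a mix) into day codes in week order."""
    selected = set()
    for part in value.upper().split(","):
        if "-" in part:
            # DAYS.index raises ValueError for an unknown day code.
            start, end = (DAYS.index(d) for d in part.split("-"))
            if start > end:
                raise ValueError(f"range runs backwards: {part}")
            selected.update(DAYS[start:end + 1])
        else:
            selected.add(DAYS[DAYS.index(part)])
    return [d for d in DAYS if d in selected]

print(parse_days("MO-FR"))
```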
Here's what I think for pagination.
Used for pagination of results. Both are represented as integers and are not required parameters. If not specified it defaults to limit=10 and offset=0.
Can be used in combination with the sortBy and sortOrder keys.
?limit=100
?limit=10&offset=10
?limit=10&offset=10&sortOrder=asc&sortBy=age
Not completely sure if we should make limit compulsory when using offset or just use default limit=10 when not specified?
I agree with all of this, and also with Aare's comments on days of the week and ranges. Thanks both.
In the interests of making this more widely applicable, I think it would be worth nailing down what the general case is. Specifically:
The inDeployment example: do we require full IRIs, for example inDeployment=https://birmingham.uo.ac.uk/api/deployment/weather-stations, or do we allow relative paths following RFC 3986 as in JSON-LD? Personal preference is to allow either, but this does add complexity.
As an example:
{
"@type": "Sensor",
[...],
"madeObservation": {
"@type": "ObservationCollection",
"member": [{
"@type": "Observation",
"hasResult": {
"@type": "Temperature",
"value": 12.0
},
"resultTime": "2019-10-25T15:14:00Z"
}]
}
}
In the above example:
value to apply a threshold, e.g. ?value__gte=10.0
?value filter when querying against the IRI of the ObservationCollection
?query filter on a Sensor or SensorCollection, which contained ObservationCollections and Observations as nested objects?
name on a platform, and also a name on a sensor attached to it); which would the name filter apply to? My preference would be that the filter applies to the highest level name only.
I think we might also have to consider, for the sake of search functionality...
name__contains
name__containsAny
name__containsAll
or or and filter, depending on whether __containsAny or __containsAll is used
contains variant only allows one word to be specified, or a phrase if surrounded by double quotes
containsAny and containsAll variant allows multiple words or phrases to be specified, joined by a plus symbol
contains
http://example.com/platforms?name__contains="Room 2.048"
http://example.com/platforms?name__contains=2.048
http://example.com/platforms?name__containsAll=Room+2.048
http://example.com/platforms?name__containsAny=2.060+2.048
http://example.com/platforms?name__containsAny="Room 2.060"+"Room 2.048"
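To make the word/phrase rules concrete, here is a sketch of the matching logic. It assumes it is given the raw (not yet URL-decoded) parameter value, since a decoded + would already have become a space, and it assumes matching is case-insensitive. The helper names are invented for this example and do not describe the Newcastle implementation.

```python
import re

def split_terms(raw):
    """Split on '+' outside double quotes, then strip the quotes."""
    return [t.strip('"') for t in re.findall(r'"[^"]*"|[^+]+', raw)]

def name_matches(name, modifier, raw):
    """Case-insensitive substring match for contains / containsAny / containsAll."""
    haystack = name.lower()
    hits = [term.lower() in haystack for term in split_terms(raw)]
    if modifier.lower() == "containsany":
        return any(hits)
    # contains (a single term) and containsAll both need every term present.
    return all(hits)

print(name_matches("Room 2.048", "containsAll", "Room+2.048"))
```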
Disclaimer: The above is similar to how it's already been implemented in some Newcastle APIs, happy to look at alternatives
Will case be considered? Can we have icontains?
Personally, I don't think case sensitivity is necessary. Would be good to know if people feel strongly the other way.
Oh boy, this gets "fun" quickly!
My thoughts:
?value__gte=10.0 in your nested example is fine. If we start doing member.hasResult.value__gte=10.0 things are going to get pretty gnarly for the end user. I personally can't see myself allowing all that many filterable properties on a given endpoint so the risk of collisions is fairly low. I'd agree that the filter should apply to the highest level for Luke's name example.
dayOfWeek and MO,WE,FR and MO-FR. Presumably these are case insensitive too, i.e. MO-FR is the same as mo-fr.
containsAll and containsAny could you use a comma separated approach, e.g. name__containsAny=2.060,2.048? Feels slightly odd seeing double quotes in a URL. Are these for looking for substrings in a longer string, or for searching for elements in a property that's an array, or both? Either way it's different to Aare's dayOfWeek=MO,TU example which is more about filtering discrete values right?
Personally I'm happy with all of the above. I think enough time has passed, and we should now consider writing this up into the standards doc, and schematising the query parameters etc.
Any objections?
None from me!
Aye let's get it written up. Let me know if you want me to do any of it. Looks like we can click "edit" on any of the posts above, so should be relatively quick to copy the markdown over into the working document.
One more to add to this list before we get this written up, which follows on from my previous issue.
Can we have an exists condition?
E.g.
To get a list of all sensors not yet hosted on a platform:
GET /sensors?isHostedBy__exists=false
Or perhaps all the observations for which the featureOfInterest is defined:
GET /observations?hasFeatureOfInterest__exists=true
Just reads a bit nicer than our previous isDefined suggestion.
No inherent problem with __exists, but we need to clarify how we would handle null values in that case. A null value would exist, but wouldn't be defined.
Good point. Guess this depends on whether we're showing null values to the user or not.
E.g.
{
"id": "sensor-123",
"observes": "air-temperature",
"inDeployment": null
}
vs.
{
"id": "sensor-123",
"observes": "air-temperature"
}
I was veering towards the latter, in which case any null values that may exist in the backend database don't "exist" to the end user and therefore __exists on its own would suffice.
However, if there's merit in showing null values then my preference would actually be to stick with just __isDefined and drop __exists.
Interested to hear people's thoughts on this.
The only place I can see a clear rationale for having null values is in the observation value itself, where for example we might have an 'alarm' timeseries, and null means no alarm.
I admit I can't think of any other places where null would be useful. We can certainly discourage the use of null values in serialisations in favour of omission.
Maybe there isn't a problem in that case, and we should just allow __exists and __isDefined, but neither are mandatory and implementations could be free to implement none, one or both combinations in their filters.
Does that work?
Actually, I suppose it should be __exists and __defined for consistency?
Yep works for me, and yes __defined is better than __isDefined.
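The distinction being drawn between the two modifiers can be sketched as follows, using the two sensor serialisations from earlier in the thread. The function names are illustrative only; Python's None stands in for JSON null.

```python
def exists(resource, key):
    """True when the key is present in the serialised resource, even if its value is None."""
    return key in resource

def defined(resource, key):
    """True when the key is present and its value is not None (JSON null)."""
    return resource.get(key) is not None

with_null = {"id": "sensor-123", "observes": "air-temperature", "inDeployment": None}
omitted = {"id": "sensor-123", "observes": "air-temperature"}

print(exists(with_null, "inDeployment"), defined(with_null, "inDeployment"))
print(exists(omitted, "inDeployment"), defined(omitted, "inDeployment"))
```

The two only disagree for the resource that carries an explicit null, which is why they collapse into one modifier if nulls are never serialised.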
Even in the case of an alarm, wouldn't boolean values work without the need to introduce null values?
Perhaps, if the alarm were binary. But there might be an enumeration of alarms presented as an array for example:
[
"https://example.org/alarm/low-temperature",
"https://example.org/alarm/no-signal"
]
In the above case, either an empty array or null would be appropriate for when there are no alarms.
I'm not suggesting any of this is the right way to do it, just trying to retain flexibility as much as possible. Open to being convinced otherwise :-)
Aye let's get it written up. Let me know if you want me to do any of it. Looks like we can click "edit" on any of the posts above, so should be relatively quick to copy the markdown over into the working document.
I've made some progress implementing this into a JS library, but not quite ready to share yet.
If you get a chance to look at transforming the above mess into a simple HTML table for the actual document, I'd really appreciate it.
Sure, I can wack these in a HTML table. Might have to be 2 tables then a few specific examples:
Table 1: Special keys
e.g.
key | description | example |
---|---|---|
limit | limit the number of records returned | ?limit=1 |
proximityradius | the distance from the proximitycentre in metres | ?proximityradius=1000 |
Table 2: Modifiers
e.g.
modifier | description | example |
---|---|---|
gt | greater than | ?datetime__gt=2019-01-01 |
contains | For wildcard matching. Only allows one word to be specified, or a phrase if surrounded by double quotes | ?name__contains=west |
Special Examples
More detailed description of how to query with spatial windows, time windows, pagination, by proximity, etc.
I'm in the process of putting all parameters in a table now, just wondering if the time based parameters, i.e. minuteofhour, hourofday, etc should actually be "modifiers".
e.g. change:
monthofyear=10
for:
resultTime__monthofyear=10
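Reason:
It's more consistent with how we apply other time-based filters, e.g. resultTime__gte=2019.
If you have a resource, e.g. observations, that has more than one time-based property, e.g. an observation might have a timestamp for the time of measurement (resultTime), and one for when it arrived at the server (arrivalTime), then this approach lets you choose which one to filter by.
The upshot is that the only special parameter keys we're left with are those for pagination, i.e. limit, offset, sortorder, sortby, and those for circular bounding area: proximitycentre and proximityradius. This is no bad thing.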
Any objections?
It makes sense.
Just one (likely silly) question. Are we considering applying multiple filters to the same parameter in the same query? If we are, what would the syntax be, e.g. resultTime__gte=2017&resultTime__monthofyear=10?
No objections from me, this sounds like a really sensible idea, and would provide options for granular filtering on other date-time based data too in its generic form.
I think Ettore's suggestion is right, that would be a valid query for results in October 2017, 2018, 2019 etc., with query constraints being additive.
The other aspect to consider is multiple 'modifiers', which I suggest we allow but don't mandate as a minimum. I'm thinking for example resultTime__monthOfYear__gte=10 for October through December.
Perhaps based on the above we need to clarify the terminology slightly. resultTime is a selector (picking a specific value within the JSON response), monthOfYear is a sub-selector (picking a part of that value), and gte is a modifier? That way the order would always be selector, sub-selector, modifier, and resultTime__gte__monthOfYear would be invalid. Does that make sense?
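The ordering rule can be sketched as a small key parser. This is only an illustration: parse_key is a hypothetical helper, the sub-selector names are the ones proposed earlier in the thread, and only the four comparison modifiers are recognised here.

```python
SUB_SELECTORS = {"minuteofhour", "hourofday", "dayofweek", "dayofmonth",
                 "dayofyear", "monthofyear", "year"}
MODIFIERS = {"gt", "gte", "lt", "lte"}  # a subset, enough for the example

def parse_key(key):
    """Split 'selector[__subselector][__modifier]' into a 3-tuple, rejecting any other order."""
    selector, *rest = key.split("__")
    sub = mod = None
    if rest and rest[0].lower() in SUB_SELECTORS:
        sub = rest.pop(0).lower()
    if rest and rest[0].lower() in MODIFIERS:
        mod = rest.pop(0).lower()
    if not selector or rest:
        # Anything left over is unknown or out of order.
        raise ValueError(f"invalid query key: {key}")
    return selector, sub, mod

print(parse_key("resultTime__monthOfYear__gte"))
```

With this shape, resultTime__gte__monthOfYear fails because monthOfYear is still unconsumed after the modifier has been read.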
It makes sense.
I agree that Ettore's suggestion is right.
Happy to allow sub-selectors too. I'll add it to the docs. I'm struggling to think of a use-case other than time-based selectors, but it's definitely worth having as an option.
Query string parameters allow greater control over the resources returned when making a request.
They come in particularly useful when making GET requests.
For example the following request doesn't have any query string parameters:
GET https://api.urbanobservatory.com/observations
Whereas the following does:
GET https://api.urbanobservatory.com/observations?madeBySensor=thermometer-abc123
The latter lets us filter the observations returned to just those made by the sensor with id: thermometer-abc123.
This simple example has the form:
selector=value
i.e. madeBySensor is the selector, and thermometer-abc123 is the value.
We also accept more complex query string parameters of the form:
selector__modifier=value
The modifier allows you to perform more than just an equality filter.
The following example has no modifier:
GET .../observations?resultTime=2020-01-09T18:05:24.969Z
This would only get observations recorded at that exact millisecond. But what if you wanted all observations since the start of 2020? That's where a modifier (in this case gte) can help. Here's how the request would look:
GET .../observations?resultTime__gte=2020-01-01T00:00:00.000Z
The modifier always follows a double underscore: __. It acts upon the selector listed before the __.
We also have sub-selectors, that focus on a specific component of a selector. They're particularly useful for dealing with timestamps. The format is as follows:
selector__subselector=value
For example:
resultTime__monthOfYear=10
Where resultTime is the selector, and monthOfYear is the sub-selector. In this example it allows you to only retrieve observations with a resultTime within the month of October.
You can also use a subselector in combination with a modifier using the following format:
selector__subselector__modifier=value
e.g.
resultTime__monthOfYear__gte=10
This would retrieve any observations with a resultTime in October, November or December.
N.B. Not every observatory will support all of these formats, and each endpoint may only have a small number of query string parameters it accepts. However, when available, this is the format each observatory will abide by.
N.B. Query string selectors, sub-selectors, modifiers and values are case-insensitive unless specifically defined otherwise.
Typically the selector is a property of the resource being returned, e.g. resultTime or madeBySensor. However, there are some special selectors that provide further functionality. They are listed in the following table.
key | description | examples |
---|---|---|
limit | Limits the number of records returned. Commonly used with the offset, sortorder and sortby parameters. MUST be an integer value ≥ 1. | limit=1 or limit=100&offset=200&sortorder=asc&sortby=resultTime |
offset | Commonly used for pagination in combination with the limit parameter to skip the first n resources. MUST be an integer value ≥ 0. | limit=10&offset=30 |
sortorder | Used in combination with sortby to sort the returned resources by the property provided. Use asc for ascending and desc for descending. | sortorder=desc&sortby=resultTime |
sortby | Used in combination with sortorder to sort the returned resources by the property provided. | sortorder=asc&sortby=madeBySensor |
proximitycentre | MUST be used in combination with proximityradius. Sets the centre of a circular or spherical (if height is given) bounding area. I.e. only resources within the spatial area are returned. Uses the format: longitude,latitude, height. Height is optional. Longitude and latitude use WGS84. Height is in metres. | proximitycentre=-1.9,52.2&proximityradius=1000 or proximitycentre=-1.9,52.2,200&proximityRadius=100 |
proximityradius | MUST be used in combination with proximitycentre. Sets the distance from the proximitycentre in metres. | proximityradius=1000 |
N.B. sub-selectors that deal with times and dates assume that the timezone is UTC.
sub-selector | description | examples |
---|---|---|
minuteofhour | Filters by the minute of the hour. Integer values between 0 and 59. | resultTime__minuteOfHour=30 |
hourofday | Filters by the hour of day. An integer value between 0 and 23. I.e. a 24 hour clock. | resultTime__hourOfDay=22 |
dayofweek | Filters by the day of the week. Valid values are: mo, tu, we, th, fr, sa, su. For multiple days use a comma-separated list, e.g. mo,we,fr, or a range, e.g. mo-fr. | resultTime__dayofweek=mo or resultTime__dayofweek=mo,we,fr or resultTime__dayofweek=mo-fr |
dayofmonth | Filters by the day of the month. Integer values between 1 and 31. | resultTime__dayofmonth=2 |
dayofyear | Filters by the day of the year. Integer values between 1 and 366. | resultTime__dayofyear=301 |
monthofyear | Filters by the month of year. Integer values between 1 and 12 | resultTime__monthofyear=11 |
year | Filter resources to just a single year. | resultTime__year=2019 |
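For illustration, the sub-selectors in the table could be derived from a timestamp roughly like this. It is a sketch under assumptions: sub_value is a made-up helper, timestamps are ISO 8601 strings, and any timestamp without an offset is taken to be UTC, in line with the note above the table.

```python
from datetime import datetime, timezone

DAY_CODES = ["mo", "tu", "we", "th", "fr", "sa", "su"]

EXTRACTORS = {
    "minuteofhour": lambda t: t.minute,
    "hourofday": lambda t: t.hour,
    "dayofweek": lambda t: DAY_CODES[t.weekday()],  # weekday(): Monday is 0
    "dayofmonth": lambda t: t.day,
    "dayofyear": lambda t: t.timetuple().tm_yday,
    "monthofyear": lambda t: t.month,
    "year": lambda t: t.year,
}

def sub_value(timestamp, sub_selector):
    """Return one component of an ISO 8601 timestamp, evaluated in UTC."""
    # Swap a trailing 'Z' so fromisoformat also works on Python < 3.11.
    t = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    if t.tzinfo is None:
        t = t.replace(tzinfo=timezone.utc)  # assumption: naive timestamps are UTC
    return EXTRACTORS[sub_selector.lower()](t.astimezone(timezone.utc))

print(sub_value("2019-10-28T22:30:00Z", "dayOfYear"))
```

Note that a timestamp with a non-UTC offset is converted first, so 02:30 on the 28th at +04:00 counts as the 27th.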
modifier | description | examples |
---|---|---|
none | Format: key=value. When no modifier is present, and assuming the parameter key isn't listed in the table above (e.g. it's not limit, offset, etc), then the key is a property that exists on the resources being requested. Only those resources that have a matching value for this property will be returned. | inDeployment=weather-stations-in-schools or isHostedBy=lamppost-32 |
gt | greater than | resultTime__gt=2019-01-01 |
gte | greater than or equal to | height__gte=10 |
lt | less than | latitude__lt=60 |
lte | less than or equal to | value__lte=20 |
contains | For wildcard matching. Only allows one word to be specified, or a phrase if surrounded by double quotes | name__contains=west or name__contains="Room 2.048" |
containsany | Allows multiple words or phrases to be specified, joined by a + symbol. Matches if any of them is present. | name__containsAny=2.060+2.048 or name__containsAny="Room 2.060"+"Room 2.048" |
containsall | Allows multiple words or phrases to be specified, joined by a + symbol. Matches only if all of them are present. | name__containsAll=Room+2.048 |
exists | Used to check if a resource property exists or not. | isHostedBy__exists=false |
defined | Used to check if a resource property has been defined or not. In most cases it will behave the same as exists. The only time it may differ is if resources can have properties with a value of null, in which case that property would exist, but would not be defined. | value__defined=false |
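The comparison modifiers in the table map naturally onto ordinary operators. A minimal sketch, assuming resources are plain dictionaries and that a resource lacking the property is simply skipped; the names here are illustrative rather than part of the standard.

```python
import operator

COMPARATORS = {
    None: operator.eq,   # no modifier: plain equality
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}

def passes(actual, modifier, wanted):
    """Apply one comparison modifier to a resource value."""
    return COMPARATORS[modifier](actual, wanted)

def filter_resources(resources, key, modifier, wanted):
    """Keep the resources whose property satisfies the comparison; skip those lacking it."""
    return [r for r in resources if key in r and passes(r[key], modifier, wanted)]

observations = [{"value": 10}, {"value": 20}, {"value": 30.5}, {"id": "no-value"}]
print(filter_resources(observations, "value", "lte", 20))
```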
The following gets observations with a resultTime between two dates.
GET .../observations?resultTime__gte=2020-01-01T00:00:00.000Z&resultTime__lt=2020-01-01T12:00:00.000Z
The following only gets observations in the year 2020 on weekdays.
GET .../observations?resultTime__dayofweek=mo-fr&resultTime__year=2020
The following retrieves any observations within a bounding box (in this case around Birmingham city centre).
GET .../observations?latitude__lte=52.495768&latitude__gte=52.464492&longitude__lte=-1.875352&longitude__gte=-1.928481
And now for only observations above 1 m.
GET .../observations?latitude__lte=52.495768&latitude__gte=52.464492&longitude__lte=-1.875352&longitude__gte=-1.928481&height__gt=1
The following retrieves any observations within 1000 m from the centre of Birmingham:
GET .../observations?proximitycentre=-1.895007,52.477096&proximityradius=1000
Let's say you want all air-temperature observations from a platform called mobile-sensing-van in the year 2019. There's potentially thousands of observations available and therefore we want to get them in chunks. Our initial request looks like this.
GET .../observations?observedProperty=air-temperature&platform=mobile-sensing-van&resultTime__year=2019&limit=100&offset=0&sortby=resultTime&sortorder=asc
This returns exactly 100 observations, and therefore there's still more observations to retrieve, so we adjust the offset and make the following request:
GET .../observations?observedProperty=air-temperature&platform=mobile-sensing-van&resultTime__year=2019&limit=100&offset=100&sortby=resultTime&sortorder=asc
In this scenario we'd keep incrementing the offset by 100 until we no longer received 100 observations back.
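The loop described above might look like the following from the client's side. To keep the sketch runnable without a network, fetch_page is any callable taking limit and offset; fake_fetch is a stand-in that slices a local list, not a real API call.

```python
def fetch_all(fetch_page, limit=100):
    """Collect every resource by increasing the offset until a short page comes back."""
    results, offset = [], 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        results.extend(page)
        if len(page) < limit:
            return results
        offset += limit

# Stand-in for an HTTP request: slices a local list the way limit/offset would.
DATA = list(range(250))

def fake_fetch(limit, offset):
    return DATA[offset:offset + limit]

all_items = fetch_all(fake_fetch)
print(len(all_items))
```

When the total is an exact multiple of the limit, this approach costs one extra request that returns an empty page.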
The List of endpoints suggested by Simon in his examples are:
<base_url>/deployments/<deployment_name>
where the response contains the deployment specified by <deployment_name>
<base_url>/deployments
where the response contains a list of all the deployments
<base_url>/deployments/<deployment_name>/platforms
where the response contains the list of all the platforms in the specified deployment
<base_url>/observations?startDate=<start_date>&endDate=<end_date>
where the response contains a list of ALL the observations recorded between <start_date> and <end_date>
1. Would it make sense to query for all the platforms belonging to any deployment, hence having an endpoint such as
<base_url>/platforms
?
Or is it preferable to always specify the deployment as in:
<base_url>/deployments/<deployment_name>/platforms/<platform_name>
?
(either one or both endpoints would have to be added to the list proposed by Simon).
2. If someone wants to retrieve a single platform, should he/she use
<base_url>/platforms/<platform_name>
or
<base_url>/deployments/<deployment_name>/platforms/<platform_name>
?
(the second endpoint being possibly redundant if the platform name is kept unique across all the deployments)
3. Does it make sense to ask for ALL observations within a specified time window regardless of the sensor that made it, or at least the ObservedProperty it refers to, as Simon suggested in his example?
Should we consider instead a more specific query when it comes to retrieving observations, like:
<base_url>/sensors/<sensor_name>/observations?startDate=<start_date>&endDate=<end_date>
and / or
<base_url>/observedproperty/<property_name>/observations?startDate=<start_date>&endDate=<end_date>
?
4. What would the endpoint for querying a given sensor look like?
<base_url>/sensors/<sensor_name>
(assuming unique sensor names across all platforms / systems / deployment)
or
<base_url>/platform/<platform_name>/sensors/<sensor_name>
(assuming unique sensor names only within a platform)
or
<base_url>/deployments/<deployment_name>/sensors/<sensor_name>
(assuming unique sensor names within an entire deployment)
or else...?
My preference would be that all your suggestions are valid, because some endpoint structures will be better suited to particular clients/frontends than others.
For example, I will be handling much of my authorisation at the deployment level, e.g. only certain users will have admin rights to a particular private deployment. Therefore, when I create a front-end that allows admin users to, for example, remove a sensor from a deployment I will want the deployment ID in the URL. For any URL starting <base_url>/deployments/<deployment_id>/ my API server will verify that the user actually has access rights to this deployment.
Alternatively, the web app that we'll build for the general public to use will be better off using endpoints such as <base_url>/observations or <base_url>/platforms. I just make sure that the observations or platforms returned haven't come from private deployments.
I think we need to pick a small handful of endpoints that we MUST support. The obvious one being <base_url>/observations. Then if we want to support more then we can do so, trying our best to be consistent so that we don't end up with one observatory using /deployments/ and another /deployment/.
Thank you, Simon. All that you say makes sense to me. I only have some reservation on the
This is where pagination should come to the rescue. So that even if they make a request that would match millions of observations, we only return a maximum of 1000 (for example). Most databases should have some limit, offset and sort functionality to help with this.
What we haven't decided on yet is how we tell the user they have hit our maximum limit and presumably provide a URL for them to get the next 1000.
It would be very easy for me to support endpoints such
It's worth saying that a single sensor could in theory upload thousands of observations every day by itself, e.g. if it sampled every second, therefore we'd almost certainly need pagination on these additional endpoints too.
I worry we're going down the wrong path with a list of endpoints. A REST API shouldn't have a list of endpoints, because it's driven by hypermedia, meaning it doesn't matter what the web addresses are because you follow the links to get there.
There's absolutely nothing wrong with the endpoints you've suggested, it looks sensible as a way of implementation. But if I wanted to file my Platforms under https://api.example.com/silly-sausages then I should be able to.
What we do need is some agreement on how we manage Collections (if we call them that... this might for example be a collection of platforms, thus paginated, as Si refers to) that aren't ObservationCollections. Examples of how we do that would be either hydra:Collection or rdf:Bag.
It's also entirely possible that you might have all your platforms in one API (a lamp post API, say) and all your sensors in another (an air quality API, say) and all your historic observations in another (an observation collection API, say) and they would all just link to each other.
We also need an entrypoint that directs clients to these collections as a starting point. In other words, when I hit https://api.example.com it gives me links to a collection of sensors, a collection of platforms, a collection of observations, etc. It wouldn't need to give me all of those necessarily, you might not have a collection of all observations from all sensors (which could be huge, but might be useful), you might only have collections of observations under each sensor.
In theory, this would/could look something like...
GET https://api.example.com/
{
"@context": {
"@base": "https://api.example.com/",
"uo": "https://urbanobservatory.github.io/standards/vocabulary/latest/",
"title": "http://purl.org/dc/terms/title",
"collections": {
"@id": "uo:EntrypointCollections",
"@container": "@id"
}
},
"collections": {
"/sensors": {
"@type": ["@id", "uo:Collection", "uo:SensorCollection"],
"title": "All sensors available in Newcastle upon Tyne"
}
}
}
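From the client's side, following links from such an entrypoint could be as simple as the sketch below. It is not a JSON-LD processor: it assumes the response has the shape shown above (trimmed here to the parts used), reads @base directly, and resolves each collection key against it. The collection_urls helper is invented for the example.

```python
import json
from urllib.parse import urljoin

ENTRYPOINT = json.loads("""
{
  "@context": {"@base": "https://api.example.com/"},
  "collections": {
    "/sensors": {
      "@type": ["uo:Collection", "uo:SensorCollection"],
      "title": "All sensors available in Newcastle upon Tyne"
    }
  }
}
""")

def collection_urls(doc, wanted_type):
    """Return absolute URLs of the collections whose @type includes wanted_type."""
    base = doc["@context"]["@base"]
    return [
        urljoin(base, ref)
        for ref, body in doc["collections"].items()
        if wanted_type in body.get("@type", [])
    ]

print(collection_urls(ENTRYPOINT, "uo:SensorCollection"))
```

A client written this way never hard-codes /sensors, so the collection could live at any path.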
Is this discussion best split into a new issue? Not sure we're talking about filtering anymore...
Thank you, Luke.
I agree we need an entry point and then just follow the links to get the resources we want. I guess I was assuming that the structure of the tree that stems from the entry point would be the same for all observatories. This is what I meant by "an agreed list of endpoints".
@lukessmith I've created a new issue on Collections and Pagination, as I agree it makes sense to start a new thread for this.
Being able to reach all the endpoints by following links makes perfect sense, but surely there's benefit to keeping some consistency between observatories? E.g. so that any scripts or front-ends that use an observatory's API would work just as well with other observatories' without having to change much more than the base url.
At the risk of getting carried away, I have another two modifiers that would be useful:
An __includes modifier would come in handy for selecting resources for which the provided item occurs within an array property.
For example an observation might have a flag property {flag: ['persistence', 'upperbound']}.
Then to query for all observations that have been flagged as breaching a climatic upper bound you can use:
/observations?flag__includes=upperbound
I've found myself using a query parameter called search. E.g.
/platforms?search=lamppost
It behaves a little bit like the __contains modifier except it searches across more than one field. In my case it will typically search both the id and the name for any keyword matches. Mentioning it in case it's something others see themselves using and therefore worthy of adding to the docs.
Another addition, as discussed on the technical call today: not. For when we want to exclude something, or perform the opposite of a filter.
For example:
/observations?unit__not=uo:kelvin
Will exclude observations given in the unit Kelvin.
Another example:
/observations?resultTime__not__gte=2020-01-01
This would be the opposite of resultTime__gte. Although this is a bad example as we could just use resultTime__lt.
We'd also want to be able to provide a comma-separated list e.g:
/observations?unit__not=uo:kelvin,uo:fahrenheit
Although thinking about it, the right way to do this might be in combination with the __in modifier mentioned above, i.e.
/observations?unit__not__in=uo:kelvin,uo:fahrenheit
Because the __in modifier basically implies that the query parameter value will be an array.
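A sketch of how not could wrap another filter, covering only plain equality and __in for brevity. The helper names are hypothetical, and it assumes not always comes directly after the selector, before any other modifier.

```python
def split_key(key):
    """Split e.g. 'unit__not__in' into (selector, negate, modifier)."""
    selector, *mods = key.split("__")
    negate = bool(mods) and mods[0] == "not"
    if negate:
        mods = mods[1:]
    return selector, negate, (mods[0] if mods else None)

def keep(resource, key, raw_value):
    """Evaluate an equality or __in filter, optionally inverted by __not."""
    selector, negate, modifier = split_key(key)
    if modifier == "in":
        # __in treats the value as a comma-separated list.
        result = resource.get(selector) in raw_value.split(",")
    else:
        result = resource.get(selector) == raw_value
    return result != negate  # flips the outcome when negate is True

print(keep({"unit": "uo:celsius"}, "unit__not__in", "uo:kelvin,uo:fahrenheit"))
```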
Key parameters are as follows:
Two more we didn't discuss in the meeting, but might be worthy of adding:
Please provide suggestions for how to do each and we'll pick a favourite for each.