urbanobservatory / standards

Standards and schema documentation for the observatories programme

Setting standards for basic querying/filtering #18

Open SiBell opened 4 years ago

SiBell commented 4 years ago

20191017_112039

Key parameters are as follows:

Two more we didn't discuss in the meeting, but might be worthy of adding:

Please provide suggestions for how to do each and we'll pick a favourite for each.

SiBell commented 4 years ago

Ok here's my stab at this. I've done everything but Pagination and Response Format. Let me know what you think.

General rules

Modifiers include:

Time window

Used to filter the data temporally.

Keys

Usage

The dateTime must be in ISO8601 format.

Defaults to UTC unless specified otherwise.

None, 1 or 2 of the keys can be provided.

Required validation

Returns error response if:

Examples

?dateTime__gte=2019-10-18

?dateTime__gt=2019-10-18T15:03:34.614Z

?dateTime__gt=2019-10-18T15:03:34.614+04

?dateTime__gte=2019-10-18&dateTime__lt=2019-10-24
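
For illustration, here is a minimal Python sketch of how a server might parse and check these time-window values. The helper names (parse_datetime, validate_time_window) are hypothetical, and the two checks it applies (at most one lower and one upper bound, lower bound before upper bound) are assumptions rather than agreed rules.

```python
from datetime import datetime, timezone


def parse_datetime(value: str) -> datetime:
    """Parse an ISO 8601 string, defaulting to UTC when no offset is given."""
    # Older Python versions do not accept a trailing "Z", so normalise it.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)  # raises ValueError if malformed
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def validate_time_window(params: dict) -> dict:
    """Return the parsed bounds; raise ValueError for an impossible window."""
    lower = [parse_datetime(v) for k, v in params.items()
             if k in ("dateTime__gt", "dateTime__gte")]
    upper = [parse_datetime(v) for k, v in params.items()
             if k in ("dateTime__lt", "dateTime__lte")]
    if len(lower) > 1 or len(upper) > 1:
        raise ValueError("at most one lower and one upper bound allowed")
    if lower and upper and lower[0] >= upper[0]:
        raise ValueError("lower bound must be before upper bound")
    return {"lower": lower[0] if lower else None,
            "upper": upper[0] if upper else None}
```

A failed check would map naturally onto an error response, although the exact status code and body are still to be agreed.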

Filter by component of date/time

Keys

Usage

Assumes UTC is being used.

hourOfDay uses 24H clock

Several of these can be used together

Required validation

Check values fall within expected range, e.g.

Examples

?minuteOfHour=30

?hourOfDay=22

?dayOfWeek=2 (i.e. for Tuesday)

?dayOfMonth=2

?dayOfYear=301

?monthOfYear=11

?year=2019

Spatial window

Keys

Usage

Latitudes and longitudes are given in WGS84 datum.

Height is in metres above or below the WGS84 reference ellipsoid (same as GeoJSON).

None, 1 or 2 height keys can be used in a request.

None, 1 or 2 latitude keys can be used in a request.

None or 2 longitude keys can be used in a request.

Required validation

Examples

?latitude__gt=52

?longitude__gt=-8.5&longitude__lte=2

?height__lt=10

Point and radius

Keys

Usage

Allows the user to find all resources (e.g. Platforms, observations, etc) within a given distance of a point.

The proximityCentre is the centre point, given in the form longitude,latitude,height. The height is optional; when it is given, the filtered region changes from a circle to a sphere.

Longitude and latitude are in WGS84. Height is in metres.

proximityRadius is the distance from the proximityCentre in metres.

Required validation

Return error response if:

Examples

?proximityCentre=-1.9,52.2&proximityRadius=1000

?proximityCentre=-1.9,52.2,10&proximityRadius=1000
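
As a rough sketch of the circular case, the Python below parses proximityCentre and tests whether a point falls within proximityRadius. It assumes a spherical Earth (haversine distance) rather than the WGS84 ellipsoid, ignores height, and all function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; a spherical approximation


def parse_proximity_centre(value: str) -> tuple:
    """Split 'longitude,latitude[,height]' into floats; height may be None."""
    parts = [float(p) for p in value.split(",")]
    if len(parts) not in (2, 3):
        raise ValueError("expected longitude,latitude or longitude,latitude,height")
    lon, lat = parts[0], parts[1]
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        raise ValueError("longitude or latitude out of range")
    height = parts[2] if len(parts) == 3 else None
    return lon, lat, height


def haversine_m(lon1, lat1, lon2, lat2) -> float:
    """Great-circle distance in metres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(centre: str, radius_m: float, lon: float, lat: float) -> bool:
    """True if the point lies within radius_m of the centre (height ignored)."""
    c_lon, c_lat, _height = parse_proximity_centre(centre)
    return haversine_m(c_lon, c_lat, lon, lat) <= radius_m
```

In practice a spatial database would probably do this filtering; the sketch is only meant to pin down what the two keys mean.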

Equality filter

Keys

Depends on the resource.

Usage

Certain resources will be filterable by certain attributes.

Required validation

Examples

If, for example, you needed to find people whose hair colour is brown then your request might look like this:

https://api.example.com/people?hairColour=brown

More examples:

?inDeployment=weather-stations

?isHostedBy=lamppost-101

?age=18

Thresholds

Keys

Depends on the resource.

Usage

Applies modifiers to keys that are specific to the resource being queried.

Required validation

Examples

/people?age__lt=18

/observations?value__gte=30.5&observedProperty=air-temperature

Limit

Keys

Usage

For endpoints returning a collection of resources this parameter will limit the number of resources returned.

Can be used in combination with the sortBy and sortOrder keys to get just the "last n" or "first n" resources in the collection.

Required validation

Examples

?limit=100

?limit=1&sortOrder=asc&sortBy=age

Sort

Keys

Usage

For endpoints returning a collection of resources this parameter will sort the resources returned.

Sorts both numerical fields and also strings (i.e. alphabetically).

Can be used in combination with the limit key to get just the "last n" or "first n" resources in the collection.

sortOrder defaults to asc if the sortBy key is provided without sortOrder.

Required validation

Examples

?sortBy=age&sortOrder=desc

?limit=1&sortOrder=asc&sortBy=age

aarepuu commented 4 years ago

Good work Simon. I agree with most of it. There are a couple of things I would add/specify.

Filter by component of date/time

Additionally, ranges and comma-separated lists should also apply to other filters that are numeric.

aarepuu commented 4 years ago

Here's what I think for pagination.

Pagination

Keys

Usage

Used for pagination of results. Both are represented as integers and are not required parameters. If not specified, they default to limit=10 and offset=0.

Can be used in combination with the sortBy and sortOrder keys.

Required validation

Examples

?limit=100

?limit=10&offset=10

?limit=10&offset=10&sortOrder=asc&sortBy=age

Not completely sure if we should make limit compulsory when using offset or just use default limit=10 when not specified?
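
To make the "default limit" option concrete, here is a small Python sketch that applies limit=10 and offset=0 when either key is missing. The function name is hypothetical, and the bounds checks (limit at least 1, offset at least 0) are an assumption at this point.

```python
def parse_pagination(params: dict, default_limit: int = 10) -> tuple:
    """Return (limit, offset), applying the proposed defaults."""
    try:
        limit = int(params.get("limit", default_limit))
        offset = int(params.get("offset", 0))
    except ValueError:
        raise ValueError("limit and offset must be integers")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if offset < 0:
        raise ValueError("offset must not be negative")
    return limit, offset
```

With this approach ?offset=10 on its own is valid and simply means limit=10&offset=10.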

lukeshope commented 4 years ago

I agree with all of this, and also with Aare's comments on days of the week and ranges. Thanks both.

In the interests of making this more widely applicable, I think it would be worth nailing down what the general case is. Specifically:

As an example:

{
  "@type": "Sensor",
  [...],
  "madeObservation": {
    "@type": "ObservationCollection",
    "member": [{
      "@type": "Observation",
      "hasResult": {
        "@type": "Temperature",
        "value": 12.0
      },
      "resultTime": "2019-10-25T15:14:00Z"
    }]
  }
}

In the above example:

lukeshope commented 4 years ago

I think we might also have to consider, for the sake of search functionality...

Wildcard matching

Keys

Usage

Validation

Examples

Disclaimer: The above is similar to how it's already been implemented in some Newcastle APIs, happy to look at alternatives

nharris172 commented 4 years ago

Will case be considered? can we have icontains

lukeshope commented 4 years ago

Will case be considered? can we have icontains

Personally, I don't think case sensitivity is necessary. Would be good to know if people feel strongly the other way.

SiBell commented 4 years ago

Oh boy, this gets "fun" quickly!

My thoughts:

lukeshope commented 4 years ago

Personally I'm happy with all of the above. I think enough time has passed, and we should now consider writing this up into the standards doc, and schematising the query parameters etc.

Any objections?

geoanorak commented 4 years ago

None from me! 


SiBell commented 4 years ago

Aye let's get it written up. Let me know if you want me to do any of it. Looks like we can click "edit" on any of the posts above, so should be relatively quick to copy the markdown over into the working document.

SiBell commented 4 years ago

One more to add to this list before we get this written up, which follows on from my previous issue.

Can we have an exists condition?

E.g.

To get a list of all sensors not yet hosted on a platform:

GET /sensors?isHostedBy__exists=false

Or perhaps all the observations for which the featureOfInterest is defined:

GET /observations?hasFeatureOfInterest__exists=true

Just reads a bit nicer than our previous isDefined suggestion.

lukeshope commented 4 years ago

No inherent problem with __exists, but we need to clarify how we would handle null values in that case. A null value would exist, but wouldn't be defined.

SiBell commented 4 years ago

Good point. Guess this depends on whether we're showing null values to the user or not.

E.g.

{
  "id": "sensor-123",
  "observes": "air-temperature",
  "inDeployment": null
}

vs.

{
  "id": "sensor-123",
  "observes": "air-temperature"
}

I was veering towards the latter, in which case any null values that may exist in the backend database don't "exist" to the end user and therefore __exists on its own would suffice.

However, if there's merit in showing null values then my preference would actually be to stick with just __isDefined and drop __exists.

Interested to hear people's thoughts on this.

lukeshope commented 4 years ago

The only place I can see a clear rationale for having null values is in the observation value itself, where for example we might have an 'alarm' timeseries, and null means no alarm.

I admit I can't think of any other places where null would be useful. We can certainly discourage the use of null values in serialisations in favour of omission.

Maybe there isn't a problem in that case, and we should just allow both __exists and __isDefined, with neither being mandatory, so implementations would be free to support none, one or both in their filters.

Does that work?

lukeshope commented 4 years ago

Actually, I suppose it should be __exists and __defined for consistency?

SiBell commented 4 years ago

Yep works for me, and yes __defined is better than __isDefined.
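
A minimal Python sketch of the distinction as discussed, assuming resources are plain dictionaries and that JSON null maps to None. The function names are hypothetical.

```python
def matches_exists(resource: dict, key: str, expected: bool) -> bool:
    """__exists: the key is present at all, even if its value is null."""
    return (key in resource) == expected


def matches_defined(resource: dict, key: str, expected: bool) -> bool:
    """__defined: the key is present and its value is not null."""
    return (resource.get(key) is not None) == expected


# The two serialisations of sensor-123 from earlier in the thread.
with_null = {"id": "sensor-123", "observes": "air-temperature", "inDeployment": None}
omitted = {"id": "sensor-123", "observes": "air-temperature"}
```

Here inDeployment__exists=true matches with_null but not omitted, whereas inDeployment__defined=true matches neither. If null values are never serialised, the two modifiers behave identically.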

EttoreHector commented 4 years ago

Even in the case of an alarm, wouldn't boolean values work without the need to introduce null values?


lukeshope commented 4 years ago

Perhaps, if the alarm were binary. But there might be an enumeration of alarms presented as an array for example:

[
  "https://example.org/alarm/low-temperature",
  "https://example.org/alarm/no-signal"
]

In the above case, either an empty array or null would be appropriate for when there are no alarms.

I'm not suggesting any of this is the right way to do it, just trying to retain flexibility as much as possible. Open to being convinced otherwise :-)

lukeshope commented 4 years ago

Aye let's get it written up. Let me know if you want me to do any of it. Looks like we can click "edit" on any of the posts above, so should be relatively quick to copy the markdown over into the working document.

I've made some progress implementing this into a JS library, but not quite ready to share yet.

If you get a chance to look at transforming the above mess into a simple HTML table for the actual document, I'd really appreciate it.

SiBell commented 4 years ago

Sure, I can whack these in an HTML table. Might have to be 2 tables, then a few specific examples:

Table 1: Special keys

e.g.

| key | description | example |
| --- | --- | --- |
| limit | limit the number of records returned | ?limit=1 |
| proximityradius | the distance from the proximitycentre in metres | ?proximityradius=1000 |

Table 2: Modifiers

e.g.

| modifier | description | example |
| --- | --- | --- |
| gt | greater than | ?datetime__gt=2019-01-01 |
| contains | For wildcard matching. Only allows one word to be specified, or a phrase if surrounded by double quotes | ?name__contains=west |

Special Examples

More detailed description of how to query with spatial windows, time windows, pagination, by proximity, etc.

SiBell commented 4 years ago

I'm in the process of putting all parameters in a table now, just wondering if the time based parameters, i.e. minuteofhour, hourofday, etc should actually be "modifiers".

e.g. change:

monthofyear=10

for:

resultTime__monthofyear=10

Reason:

  1. It's more consistent with how we apply other time-based filters, e.g. resultTime__gte=2019.
  2. If you have a resource, e.g. observations, that has more than one time-based property, e.g. an observation might have a timestamp for the time of measurement (resultTime), and one for when it arrived at the server (arrivalTime), then this approach lets you choose which one to filter by.

The upshot is that the only special parameter keys we're left with are those for pagination, i.e. limit, offset, sortorder, sortby, and those for circular bounding area: proximitycentre and proximityradius. This is no bad thing.

Any objections?

EttoreHector commented 4 years ago

It makes sense.

Just one (likely silly) question. Are we considering applying multiple filters to the same parameter in the same query? If we are, what would the syntax be, e.g. resultTime__gte=2017&resultTime__monthofyear=10 ?


lukeshope commented 4 years ago

No objections from me, this sounds like a really sensible idea, and would provide options for granular filtering on other date-time based data too in its generic form.

I think Ettore's suggestion is right, that would be a valid query for results in October 2017, 2018, 2019 etc., with query constraints being additive.

The other aspect to consider is multiple 'modifiers', which I suggest we allow but don't mandate as a minimum. I'm thinking for example resultTime__monthOfYear__gte=10 for October through December.

Perhaps based on the above we need to clarify the terminology slightly. resultTime is a selector (picking a specific value within the JSON response), monthOfYear is a sub-selector (picking a part of that value), and gte is a modifier? That way the order would always be selector, sub-selector, modifier, and resultTime__gte__monthOfYear would be invalid. Does that make sense?
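
To check that the ordering rule is workable, here is a rough Python sketch of a key parser that enforces selector, sub-selector, modifier. The sets of recognised sub-selectors and modifiers are illustrative, the function name is hypothetical, and lower-casing the key is just a convenience for matching.

```python
SUB_SELECTORS = {"minuteofhour", "hourofday", "dayofweek", "dayofmonth",
                 "dayofyear", "monthofyear", "year"}
MODIFIERS = {"gt", "gte", "lt", "lte", "contains", "exists", "defined"}


def parse_key(key: str) -> tuple:
    """Split 'selector[__subselector][__modifier]' into its three parts."""
    selector, *rest = key.lower().split("__")
    if not selector or len(rest) > 2:
        raise ValueError(f"malformed key: {key}")
    sub_selector = modifier = None
    if rest and rest[0] in SUB_SELECTORS:
        sub_selector = rest.pop(0)
    if rest and rest[0] in MODIFIERS:
        modifier = rest.pop(0)
    if rest:  # anything left over is unknown or out of order
        raise ValueError(f"invalid or misordered key: {key}")
    return selector, sub_selector, modifier
```

With this, resultTime__monthOfYear__gte parses cleanly, while resultTime__gte__monthOfYear is rejected because a sub-selector follows a modifier.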

EttoreHector commented 4 years ago

It makes sense.


SiBell commented 4 years ago

I agree that Ettore's suggestion is right.

Happy to allow sub-selectors too. I'll add it to the docs. I'm struggling to think of a use-case other than time-based selectors, but it's definitely worth having as an option.

SiBell commented 4 years ago

Query string Parameters

Query string parameters allow greater control over the resources returned when making a request.

They are particularly useful when making GET requests.

For example the following request doesn't have any query string parameters:

GET https://api.urbanobservatory.com/observations

Whereas the following does:

GET https://api.urbanobservatory.com/observations?madeBySensor=thermometer-abc123

The latter lets us filter the observations returned to just those made by the sensor with id: thermometer-abc123.

This simple example has the form:

selector=value

i.e. madeBySensor is the selector, and thermometer-abc123 is the value.

We also accept more complex query string parameters of the form:

selector__modifier=value

The modifier allows you to perform more than just an equality filter.

The following example has no modifier:

GET .../observations?resultTime=2020-01-09T18:05:24.969Z

This would only get observations recorded at that exact millisecond. But what if you wanted all observations since the start of 2020? That's where a modifier (in this case gte) can help. Here's how the request would look:

GET .../observations?resultTime__gte=2020-01-01T00:00:00.000Z

The modifier always follows a double underscore: __. It acts upon the selector listed before the __.

We also have sub-selectors, that focus on a specific component of a selector. They're particularly useful for dealing with timestamps. The format is as follows:

selector__subselector=value

For example:

resultTime__monthOfYear=10

Where resultTime is the selector, and monthOfYear is the sub-selector. In this example it allows you to only retrieve observations with a resultTime within the month of October.

You can also use a subselector in combination with a modifier using the following format:

selector__subselector__modifier=value

e.g.

resultTime__monthOfYear__gte=10

This would retrieve any observations with a resultTime in October, November or December.

N.B. Not every observatory will support all of these formats, and each endpoint may only have a small number of query string parameters it accepts. However, when available, this is the format each observatory will abide by.

N.B. Query string selectors, sub-selectors, modifiers and values are case-insensitive unless specifically defined otherwise.

Special selectors

Typically the selector is a property of the resource being returned, e.g. resultTime or madeBySensor. However, there are some special selectors that provide further functionality. They are listed in the following table.

| key | description | examples |
| --- | --- | --- |
| limit | Limits the number of records returned. Commonly used with the offset, sortorder and sortby parameters. MUST be an integer value ≥ 1. | limit=1 or limit=100&offset=200&sortorder=asc&sortby=resultTime |
| offset | Commonly used for pagination in combination with the limit parameter to skip the first n resources. MUST be an integer value ≥ 0. | limit=10&offset=30 |
| sortorder | Used in combination with sortby to sort the returned resources by the property provided. Use asc for ascending and desc for descending. | sortorder=desc&sortby=resultTime |
| sortby | Used in combination with sortorder to sort the returned resources by the property provided. | sortorder=asc&sortby=madeBySensor |
| proximitycentre | MUST be used in combination with proximityradius. Sets the centre of a circular or spherical (if height is given) bounding area, i.e. only resources within the spatial area are returned. Uses the format: longitude,latitude,height. Height is optional. Longitude and latitude use WGS84. Height is in metres. | proximitycentre=-1.9,52.2&proximityradius=1000 or proximitycentre=-1.9,52.2,200&proximityradius=100 |
| proximityradius | MUST be used in combination with proximitycentre. Sets the distance from the proximitycentre in metres. | proximityradius=1000 |

Sub-selectors

N.B. sub-selectors that deal with times and dates assume that the timezone is UTC.

| sub-selector | description | examples |
| --- | --- | --- |
| minuteofhour | Filters by the minute of the hour. Integer values between 0 and 59. | resultTime__minuteOfHour=30 |
| hourofday | Filters by the hour of day. An integer value between 0 and 23, i.e. a 24 hour clock. | resultTime__hourOfDay=22 |
| dayofweek | Filters by the day of the week. Valid values are: mo, tu, we, th, fr, sa, su. For multiple days use a comma-separated list, e.g. mo,we,fr, or a range, e.g. mo-fr. | resultTime__dayofweek=mo or resultTime__dayofweek=mo,we,fr or resultTime__dayofweek=mo-fr |
| dayofmonth | Filters by the day of the month. Integer values between 1 and 31. | resultTime__dayofmonth=2 |
| dayofyear | Filters by the day of the year. | resultTime__dayofyear=301 |
| monthofyear | Filters by the month of year. Integer values between 1 and 12. | resultTime__monthofyear=11 |
| year | Filters resources to just a single year. | resultTime__year=2019 |
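
As an illustration of what each sub-selector picks out, the Python sketch below extracts the relevant component from a timestamp in UTC and expands the dayofweek list and range forms. The function names are hypothetical, and ranges that wrap around the week (e.g. sa-mo) are deliberately not handled.

```python
from datetime import datetime, timezone

DAY_CODES = ["mo", "tu", "we", "th", "fr", "sa", "su"]


def sub_selector_value(timestamp: datetime, sub_selector: str):
    """Return the component of an aware timestamp named by the sub-selector."""
    # Pass timezone-aware values: astimezone() treats naive ones as local time.
    t = timestamp.astimezone(timezone.utc)
    components = {
        "minuteofhour": t.minute,
        "hourofday": t.hour,
        "dayofweek": DAY_CODES[t.weekday()],  # weekday(): Monday is 0
        "dayofmonth": t.day,
        "dayofyear": t.timetuple().tm_yday,
        "monthofyear": t.month,
        "year": t.year,
    }
    return components[sub_selector.lower()]


def expand_days(value: str) -> set:
    """Expand 'mo,we,fr' or 'mo-fr' into a set of day codes."""
    days = set()
    for part in value.lower().split(","):
        if "-" in part:
            start, end = (DAY_CODES.index(d) for d in part.split("-"))
            days.update(DAY_CODES[start:end + 1])
        else:
            days.add(DAY_CODES[DAY_CODES.index(part)])  # ValueError if unknown
    return days
```

A filter such as resultTime__dayofweek=mo-fr then reduces to checking whether sub_selector_value(resultTime, "dayofweek") is in expand_days("mo-fr").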

Modifiers

| modifier | description | examples |
| --- | --- | --- |
| none | Format: key=value. When no modifier is present, and assuming the parameter key isn't listed in the special selectors table above (e.g. it's not limit, offset, etc), then the key is a property that exists on the resources being requested. Only those resources that have a matching value for this property will be returned. | inDeployment=weather-stations-in-schools or isHostedBy=lamppost-32 |
| gt | Greater than. | resultTime__gt=2019-01-01 |
| gte | Greater than or equal to. | height__gte=10 |
| lt | Less than. | latitude__lt=60 |
| lte | Less than or equal to. | value__lte=20 |
| contains | For wildcard matching. Only allows one word to be specified, or a phrase if surrounded by double quotes. | name__contains=west or name__contains="Room 2.048" |
| containsany | Allows multiple words or phrases to be specified, joined by a + symbol; matches when any of them is present. | name__containsAny=2.060+2.048 or name__containsAny="Room 2.060"+"Room 2.048" |
| containsall | Allows multiple words or phrases to be specified, joined by a + symbol; matches only when all of them are present. | name__containsAll=Room+2.048 |
| exists | Used to check if a resource property exists or not. | isHostedBy__exists=false |
| defined | Used to check if a resource property has been defined or not. In most cases it will behave the same as exists; the only time it may differ is if resources can have properties with a value of null, in which case that property would exist, but would not be defined. | value__defined=false |
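
The comparison modifiers map directly onto ordinary comparison operators. The Python sketch below shows one way to apply them to a list of resources, assuming resources are dictionaries and that a missing or null value never satisfies a comparison (that last rule is an assumption, not something agreed above). The names are hypothetical.

```python
import operator

COMPARISONS = {
    None: operator.eq,   # no modifier: plain equality
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def passes(actual, modifier, expected) -> bool:
    """Apply one comparison modifier to a resource value."""
    if actual is None:
        return False  # assumed: missing or null values never match
    return COMPARISONS[modifier](actual, expected)


def filter_resources(resources, selector, modifier, expected):
    """Keep only resources whose selector value passes the comparison."""
    return [r for r in resources if passes(r.get(selector), modifier, expected)]
```

For example, value__gte=30.5 becomes filter_resources(observations, "value", "gte", 30.5). Multiple parameters would be applied one after another, since constraints are additive.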

Specific Examples

Time window

The following gets observations with a resultTime between two dates.

GET .../observations?resultTime__gte=2020-01-01T00:00:00.000Z&resultTime__lt=2020-01-01T12:00:00.000Z

The following only gets observations in the year 2020 on weekdays.

GET .../observations?resultTime__dayofweek=mo-fr&resultTime__year=2020

Spatial Bounding box

The following retrieves any observations within a bounding box (in this case around Birmingham city centre).

GET .../observations?latitude__lte=52.495768&latitude__gte=52.464492&longitude__lte=-1.875352&longitude__gte=-1.928481

And now for only observations above 1 m.

GET .../observations?latitude__lte=52.495768&latitude__gte=52.464492&longitude__lte=-1.875352&longitude__gte=-1.928481&height__gt=1

Proximity

The following retrieves any observations within 1000 m from the centre of Birmingham:

GET .../observations?proximitycentre=-1.895007,52.477096&proximityradius=1000

Pagination

Let's say you want all air-temperature observations from a platform called mobile-sensing-van in the year 2019. There are potentially thousands of observations available, so we want to get them in chunks. Our initial request looks like this.

GET .../observations?observedProperty=air-temperature&platform=mobile-sensing-van&resultTime__year=2019&limit=100&offset=0&sortby=resultTime&sortorder=asc

This returns exactly 100 observations, which means there may still be more observations to retrieve, so we adjust the offset and make the following request:

GET .../observations?observedProperty=air-temperature&platform=mobile-sensing-van&resultTime__year=2019&limit=100&offset=100&sortby=resultTime&sortorder=asc

In this scenario we'd keep incrementing the offset by 100 until we no longer received 100 observations back.
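
A client-side sketch of that loop in Python. The fetch_page callable stands in for whatever actually performs the HTTP request; here it is replaced by a fake backed by an in-memory list, so all names below are hypothetical.

```python
def fetch_all(fetch_page, limit: int = 100) -> list:
    """Collect every resource by stepping the offset until a short page arrives."""
    results, offset = [], 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        results.extend(page)
        if len(page) < limit:  # fewer than requested: nothing left to fetch
            return results
        offset += limit


# Stand-in for an HTTP request, backed by an in-memory list of 250 items.
FAKE_OBSERVATIONS = [{"id": i} for i in range(250)]


def fake_fetch_page(limit: int, offset: int) -> list:
    return FAKE_OBSERVATIONS[offset:offset + limit]
```

Note that when the total is an exact multiple of the limit, this approach costs one extra request that returns an empty page. A stable sort order (sortby and sortorder) matters too, otherwise pages may overlap or skip resources.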

EttoreHector commented 4 years ago

List of Endpoints

The List of endpoints suggested by Simon in his examples are:

Deployments

<base_url>/deployments/<deployment_name> where the response contains the deployment specified by <deployment_name>

<base_url>/deployments where the response contains a list of all the deployments

Platforms

<base_url>/deployments/<deployment_name>/platforms where the response contains the list of all the platforms in the specified deployment

Observations

<base_url>/observations?startDate=<start_date>&endDate=<end_date> where the response contains a list of ALL the observations recorded between <start_date> and <end_date>

Comments and proposals

1. Would it make sense to query for all the platforms belonging to any deployment, hence having an endpoint such as

<base_url>/platforms ?

Or is it preferable to always specify the deployment as in:

<base_url>/deployments/<deployment_name>/platforms/<platform_name> ?

(either one or both endpoints would have to be added to the list proposed by Simon).

2. If someone wants to retrieve a single platform, should they use

<base_url>/platforms/<platform_name>

or

<base_url>/deployments/<deployment_name>/platforms/<platform_name> ?

(the second endpoint being possibly redundant if the platform name is kept unique across all the deployments)

3. Does it make sense to ask for ALL observations within a specified time window regardless of the sensor that made it, or at least the ObservedProperty it refers to, as Simon suggested in his example?

Should we consider instead a more specific query when it comes to retrieving observations, like:

<base_url>/sensors/<sensor_name>/observations?startDate=<start_date>&endDate=<end_date>

and / or

<base_url>/observedproperty/<property_name>/observations?startDate=<start_date>&endDate=<end_date> ?

4. What would the endpoint for querying a given sensor look like?

<base_url>/sensors/<sensor_name> (assuming unique sensor names across all platforms / systems / deployment)

or

<base_url>/platform/<platform_name>/sensors/<sensor_name> (assuming unique sensor names only within a platform)

or

<base_url>/deployments/<deployment_name>/sensors/<sensor_name> (assuming unique sensor names within an entire deployment)

or else...?

SiBell commented 4 years ago

My preference would be that all your suggestions are valid, because some endpoint structures will be better suited to particular clients/frontends than others.

For example, I will be handling much of my authorisation at the deployment level, e.g. only certain users will have admin rights to a particular private deployment. Therefore, when I create a front-end that allows admin users to, for example, remove a sensor from a deployment I will want the deployment ID in the URL. For any URL starting <base_url>/deployments/<deployment_id>/ my API server will verify that the user actually has access rights to this deployment.

Alternatively, the web app that we'll build for the general public to use will be better off using endpoints such as <base_url>/observations or <base_url>/platforms. I just make sure that the observations or platforms returned haven't come from private deployments.

I think we need to pick a small handful of endpoints that we MUST support. The obvious one being <base_url>/observations. Then if we want to support more then we can do so, trying our best to be consistent so that we don't end up with one observatory using /deployments/ and another /deployment/.

EttoreHector commented 4 years ago

Thank you, Simon. All that you say makes sense to me. I only have some reservations about the /observations endpoint. If we end up with a lot of sensors (which is surely something we aim for), such an endpoint could potentially trigger huge data retrievals (if some of the sensors report at a very high frequency and the time window is, even mistakenly, too wide). I see why, for consistency, we may want to include /observations in the list of agreed endpoints. However, I think it would be better to agree on something a little more limiting when it comes to observations, like /sensors/<sensor_name>/observations?startDate=<start_date>&endDate=<end_date> or /observedProperty/<property_name>/observations?startDate=<start_date>&endDate=<end_date>. What do you think?

SiBell commented 4 years ago

This is where pagination should come to the rescue. So that even if they make a request that would match millions of observations, we only return a maximum of 1000 (for example). Most databases should have some limit, offset and sort functionality to help with this.

What we haven't decided on yet is how we tell the user they have hit our maximum limit and presumably provide a URL for them to get the next 1000.

It would be very easy for me to support endpoints such as /sensors/<sensor_name>/observations and /observedProperty/<property_name>/observations as well as /observations, so I'm more than happy to add these to the MUST list.

It's worth saying that a single sensor could in theory upload thousands of observations every day by itself, e.g. if it sampled every second, therefore we'd almost certainly need pagination on these additional endpoints too.

lukeshope commented 4 years ago

I worry we're going down the wrong path with a list of endpoints. A REST API shouldn't have a list of endpoints, because it's driven by hypermedia, meaning it doesn't matter what the web addresses are because you follow the links to get there.

There's absolutely nothing wrong with the endpoints you've suggested, it looks sensible as a way of implementation. But if I wanted to file my Platforms under https://api.example.com/silly-sausages then I should be able to.

What we do need is some agreement on how we manage Collections (if we call them that... this might for example be a collection of platforms, thus paginated, as Si refers to) that aren't ObservationCollections. Examples of how we do that would be either hydra:Collection or rdf:Bag.

It's also entirely possible that you might have all your platforms in one API (a lamp post API, say) and all your sensors in another (an air quality API, say) and all your historic observations in another (an observation collection API, say) and they would all just link to each other.

We also need an entrypoint that directs clients to these collections as a starting point. In other words, when I hit https://api.example.com it gives me links to a collection of sensors, a collection of platforms, a collection of observations, etc. It wouldn't need to give me all of those necessarily, you might not have a collection of all observations from all sensors (which could be huge, but might be useful), you might only have collections of observations under each sensor.

In theory, this would/could look something like...

GET https://api.example.com/
{
  "@context": {
    "@base": "https://api.example.com/",
    "uo": "https://urbanobservatory.github.io/standards/vocabulary/latest/",
    "title": "http://purl.org/dc/terms/title",
    "collections": {
      "@id": "uo:EntrypointCollections",
      "@container": "@id"
    }
  },
  "collections": {
    "/sensors": {
      "@type": ["@id", "uo:Collection", "uo:SensorCollection"],
      "title": "All sensors available in Newcastle upon Tyne"
    }
  }
}
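
To illustrate what following the links could mean for a client, here is a rough Python sketch that takes an entrypoint document shaped like the one above and resolves each collection key against `@base`. It assumes the entrypoint has already been fetched and parsed into a dictionary; the function name `collection_urls` is made up for this example, and a real client would more likely use a JSON-LD library than read `@context` by hand.

```python
from urllib.parse import urljoin

# The entrypoint document from the example above, already parsed from JSON.
ENTRYPOINT = {
    "@context": {
        "@base": "https://api.example.com/",
        "uo": "https://urbanobservatory.github.io/standards/vocabulary/latest/",
        "title": "http://purl.org/dc/terms/title",
        "collections": {"@id": "uo:EntrypointCollections", "@container": "@id"},
    },
    "collections": {
        "/sensors": {
            "@type": ["@id", "uo:Collection", "uo:SensorCollection"],
            "title": "All sensors available in Newcastle upon Tyne",
        }
    },
}


def collection_urls(entrypoint):
    """Map each collection's absolute URL to its title."""
    base = entrypoint["@context"]["@base"]
    return {
        # Keys are relative IRIs, so resolve them against @base.
        urljoin(base, key): value.get("title")
        for key, value in entrypoint["collections"].items()
    }


urls = collection_urls(ENTRYPOINT)
```

The client never hard-codes `/sensors`: it only needs the entrypoint address and the collection types, so an observatory is free to publish its sensors under any path it likes.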

Is this discussion best split into a new issue? Not sure we're talking about filtering anymore...

EttoreHector commented 4 years ago

Thank you, Luke.

I agree we need an entry point and then just follow the links to get the resources we want. I guess I was assuming that the structure of the tree that stems from the entry point would be the same for all observatories. This is what I meant by "an agreed list of endpoints".


SiBell commented 4 years ago

@lukessmith I've created a new issue on Collections and Pagination, as I agree it makes sense to start a new thread for this.

Being able to reach all the endpoints by following links makes perfect sense, but surely there's benefit to keeping some consistency between observatories? E.g. so that any scripts or front-ends that use one observatory's API would work just as well with another's, without having to change much more than the base URL.

SiBell commented 4 years ago

At the risk of getting carried away, I have another two modifiers that would be useful:

SiBell commented 4 years ago

An __includes modifier would come in handy for selecting resources for which the provided item occurs within an array property.

For example an observation might have a flag property {flag: ['persistence', 'upperbound']}.

Then to query for all observations that have been flagged as breaching a climatic upper bound you can use:

/observations?flag__includes=upperbound
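
For illustration only, a small Python sketch of how a server might apply `__includes` to resources already held in memory. The function name `filter_includes` and the sample observations are assumptions; a real implementation would most likely push this down into the database query.

```python
def filter_includes(resources, field, item):
    """Keep resources whose array-valued `field` contains `item`."""
    return [
        r for r in resources
        # Only match genuine arrays, so a string property is never
        # treated as a substring search by accident.
        if isinstance(r.get(field), list) and item in r[field]
    ]


observations = [
    {"id": "obs-1", "flag": ["persistence", "upperbound"]},
    {"id": "obs-2", "flag": ["persistence"]},
    {"id": "obs-3"},  # no flags at all
]

# Equivalent of /observations?flag__includes=upperbound
flagged = filter_includes(observations, "flag", "upperbound")
```

Resources without the property at all, like `obs-3` here, simply fail to match in this sketch.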

SiBell commented 4 years ago

I've found myself using a query parameter called search. E.g.

/platforms?search=lamppost

It behaves a little bit like the __contains modifier, except it searches across more than one field. In my case it will typically search both the id and the name for any keyword matches. Mentioning it in case it's something others see themselves using and therefore worthy of adding to the docs.
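
A rough Python sketch of that behaviour, assuming a case-insensitive substring match across the id and name fields. The case-insensitivity, the function name `search` and the sample platforms are assumptions made for the example, not part of any agreed standard.

```python
def search(resources, keyword, fields=("id", "name")):
    """Keep resources where `keyword` appears in any of the given fields."""
    needle = keyword.lower()
    return [
        r for r in resources
        # A missing field is treated as an empty string, so it never matches.
        if any(needle in str(r.get(f, "")).lower() for f in fields)
    ]


platforms = [
    {"id": "lamppost-12", "name": "Northumberland Street"},
    {"id": "pl-77", "name": "Quayside Lamppost"},
    {"id": "pl-80", "name": "Rooftop mast"},
]

# Equivalent of /platforms?search=lamppost
found = search(platforms, "lamppost")
```

Which fields get searched would presumably be up to each observatory, which is one reason it may be worth documenting `search` separately from `__contains`.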

SiBell commented 4 years ago

Another addition, as discussed on the technical call today: __not. For when we want to exclude something, or perform the opposite of a filter.

For example:

/observations?unit__not=uo:kelvin

Will exclude observations given in the unit Kelvin.

Another example:

/observations?resultTime__not__gte=2020-01-01

This would be the opposite of resultTime__gte. Although this is a bad example as we could just use resultTime__lt.

We'd also want to be able to provide a comma-separated list e.g:

/observations?unit__not=uo:kelvin,uo:fahrenheit

Although thinking about it, the right way to do this might be in combination with the __in modifier mentioned above, i.e.

/observations?unit__not__in=uo:kelvin,uo:fahrenheit

Because the __in modifier basically implies that the query parameter value will be an array.
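
To make the combination concrete, here is a minimal Python sketch of how a query key such as `unit__not__in` could be split into a field, a comparison and a negation flag. It follows the `__not__in` route, so a comma-separated value is only split when `__in` is present. The names `parse_param`, `matches` and `COMPARATORS`, the default `eq` comparison, and the `uo:celsius` unit in the sample data are all assumptions for illustration.

```python
import operator

# Comparison functions keyed by modifier name; "eq" is the assumed default
# when a key carries no comparison modifier.
COMPARATORS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "in": lambda actual, allowed: actual in allowed,
}


def parse_param(key, value):
    """Split e.g. 'unit__not__in' into (field, op, negate, value)."""
    field, *modifiers = key.split("__")
    negate = "not" in modifiers
    ops = [m for m in modifiers if m != "not"]
    op = ops[0] if ops else "eq"
    if op not in COMPARATORS:
        raise ValueError(f"Unsupported modifier: {op}")
    if op == "in":
        value = value.split(",")  # __in implies an array value
    return field, op, negate, value


def matches(resource, key, value):
    """Apply one query parameter to one resource."""
    field, op, negate, parsed = parse_param(key, value)
    result = COMPARATORS[op](resource[field], parsed)
    return result != negate  # flip the outcome when __not is present


observations = [
    {"id": "obs-1", "unit": "uo:kelvin", "resultTime": "2019-12-31"},
    {"id": "obs-2", "unit": "uo:celsius", "resultTime": "2020-01-02"},
    {"id": "obs-3", "unit": "uo:fahrenheit", "resultTime": "2020-02-01"},
]

# Equivalent of /observations?unit__not__in=uo:kelvin,uo:fahrenheit
kept = [
    o["id"] for o in observations
    if matches(o, "unit__not__in", "uo:kelvin,uo:fahrenheit")
]
```

The dates are compared as plain strings, which only works because they share the same ISO 8601 layout; a real implementation would parse them first. The sketch also assumes every resource has the field being filtered on, so what `__not` should do with a missing property is left open.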