openaq / openaq-api-v2

OpenAQ API
https://api.openaq.org

sensors trends endpoint #335

Open russbiggs opened 4 months ago

russbiggs commented 4 months ago

Given that we've had some performance difficulties with the /v3/locations/{locations_id}/trends/{measurands_id} endpoint, and given the recent migration to /v3/sensors/{sensors_id}/measurements, I propose we deprecate /v3/locations/{locations_id}/trends/{measurands_id} in favor of something more like /v3/sensors/{sensors_id}/trends. This way we can take advantage of the sensors_id index, as we already do for measurements, and provide a more consistent "sensor first" API overall.

The difference between /v3/sensors/{sensors_id} and the proposed /v3/sensors/{sensors_id}/trends may be very limited, given that both have Summary and Coverage objects. The main differentiating factor I see is that each endpoint would get its own query parameters, chosen to make sense for that endpoint.

@caparker any thoughts?

caparker commented 4 months ago

I agree: deprecate the locations method in favor of the sensor-first approach. As for the trends vs. measurements issue, I would love to see us combine them and just use sensors/:id/measurements for all uses, but I am not sure we can.

The main difference, in my mind, between the two of them is the Period vs Factor, which maybe we can change.

For the measurements endpoint we allow for aggregating temporally, e.g. by day, month, etc., in which case we can specify the period for each measurement returned. All measurement periods are mutually exclusive, i.e. no overlaps. An example period might be something like

label: monthly
interval:  1 mon
datetime_from: 2024-01-01
datetime_to: 2024-02-01
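
To make the "mutually exclusive" property concrete, here is a minimal Python sketch of how such monthly periods could be generated for a date range. It is not taken from the API code: the function name monthly_periods is hypothetical, the end date is assumed to be exclusive, and the start is assumed to snap back to the first day of its month.

```python
from datetime import date


def monthly_periods(start: date, end: date) -> list[dict]:
    """Split [start, end) into consecutive, non-overlapping calendar months."""
    periods = []
    # Assumption: snap the start back to the first day of its month.
    current = date(start.year, start.month, 1)
    while current < end:
        # First day of the following month (roll the year over in December).
        if current.month == 12:
            nxt = date(current.year + 1, 1, 1)
        else:
            nxt = date(current.year, current.month + 1, 1)
        periods.append({
            "label": "monthly",
            "interval": "1 mon",
            "datetime_from": current.isoformat(),
            "datetime_to": nxt.isoformat(),
        })
        current = nxt
    return periods


periods = monthly_periods(date(2024, 1, 1), date(2024, 3, 1))
```

Each period's datetime_to equals the next period's datetime_from, so any given measurement falls into exactly one period.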

And for the trends endpoint we use the factor value, where a factor is still temporal but the measurements returned may now overlap in period. An example would be dow, or day of the week, where we group all Mondays, all Tuesdays, etc. Here is what the factor might look like for a query that only looked at January

label: dow
interval: 24:00:00
order: 1
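
As a rough illustration of why factor groups overlap in time, the Python sketch below buckets a few made-up readings by day of week. The readings, the function name group_by_dow, and the choice of Monday = 1 for order are all assumptions for illustration; the thread does not say which weekday numbering the API uses.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical readings as (ISO timestamp, value); the numbers are made up.
readings = [
    ("2024-01-01T00:00:00", 10.0),  # Monday
    ("2024-01-02T00:00:00", 20.0),  # Tuesday
    ("2024-01-08T00:00:00", 30.0),  # Monday
    ("2024-01-09T00:00:00", 40.0),  # Tuesday
]


def group_by_dow(rows):
    """Average values per weekday, regardless of which week they fall in."""
    buckets = defaultdict(list)
    for ts, value in rows:
        # isoweekday(): Monday == 1 ... Sunday == 7 (assumed ordering).
        buckets[datetime.fromisoformat(ts).isoweekday()].append(value)
    return [
        {
            "factor": {"label": "dow", "interval": "24:00:00", "order": order},
            "value": mean(values),
        }
        for order, values in sorted(buckets.items())
    ]


trends = group_by_dow(readings)
```

Both the Monday group and the Tuesday group draw on data from 2024-01-01 through 2024-01-09, so their time spans overlap, unlike the monthly periods used by measurements.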

So right now we keep these endpoints separate for this reason but I can imagine a use case where someone is looking at data and first they want to see daily values, so in R maybe it looks like this

data <- openaq.read('sensors/1/measurements?period=daily')

Which is saying, give me the measurements grouped by day. And then they are interested in grouping by the day of week

## so would this be better
data <- openaq.read('sensors/1/measurements?period=dow')
## or this
data <- openaq.read('sensors/1/trends?factor=dow')

I am worried that having to switch to a different endpoint to group by something different is worse than having the returned period be slightly different for the two types of grouping methods (non-overlapping temporal and overlapping temporal). And so the more I think about this, the more I think sticking with measurements is clearer.

So the period object for this query might look like this

label: day of week
interval:  dow <-- not a true interval though
datetime_from: 2024-01-01
datetime_to: 2024-02-01

And the datetimes (to and from) might either be blank or maybe we would need to calculate what the first and last monday would be? And/or maybe we add something to the period object to reflect that periods could be overlapping?
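
One possible reading of the "first and last Monday" idea, sketched in Python. Everything here is an assumption rather than a settled design: the function name dow_period, the overlapping field, treating the query end date as exclusive, and reporting the last matching day itself in datetime_to. Ranges too short to contain the weekday are not handled.

```python
from datetime import date, timedelta


def dow_period(start: date, end: date, isodow: int) -> dict:
    """Period for one weekday within [start, end); Monday=1 ... Sunday=7."""
    # Step forward from start to the first matching weekday.
    first = start + timedelta(days=(isodow - start.isoweekday()) % 7)
    # The end date is treated as exclusive, so look at the day before it.
    last_day = end - timedelta(days=1)
    # Step back from the last included day to the last matching weekday.
    last = last_day - timedelta(days=(last_day.isoweekday() - isodow) % 7)
    return {
        "label": "day of week",
        "interval": "dow",
        "datetime_from": first.isoformat(),
        "datetime_to": last.isoformat(),
        "overlapping": True,  # hypothetical flag, not an existing field
    }


monday = dow_period(date(2024, 1, 1), date(2024, 2, 1), 1)
tuesday = dow_period(date(2024, 1, 1), date(2024, 2, 1), 2)
```

For the January query this gives Mondays spanning 2024-01-01 to 2024-01-29 and Tuesdays spanning 2024-01-02 to 2024-01-30, which shows how the per-weekday periods interleave rather than tile the range.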