openaq / openaq-api-v2

OpenAQ API
https://api.openaq.org

Features/v3 annual averaging #352

Open caparker opened 1 month ago

caparker commented 1 month ago

This pull request fixes issues with annual averaging. We also discussed updating the endpoints instead of using period_name.

For the path updates I suggest the following:

sensors/1/measurements -- raw data for sensor 1
sensors/1/hours -- hourly data for sensor 1
sensors/1/days -- daily data for sensor 1
sensors/1/years -- annual data for sensor 1
sensors/1/measurements/hourly -- raw data aggregated to hourly for sensor 1 (really an alias for hourly data)
sensors/1/measurements/daily -- raw data aggregated to daily for sensor 1; we don't store this, so this would be novel
sensors/1/days/yearly -- daily data for sensor 1 aggregated to a year

The nice thing about the above is that it would turn the period_name into another resource. The only thing I worry about is that the combination of base and aggregation could be confusing, but each combination would have its own endpoint and therefore we could offer a great description. Let's say that a user wants to see the annual average of daily values: it would be sensors/1/days/yearly, which reads better to me than either of these:

sensors/1/measurements/daily?period_name=year
sensors/1/daily?period_name=year
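To make the base/aggregation idea concrete, here is a minimal sketch of how such a path could be split into a sensor id, a base resource, and an optional aggregation. It is an illustration only, not the code in this pull request: the names `SensorPath`, `parse_sensor_path`, `BASES`, and `AGGREGATES` are hypothetical, and the accepted segment names are simply the ones proposed above.

```python
import re
from typing import NamedTuple, Optional

# Segment names taken from the proposal above; the final naming is not settled.
BASES = {"measurements", "hours", "days", "years"}
AGGREGATES = {"hourly", "daily", "yearly"}

# sensors/<id>/<base>[/<aggregate>]
_PATH = re.compile(r"^sensors/(\d+)/([a-z]+)(?:/([a-z]+))?$")


class SensorPath(NamedTuple):
    """Hypothetical parsed form of a sensor path."""
    sensor_id: int
    base: str
    aggregate: Optional[str]


def parse_sensor_path(path: str) -> SensorPath:
    """Split a path such as 'sensors/1/days/yearly' into its parts."""
    match = _PATH.match(path.strip("/"))
    if match is None:
        raise ValueError(f"not a sensor path: {path!r}")
    sensor_id, base, aggregate = match.groups()
    if base not in BASES:
        raise ValueError(f"unknown base resource: {base!r}")
    if aggregate is not None and aggregate not in AGGREGATES:
        raise ValueError(f"unknown aggregation: {aggregate!r}")
    return SensorPath(int(sensor_id), base, aggregate)
```

Under these assumptions, `parse_sensor_path("sensors/1/days/yearly")` yields base `days` with aggregation `yearly`, while `sensors/1/hours` has no aggregation at all.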

russbiggs commented 1 month ago

A couple things I would like to work through and better understand:

This pattern feels a bit odd to me: sensors/1/measurements/hourly. It feels like it would more naturally be sensors/1/measurements?period=hourly, which is obviously what we are moving away from. I guess the doubly nested resources just look foreign to me; I expect an id in between, which obviously wouldn't make sense here.

I assume that when you say sensors/1/measurements/hourly is "an alias for hourly data", you mean it is functionally the same as sensors/1/hours. For hourly data this doesn't present much of an issue, but when looking at something like sensors/1/measurements/daily, is this also the same as sensors/1/days? (I assume not; based on our discussion I would guess the latter is based on hourly data, whereas the former is based on the lowest granularity available for that sensor.) To me this raises the question of which period is used as the base for each endpoint.

Then the question is whether we need other options beyond our chosen defaults, i.e. do we need options for both a daily-annual average and a monthly-annual average (and a "raw" annual average, etc.)? The other question is whether we can realistically deliver every granularity in a performant enough way. Currently, hour-of-day summaries for a single year are finicky in terms of being fast enough for the timeout.

caparker commented 1 month ago

We are trying to address two questions: which table the user wants to pull from, and how the user wants to aggregate the data.

For the first, which table they want to pull from, I think it should look like this:

sensors/1/measurements -- give me the raw measurements for sensor 1
sensors/1/hours -- give me the hourly aggregated data
sensors/1/days -- give me the daily aggregated data
sensors/1/years -- give me the annually aggregated data

The benefit of this is that we can have a different response model and filters for each resource.

For the second, how the user wants to aggregate the data, we have a few options. For example, let's say the user wants to take the raw measurements and aggregate them to daily values. We have the following options:

sensors/1/measurements?aggregate=daily <--- treat it as a flag, same response shape
sensors/1/measurements/daily <--- treat it as a resource, different response shape

My assumption is that the response from this call would be shaped differently than the response from the sensors/1/measurements call, and therefore it makes the most sense to use the path method.
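As a rough illustration of why the two response shapes might differ, the sketch below aggregates raw measurements to daily values. The model and field names (`Measurement`, `DailySummary`, `timestamp`, `average`, and so on) are assumptions made for the example and are not the API's actual response models.

```python
from dataclasses import dataclass
from datetime import date, datetime
from itertools import groupby
from statistics import mean
from typing import Iterable, List


@dataclass(frozen=True)
class Measurement:
    """Hypothetical shape of one raw value (sensors/1/measurements)."""
    timestamp: datetime
    value: float


@dataclass(frozen=True)
class DailySummary:
    """Hypothetical shape of one daily aggregate (sensors/1/measurements/daily)."""
    day: date
    average: float
    minimum: float
    maximum: float
    count: int


def aggregate_daily(measurements: Iterable[Measurement]) -> List[DailySummary]:
    """Group raw measurements by calendar day and summarise each group."""
    # groupby only merges adjacent items, so sort by time first.
    ordered = sorted(measurements, key=lambda m: m.timestamp)
    summaries = []
    for day, group in groupby(ordered, key=lambda m: m.timestamp.date()):
        values = [m.value for m in group]
        summaries.append(
            DailySummary(day, mean(values), min(values), max(values), len(values))
        )
    return summaries
```

A raw row carries a single timestamp and value, whereas an aggregated row carries summary statistics and a count, which is the kind of difference that would favor a separate resource with its own response model.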

But if the response were not different, I could see going the aggregate=daily route. Here are the different combinations:

measurements > hourly
measurements > daily
measurements > annually
hours > daily
hours > annually
days > annually
years > ?
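The combinations above could be captured in a small lookup table that an endpoint checks before running a query. This is only a sketch: `ALLOWED_AGGREGATES` and `is_valid_combination` are hypothetical names, and the aggregation labels are copied from the list above rather than from any settled path naming.

```python
from typing import Dict, Tuple

# Base resource -> aggregations discussed for it in this thread.
ALLOWED_AGGREGATES: Dict[str, Tuple[str, ...]] = {
    "measurements": ("hourly", "daily", "annually"),
    "hours": ("daily", "annually"),
    "days": ("annually",),
    "years": (),  # left open in the discussion ("years > ?")
}


def is_valid_combination(base: str, aggregate: str) -> bool:
    """Return True if the base resource can be aggregated to the given period."""
    return aggregate in ALLOWED_AGGREGATES.get(base, ())
```

With this table, `is_valid_combination("hours", "daily")` is true, while `is_valid_combination("days", "hourly")` is false because daily data cannot be aggregated to a finer period.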

We also have the trends aggregates to think about:

hour of day - measurements, hours
day of week - measurements, hours, days
month of year - measurements, hours, days
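Read the other way around, the trends list tells us which trend aggregates each base resource could offer. A minimal sketch, assuming hypothetical names `TREND_BASES` and `trends_for` and using the labels exactly as written above:

```python
from typing import Dict, List, Tuple

# Trend aggregate -> base resources it could be computed from (per the list above).
TREND_BASES: Dict[str, Tuple[str, ...]] = {
    "hour of day": ("measurements", "hours"),
    "day of week": ("measurements", "hours", "days"),
    "month of year": ("measurements", "hours", "days"),
}


def trends_for(base: str) -> List[str]:
    """List the trend aggregates that could be offered for a base resource."""
    return [trend for trend, bases in TREND_BASES.items() if base in bases]
```

For example, `trends_for("days")` leaves out "hour of day", since daily values no longer carry an hour.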

caparker commented 2 days ago

There is still a little to do, like cleaning up queries and responses, but I wanted to get this out for review just to make sure this is the right direction. You can get a feel for the new paths by looking at the tests.