openaq / openaq-api-v2

OpenAQ API
https://api.openaq.org

Bug: Pagination not working or found meta parameter incorrect #118

Closed: saschahofmann closed this issue 1 year ago

saschahofmann commented 1 year ago

If I understand it correctly, a response like this shouldn't be possible:

{
    "meta": {
        "name": "openaq-api",
        "license": "CC BY 4.0d",
        "website": "api.openaq.org",
        "page": 2,
        "limit": 10000,
        "found": 2216436
    },
    "results": []
}

This response is returned by the following request: https://api.openaq.org/v2/measurements?coordinates=42.698029%2C23.322718&radius=10000&page=2&limit=10000&parameter=pm25&order_by=datetime&sort=asc

I don't know whether the "found" value is wrong or pagination isn't working.
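For reference, if "found" were accurate, the meta block above would imply the following page count (worked arithmetic, assuming 1-based pages of size "limit"), so page 2 should be well within range:

$$
\text{pages} = \left\lceil \frac{\text{found}}{\text{limit}} \right\rceil
= \left\lceil \frac{2216436}{10000} \right\rceil
= \lceil 221.6436 \rceil = 222
$$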

russbiggs commented 1 year ago

There are some known issues with pagination, and it looks like that's what is happening here. We'll take a look and report back with any solutions.

saschahofmann commented 1 year ago

I skimmed the FastAPI code and can't find much use of the page parameter 😅 beyond putting it in the metadata.
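For comparison, a page-based endpoint conventionally turns `page` and `limit` into an SQL `LIMIT`/`OFFSET` pair, with offset = (page - 1) × limit. The sketch below assumes that convention; `page_to_offset`, `build_query`, and the table and column names are hypothetical and are not taken from the openaq-api-v2 code.

```python
def page_to_offset(page: int, limit: int) -> int:
    """Convert a 1-based page number into a row offset (hypothetical helper)."""
    if page < 1 or limit < 1:
        raise ValueError("page and limit must both be >= 1")
    return (page - 1) * limit


def build_query(page: int, limit: int) -> tuple[str, dict]:
    """Return a parameterized SQL string plus its bind values.

    Table and column names are placeholders, not the real OpenAQ schema.
    """
    sql = (
        "SELECT * FROM measurements "
        "ORDER BY datetime ASC "
        "LIMIT :limit OFFSET :offset"
    )
    return sql, {"limit": limit, "offset": page_to_offset(page, limit)}


sql, params = build_query(page=2, limit=10000)
print(params)  # {'limit': 10000, 'offset': 10000}
```

Applying the offset once, to a single ordered query, is what makes consecutive pages line up with the total count.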

Another possibly interesting observation: past a certain page number (well before the results should run out), the server returns an internal server error instead of an empty response.

Also, the first page returns over 11k results even though the limit is set to 10k.
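A client that can't rely on the meta block could check each page for the two symptoms reported here: an empty page that "found" says should exist, and more results than "limit". This is a minimal, hypothetical helper (`check_page` is not part of any OpenAQ library); it only uses the `page`, `limit`, and `found` keys shown in the response above.

```python
import math


def check_page(meta: dict, results: list) -> list[str]:
    """Return inconsistencies between a page's meta block and its results.

    Hypothetical client-side helper based only on the keys in the sample response.
    """
    issues = []
    page, limit, found = meta["page"], meta["limit"], meta["found"]
    # Symptom 1: a page larger than the requested limit.
    if len(results) > limit:
        issues.append(f"page returned {len(results)} results, more than limit={limit}")
    # Symptom 2: an empty page even though `found` implies it should have data.
    last_page = math.ceil(found / limit)
    if not results and page <= last_page:
        issues.append(f"page {page} is empty but found={found} implies {last_page} pages")
    return issues


meta = {"page": 2, "limit": 10000, "found": 2216436}
for issue in check_page(meta, []):
    print(issue)  # page 2 is empty but found=2216436 implies 222 pages
```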

russbiggs commented 1 year ago

Yeah, pagination is quite a rabbit hole. The main reason is a problem with the results count (duplication), so when paging, the math sometimes doesn't add up: the count shows more results than actually exist.
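To see how an inflated count produces empty pages, here is a toy example with hypothetical rows. The thread doesn't say where the duplication comes from in the real API; the duplicate below (attributed to a join in the comment) is purely illustrative. If duplicates are counted in "found" but not returned, the advertised page count exceeds the real one.

```python
import math

# Hypothetical rows: (location, datetime, parameter, value).
# The second row duplicates the first, e.g. as a side effect of a join.
rows = [
    ("loc-1", "2022-01-01T00:00:00Z", "pm25", 12.0),
    ("loc-1", "2022-01-01T00:00:00Z", "pm25", 12.0),
    ("loc-1", "2022-01-01T01:00:00Z", "pm25", 15.0),
]
limit = 2

naive_found = len(rows)          # count that includes the duplicate
unique_rows = sorted(set(rows))  # rows that are actually returned
actual_found = len(unique_rows)

print(naive_found, actual_found)       # 3 2
print(math.ceil(naive_found / limit))  # 2 pages advertised
print(math.ceil(actual_found / limit)) # 1 page of real data

# Page 2 (offset = limit) is empty once duplicates are removed.
print(unique_rows[limit:2 * limit])    # []
```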

caparker commented 1 year ago

This error (no results on page 2) comes from the way the measurements are actually queried, shown in this section: https://github.com/openaq/openaq-api-v2/blob/044401873d4e7dff1a20bd4294a1ab964cc8a6ec/openaq_fastapi/openaq_fastapi/routers/measurements.py#L367-L434

Instead of querying all the data that meet the requested conditions, the query breaks the request into separate chunks of time but then caps the number of chunks it processes at 20. In this specific instance, there is no data in the first 20 chunks of time that it queries. The use of the offset in combination with the added range filter is also troubling, because it asks for the second page of that specific interval's worth of data. It could all work out if the total duration of the data fit within the roughly 20 chunks the query allows.
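The two failure modes described above can be reproduced in a small simulation. This is a hypothetical sketch of the pattern as described in the comment, not a port of the linked `measurements.py` code: `chunked_query` scans fixed-width time chunks starting at `start`, stops after `max_chunks` of them (20 in the real query, per the comment), and applies `offset` inside each chunk's range filter. Timestamps are plain integers for simplicity, and both function names are made up.

```python
def chunked_query(timestamps, start, chunk_size, max_chunks, limit, offset):
    """Hypothetical chunk-limited query: at most `max_chunks` time ranges are
    scanned, and `offset` is applied within each range rather than globally."""
    results = []
    for i in range(max_chunks):
        lo = start + i * chunk_size
        hi = lo + chunk_size
        rows = sorted(t for t in timestamps if lo <= t < hi)  # range filter
        # Per-chunk OFFSET/LIMIT, then trim so the total never exceeds `limit`.
        results.extend(rows[offset:offset + limit][: limit - len(results)])
        if len(results) >= limit:
            break
    return results


def full_query(timestamps, limit, offset):
    """Reference behaviour: one ordered query with a global OFFSET/LIMIT."""
    return sorted(timestamps)[offset:offset + limit]


# Case 1: all data lies beyond the first 20 chunks, so nothing is ever scanned.
late = list(range(100, 130))
print(chunked_query(late, start=0, chunk_size=1, max_chunks=20, limit=10, offset=10))  # []
print(full_query(late, limit=10, offset=10))  # the ten values 110 through 119

# Case 2: the data fits in the scanned chunks, but each chunk holds only 10 rows,
# so a per-chunk offset of 10 skips all of them.
early = list(range(30))
print(chunked_query(early, start=0, chunk_size=10, max_chunks=20, limit=10, offset=10))  # []
print(full_query(early, limit=10, offset=10))  # the ten values 10 through 19
```

In both cases a single ordered query with a global offset returns the expected second page, while the chunked version returns nothing even though the total count is nonzero.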

russbiggs commented 1 year ago

This has since been resolved in a previous release.