Closed caparker closed 2 years ago
I think it boils down to the row_number window function shown here. This is the statement from the debugger, with some things we dont need commented out.
\set lon -122.658722
\set lat 45.512230
\set radius 5000
\set limit 10
\set offset 0
WITH t1 AS (
SELECT
id,
name,
--"sensorType",
--entity,
--"isMobile",
--"isAnalysis",
--city,
--country,
--sources,
--manufacturers,
--case WHEN "isMobile" then null else coordinates end as coordinates,
--measurements,
"firstUpdated",
"lastUpdated",
--json as "rawData",
--geog,
--bounds,
--parameters,
st_distance(st_makepoint(:lon,:lat)::geography, geog) as dist,
row_number() over () as row
FROM locations_base_v2
WHERE
st_dwithin(st_makepoint(:lon, :lat)::geography, geog, :radius) AND id not in (61485,61505,61506)
--ORDER BY st_distance(st_makepoint(:lon,:lat)::geography, geog) desc nulls last
LIMIT :limit
OFFSET :offset
),
nodes AS (
SELECT count(distinct id) as nodes
FROM locations_base_v2
WHERE
st_dwithin(st_makepoint(:lon, :lat)::geography, geog, :radius) AND id not in (61485,61505,61506)
),
t2 AS (
SELECT
*
--row,
--jsonb_strip_nulls(
-- to_jsonb(t1) - '{rawData,source_name,geog, row}'::text[]
--) as json
FROM t1
--group by row, t1, json
)
SELECT *
--nodes as count
--, json
FROM t2
--, nodes
ORDER BY row;
The row_number is not doing what is expected because the order is not explicitly provided to the window function This results in something that looks like this
id | name | firstUpdated | lastUpdated | dist | row
--------+---------------------------------------+------------------------+------------------------+---------------+-----
1064 | Portland - SE Lafaye | 2016-03-10 08:00:00+00 | 2022-06-27 14:00:00+00 | 4695.72085254 | 1
69252 | Eastmoreland | 2021-01-19 23:59:05+00 | 2021-10-28 15:29:50+00 | 4732.6373918 | 9
66710 | NE 54th Outdoor | 2021-01-20 00:00:09+00 | 2021-09-10 16:14:10+00 | 4671.45320776 | 11
68138 | SEL | 2021-01-19 23:59:08+00 | 2022-06-27 14:41:59+00 | 4692.63867283 | 17
64146 | Bridlemile | 2021-01-20 00:00:13+00 | 2022-06-27 14:41:31+00 | 4985.0287048 | 25
227197 | Eastmoreland | 2021-05-31 17:05:20+00 | 2022-06-27 14:41:21+00 | 4690.67850493 | 27
64112 | NE Regents Drive | 2021-01-19 23:59:24+00 | 2022-06-27 14:42:53+00 | 4717.71691655 | 48
235191 | 738 Sellwood Blvd, Portland, OR 97202 | 2021-09-08 18:33:22+00 | 2022-06-27 14:42:17+00 | 4695.70631895 | 62
270939 | Eastmoreland - Tolman | 2022-01-20 16:28:37+00 | 2022-06-27 14:41:35+00 | 4736.50913972 | 73
229739 | Fiver | 2021-07-17 23:14:12+00 | 2022-06-27 14:41:32+00 | 4845.62890587 | 74
(10 rows)
Where the row_numbers are obviously not what we want. Removing the order by row will fix this
id | name | firstUpdated | lastUpdated | dist | row
--------+----------------------+------------------------+------------------------+---------------+-----
1064 | Portland - SE Lafaye | 2016-03-10 08:00:00+00 | 2022-06-27 14:00:00+00 | 4695.72085254 | 1
277836 | Ladd's Addition | 2022-01-23 17:04:59+00 | 2022-06-27 08:32:33+00 | 765.36603846 | 2
230547 | Patton Rd | 2021-07-30 19:20:01+00 | 2022-05-30 23:26:11+00 | 4275.30362346 | 3
69822 | Ardea Apts 4th floor | 2021-01-19 23:59:24+00 | 2022-05-25 16:34:37+00 | 2006.54977501 | 4
232023 | Home | 2021-08-12 23:07:05+00 | 2022-03-25 21:30:59+00 | 4009.31860293 | 5
73435 | Healy Heights | 2021-01-19 23:58:56+00 | 2022-03-24 21:31:43+00 | 3973.01820498 | 6
229251 | 2871 SE Division St | 2021-07-09 15:55:44+00 | 2022-03-07 15:27:03+00 | 1941.87789715 | 7
64106 | NW PDX | 2021-01-19 23:59:54+00 | 2021-12-15 19:27:20+00 | 4215.64193465 | 8
69252 | Eastmoreland | 2021-01-19 23:59:05+00 | 2021-10-28 15:29:50+00 | 4732.6373918 | 9
235562 | NE 54th (Inside) | 2021-09-11 21:42:07+00 | 2021-09-12 00:13:38+00 | 4665.55044115 | 10
(10 rows)
Removing the order by row does not actually solve the issue because of the json work that is commented out in the statement above. When you look at the results of the full query, again, with some fields commented out, you can see that the json aggregator sorts by the first key
count | json
-------+---------------------------------------------------------------------------------------------------------------------------------------------
18 | {"id": 67465, "name": "PSU_GBRL_Home", "lastUpdated": "2021-12-08T20:22:05+00:00", "firstUpdated": "2021-01-19T23:59:20+00:00"}
18 | {"id": 70408, "name": "24th and Sherman", "lastUpdated": "2022-06-27T14:42:24+00:00", "firstUpdated": "2021-01-19T23:58:28+00:00"}
18 | {"id": 218844, "name": "Strand East", "lastUpdated": "2022-06-27T14:41:29+00:00", "firstUpdated": "2021-01-28T18:33:18+00:00"}
18 | {"id": 224865, "name": "Harrison West 26th floor", "lastUpdated": "2021-04-20T19:07:00+00:00", "firstUpdated": "2021-04-20T18:35:00+00:00"}
18 | {"id": 233545, "name": "Portland SE 12th and", "lastUpdated": "2022-06-27T14:00:00+00:00", "firstUpdated": "2021-08-26T21:00:00+00:00"}
18 | {"id": 273423, "name": "Grant and 24th", "lastUpdated": "2022-06-27T14:41:56+00:00", "firstUpdated": "2022-01-20T16:32:17+00:00"}
18 | {"id": 277490, "name": "Habitatgarden", "lastUpdated": "2022-06-27T14:41:56+00:00", "firstUpdated": "2022-01-20T16:29:39+00:00"}
18 | {"id": 277836, "name": "Ladd's Addition", "lastUpdated": "2022-06-27T08:32:33+00:00", "firstUpdated": "2022-01-23T17:04:59+00:00"}
18 | {"id": 290941, "name": "Kat", "lastUpdated": "2022-06-27T14:43:02+00:00", "firstUpdated": "2022-02-18T23:50:09+00:00"}
18 | {"id": 303816, "name": "SE Harrison & 27th", "lastUpdated": "2022-06-27T14:42:29+00:00", "firstUpdated": "2022-03-27T20:15:14+00:00"}
So in order to fix this so that we can still use the json aggregating method we will need to add the order by statement to the window function as well. For example
row_number() over (ORDER BY st_distance(st_makepoint(:lon,:lat)::geography, geog) asc nulls last) as row
This ensures that the data is sorted by distance and that we can recover this sorting after doing the aggregating later in the query.
When requesting data from the latest endpoint the results are returned in a seemingly arbitrary order
For example https://api.openaq.org/v2/locations?coordinates=45.512230,-122.658722&radius=2000&order_by=distance&limit=10&sort=asc