poissonconsulting / fwapgr

An R Client for BC Freshwater Atlas Web API
https://poissonconsulting.github.io/fwapgr
Other
8 stars 1 forks source link

offset >= 100000 not accepted by API but 100001 is !!! #40

Closed sebdalgarno closed 2 years ago

joethorley commented 3 years ago

Yes this is from the API - I confirm its still happening with a test

joethorley commented 3 years ago
test_that("fwa_collection offset errors at 100,000", {
  collectionid <- "whse_basemapping.fwa_stream_networks_sp"

  expect_silent(fwa_collection(collectionid, offset = 99999, limit = 1))
  expect_error(fwa_collection(collectionid, offset = 100000, limit = 1),
               "Invalid value for parameter offset")
})

test_that("fwa_collection offset works at 100,001 !!!!", {
  collectionid <- "whse_basemapping.fwa_stream_networks_sp"
  expect_silent(fwa_collection(collectionid, offset = 100001, limit = 1))
})
joethorley commented 3 years ago

@smnorris - what do you think of this behaviour

joethorley commented 3 years ago

For now I've added a check on offset < 100000 because I'm not sure that the result returned with offset 100001 is valid!

smnorris commented 3 years ago

Behaviour is weird! I'll see if I can debug.

smnorris commented 3 years ago

Both of these seem to work? What are the exact requests that bail?

https://www.hillcrestgeo.ca/fwapg/collections/whse_basemapping.fwa_stream_networks_sp/items.html?limit=1&offset=100000

https://www.hillcrestgeo.ca/fwapg/collections/whse_basemapping.fwa_stream_networks_sp/items.html?limit=1&offset=100001

smnorris commented 3 years ago

Well, they provide a response anyway. The requests aren't very useful because they have to be ordered by something for the offset to work reliably.

Ordering by linear_feature_id should fix this... but the server returns a different feature each time below request is made, the orderBy doesn't seem to be working:

https://www.hillcrestgeo.ca/fwapg/collections/whse_basemapping.fwa_stream_networks_sp/items.html?orderBy=linear_feature_id&limit=1&offset=100000

This is the log output from pg_featureserv

INFO[0270] 127.0.0.1:36858 GET /collections/whse_basemapping.fwa_stream_networks_sp/items.json?orderBy=linear_feature_id&limit=1&offset=100000
DEBU[0270] Features query: SELECT ST_AsGeoJSON( ST_Transform( "geom", 4326)  ) AS _geojson , "linear_feature_id","watershed_group_id","edge_type","blue_line_key","watershed_key","fwa_watershed_code"::text,"local_watershed_code"::text,"watershed_group_code"::text,"downstream_route_measure","length_metre","feature_source"::text,"gnis_id","gnis_name"::text,"left_right_tributary"::text,"stream_order","stream_magnitude","waterbody_key","blue_line_key_50k","watershed_code_50k"::text,"watershed_key_50k","watershed_group_code_50k"::text,"gradient","feature_code"::text,"wscode_ltree"::text,"localcode_ltree"::text,"upstream_route_measure","upstream_area_ha" FROM "whse_basemapping"."fwa_stream_networks_sp"     LIMIT 1 OFFSET 10000;
DEBU[0270] Database query result: 1 rows in 383.705893ms
DEBU[0270] ---- Request complete in 385.779509ms
smnorris commented 3 years ago

The documentation I was referring to (https://access.crunchydata.com/documentation/pg_featureserv/latest/usage/query_data/) refers to the wrong parameter name, it should be sortBy, not orderBy - this is getting closer:

https://www.hillcrestgeo.ca/fwapg/collections/whse_basemapping.fwa_stream_networks_sp/items.html?sortBy=linear_feature_id&limit=1&offset=11000

The issue seems to be that the max offset value input to the query from the server is 10k. This is the log output from this request:

INFO[1020] 127.0.0.1:36904 GET /collections/whse_basemapping.fwa_stream_networks_sp/items.json?sortBy=linear_feature_id&limit=1&offset=11000
DEBU[1020] Features query: SELECT ST_AsGeoJSON( ST_Transform( "geom", 4326)  ) AS _geojson , "linear_feature_id","watershed_group_id","edge_type","blue_line_key","watershed_key","fwa_watershed_code"::text,"local_watershed_code"::text,"watershed_group_code"::text,"downstream_route_measure","length_metre","feature_source"::text,"gnis_id","gnis_name"::text,"left_right_tributary"::text,"stream_order","stream_magnitude","waterbody_key","blue_line_key_50k","watershed_code_50k"::text,"watershed_key_50k","watershed_group_code_50k"::text,"gradient","feature_code"::text,"wscode_ltree"::text,"localcode_ltree"::text,"upstream_route_measure","upstream_area_ha" FROM "whse_basemapping"."fwa_stream_networks_sp"   ORDER BY "linear_feature_id"   LIMIT 1 OFFSET 10000;
DEBU[1020] Database query result: 1 rows in 427.282033ms
DEBU[1020] ---- Request complete in 431.749204ms

I don't think this is a config issue, these config paging params shouldn't affect the offset?

[Paging]
# The default number of features in a response
LimitDefault = 10000
# Maxium number of features in a response
LimitMax = 10000
joethorley commented 3 years ago

These are the two queries: the first with an offset of 100001 returns an empty data set but the second with a smaller offset of exactly 100000 throws an error about the offset being too big.

I think the first query should also throw an error.

> fwa_query_collection(collection_id, offset = 100001)
Response [https://hillcrestgeo.ca/fwapg/collections/whse_basemapping.fwa_named_streams]
  Date: 2021-09-17 23:51
  Status: 200
  Content-Type: application/json
  Size: 1.18 kB

Response [https://hillcrestgeo.ca/fwapg/collections/whse_basemapping.fwa_named_streams/items.json?sortBy=fwa_stream_networks_label_id&limit=10000&offset=100001]
  Date: 2021-09-17 23:51
  Status: 200
  Content-Type: application/geo+json
  Size: 452 B
<BINARY BODY>
Simple feature collection with 0 features and 0 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS:  WGS 84
# A data frame: 0 × 1
# … with 1 variable: geometry <GEOMETRY [°]>
>  fwa_query_collection(collection_id, offset = 100000)
Response [https://hillcrestgeo.ca/fwapg/collections/whse_basemapping.fwa_named_streams]
  Date: 2021-09-17 23:51
  Status: 200
  Content-Type: application/json
  Size: 1.18 kB

Response [https://hillcrestgeo.ca/fwapg/collections/whse_basemapping.fwa_named_streams/items.json?sortBy=fwa_stream_networks_label_id&limit=10000&offset=1e%2B05]
  Date: 2021-09-17 23:51
  Status: 400
  Content-Type: text/plain; charset=utf-8
  Size: 42 B
Invalid value for parameter offset: 1e+05
Error: API request failed [400]: Invalid value for parameter offset: 1e+05
.
smnorris commented 2 years ago

Not totally sure what the issue is here - both these requests return an empty response:

https://www.hillcrestgeo.ca/fwapg/collections/whse_basemapping.fwa_named_streams/items.json?limit=1&offset=100000

https://www.hillcrestgeo.ca/fwapg/collections/whse_basemapping.fwa_named_streams/items.json?limit=1&offset=100001

Perhaps error noted above is an encoding issue in the request? Error: API request failed [400]: Invalid value for parameter offset: 1e+05

Empty responses for these requests are expected, there are only 11,767 records in that table but pg_fs doesn't know that.

To make paging work better I can add a record count meta table that the R client can query (https://github.com/smnorris/fwapg/issues/56), but hopefully it won't be necessary in future versions of pg_fs

joethorley commented 2 years ago

Hold off on the record count meta table for now - I'll let you know if I think we need it.

sebdalgarno commented 2 years ago

this was an issue on our end caused by conversion of numeric to character for the url string:

d> as.character(100000)
[1] "1e+05"
d> as.character(100001)
[1] "100001"

fixed now in pgfsr with format(x, scientific = FALSE)

joethorley commented 2 years ago

Hilarious!