Open simonw opened 8 months ago
Currently fetching this:
Or without the format bit you get an HTML version:
https://ags.pge.esriemcs.com/arcgis/rest/services/43/outages/MapServer/5/query?where=1%3D1&outFields=*&
HTML version says 1,157 results which matches what we are capturing.
Is there pagination needed or something?
There are more URLs we could try listed here: https://ags.pge.esriemcs.com/arcgis/rest/services/43/outages/MapServer
Oh interesting - there's a "return count only" option:
{
"count": 3799
}
I requested "IDs only" and got back 3700 links to these:
https://ags.pge.esriemcs.com/arcgis/rest/services/43/outages/MapServer/8/3089?f=pjson
Each of those looks valid, and has a geometry attached.
My hunch is that the geometries are so big that trying to pull back 3799 in one go breaks something.
I tried "Return Extents Only" hoping I'd get a simple bounding box for each polygon, but it gives a single bounding box for the entire set instead.
https://ags.pge.esriemcs.com/arcgis/rest/services/43/outages/MapServer/8/query?where=1%3D1&resultOffset=0&resultRecordCount=20&f=pjson returns data - I set resultOffset=0&resultRecordCount=20
so I only got back 20 records.
Not sure how to reliably paginate though, if there's no cursor-based pagination I worry I'll miss stuff.
I don't trust https://ags.pge.esriemcs.com/arcgis/rest/services/43/outages/MapServer/8/3089?f=pjson either because:
{
"feature": {
"attributes": {
"OBJECTID": 3089,
"FID_Outage_Polygon": 1000,
"FID_Outage_Polygon_1": null,
"FID_Outage_Polygon_12": null,
"OUTAGE_ID": 2341025,
"InPoly_FID": null,
"OUTAGE_START": 1707055688000,
"CURRENT_ETOR": null,
"EST_CUSTOMERS": 386,
"OUTAGE_CAUSE": "STORM",
"CREW_CURRENT_STATUS": "Awaiting T-Man",
In earlier explorations it looked like OBJECTID
was not a stable identifier, unlike OUTAGE_ID
- but the URL to this file uses OBJECTID
in that URL.
Spotted "exceededTransferLimit": true
in https://ags.pge.esriemcs.com/arcgis/rest/services/43/outages/MapServer/8/query?where=1%3D1&resultOffset=0&resultRecordCount=20&f=pjson - unintuitively this is what that means:
https://developers.arcgis.com/rest/services-reference/enterprise/query-map-service-layer-.htm
The exceededTransferLimit property is now included in the JSON response when paging through a query result with the resultOffset and resultRecordCount parameters. When exceededTransferLimit is true, it indicates there are more query results and you can continue to page through the results. When exceededTransferLimit is false, or this property is absent in a query result, it indicates that you have reached the end of the query results.
So it's actually a "is it worth continuing to paginate" property.
Here's what that GeoJSON looks like, after I first manually deleted the invalid top-level properties
and exceededTransferLimit
keys:
Confirmed, pagination via OUTAGE_ID > 2334267
works: https://ags.pge.esriemcs.com/arcgis/rest/services/43/outages/MapServer/8/query?where=OUTAGE_ID%3E2334267&orderByFields=OUTAGE_ID&resultOffset=0&resultRecordCount=20&outFields=*&f=geojson
Here's the problem: just 20 records from https://ags.pge.esriemcs.com/arcgis/rest/services/43/outages/MapServer/8/query?where=OUTAGE_ID%3E2334267&orderByFields=OUTAGE_ID&resultOffset=0&resultRecordCount=20&outFields=*&f=geojson is 473KB of data.
Right now https://ags.pge.esriemcs.com/arcgis/rest/services/43/outages/MapServer/8/query?where=1%3D1&returnCountOnly=true&f=pjson returns a count of 4605. If each record is 473/20 = 23KB on average, then 4605 records would be about 105,915KB - that's over 100MB!
That's way too much data for me to feel comfortable archiving every 10 minutes. Even 100MB a day feels like it would be uncomfortably large.
Right now the power is off in El Granada:
https://pgealerts.alerts.pge.com/outage-tools/outage-map/
But the captured data only has 3 single customer outages in Half Moon Bay:
Why are the El Granada ones not being captured?