simonw / pge-outages

Tracking PG&E power outages
8 stars 2 forks source link

Missing outage information #4

Open simonw opened 8 months ago

simonw commented 8 months ago

Right now the power is off in El Granada:

https://pgealerts.alerts.pge.com/outage-tools/outage-map/

CleanShot 2024-02-04 at 16 35 38@2x

But the captured data only has 3 single customer outages in Half Moon Bay:

CleanShot 2024-02-04 at 16 35 18@2x

Why are the El Granada ones not being captured?

simonw commented 8 months ago

Currently fetching this:

https://ags.pge.esriemcs.com/arcgis/rest/services/43/outages/MapServer/5/query?where=1%3D1&outFields=*&f=pjson

Or without the format bit you get an HTML version:

https://ags.pge.esriemcs.com/arcgis/rest/services/43/outages/MapServer/5/query?where=1%3D1&outFields=*&

HTML version says 1,157 results which matches what we are capturing.

Is there pagination needed or something?

There are more URLs we could try listed here: https://ags.pge.esriemcs.com/arcgis/rest/services/43/outages/MapServer

simonw commented 8 months ago

Maybe I need this:

CleanShot 2024-02-04 at 16 40 44@2x

That's here: https://ags.pge.esriemcs.com/arcgis/rest/services/43/outages/MapServer/8/query?where=1%3D1&text=&objectIds=&time=&timeRelation=esriTimeRelationOverlaps&geometry=&geometryType=esriGeometryEnvelope&inSR=&spatialRel=esriSpatialRelIntersects&distance=&units=esriSRUnit_Foot&relationParam=&outFields=&returnGeometry=true&returnTrueCurves=false&maxAllowableOffset=&geometryPrecision=&outSR=&havingClause=&returnIdsOnly=false&returnCountOnly=false&orderByFields=&groupByFieldsForStatistics=&outStatistics=&returnZ=false&returnM=false&gdbVersion=&historicMoment=&returnDistinctValues=false&resultOffset=&resultRecordCount=&returnExtentOnly=false&sqlFormat=none&datumTransformation=&parameterValues=&rangeValues=&quantizationParameters=&featureEncoding=esriDefault&f=html

But the where 1=1 trick isn't working:

CleanShot 2024-02-04 at 16 41 59@2x

simonw commented 8 months ago

Oh interesting - there's a "return count only" option:

https://ags.pge.esriemcs.com/arcgis/rest/services/43/outages/MapServer/8/query?where=1%3D1&returnCountOnly=true&f=pjson returns

{
 "count": 3799
}
simonw commented 8 months ago

I requested "IDs only" and got back 3700 links to these:

https://ags.pge.esriemcs.com/arcgis/rest/services/43/outages/MapServer/8/3089?f=pjson

Each of those looks valid, and has a geometry attached.

My hunch is that the geometries are so big that trying to pull back 3799 in one go breaks something.

simonw commented 8 months ago

I tried "Return Extents Only" hoping I'd get a simple bounding box for each polygon, but it gives a single bounding box for the entire set instead.

https://ags.pge.esriemcs.com/arcgis/rest/services/43/outages/MapServer/8/query?where=1%3D1&resultOffset=0&resultRecordCount=20&f=pjson returns data - I set resultOffset=0&resultRecordCount=20 so I only got back 20 records.

Not sure how to reliably paginate though, if there's no cursor-based pagination I worry I'll miss stuff.

simonw commented 8 months ago

I don't trust https://ags.pge.esriemcs.com/arcgis/rest/services/43/outages/MapServer/8/3089?f=pjson either because:

{
 "feature": {
  "attributes": {
   "OBJECTID": 3089,
   "FID_Outage_Polygon": 1000,
   "FID_Outage_Polygon_1": null,
   "FID_Outage_Polygon_12": null,
   "OUTAGE_ID": 2341025,
   "InPoly_FID": null,
   "OUTAGE_START": 1707055688000,
   "CURRENT_ETOR": null,
   "EST_CUSTOMERS": 386,
   "OUTAGE_CAUSE": "STORM",
   "CREW_CURRENT_STATUS": "Awaiting T-Man",

In earlier explorations it looked like OBJECTID was not a stable identifier, unlike OUTAGE_ID - but the URL to this file uses OBJECTID in that URL.

simonw commented 8 months ago

Spotted "exceededTransferLimit": true in https://ags.pge.esriemcs.com/arcgis/rest/services/43/outages/MapServer/8/query?where=1%3D1&resultOffset=0&resultRecordCount=20&f=pjson - unintuitively this is what that means:

https://developers.arcgis.com/rest/services-reference/enterprise/query-map-service-layer-.htm

The exceededTransferLimit property is now included in the JSON response when paging through a query result with the resultOffset and resultRecordCount parameters. When exceededTransferLimit is true, it indicates there are more query results and you can continue to page through the results. When exceededTransferLimit is false, or this property is absent in a query result, it indicates that you have reached the end of the query results.

So it's actually a "is it worth continuing to paginate" property.

simonw commented 8 months ago

Here's what that GeoJSON looks like, after I first manually deleted the invalid top-level properties and exceededTransferLimit keys:

CleanShot 2024-02-05 at 08 49 23@2x

simonw commented 8 months ago

Confirmed, pagination via OUTAGE_ID > 2334267 works: https://ags.pge.esriemcs.com/arcgis/rest/services/43/outages/MapServer/8/query?where=OUTAGE_ID%3E2334267&orderByFields=OUTAGE_ID&resultOffset=0&resultRecordCount=20&outFields=*&f=geojson

simonw commented 8 months ago

Here's the problem: just 20 records from https://ags.pge.esriemcs.com/arcgis/rest/services/43/outages/MapServer/8/query?where=OUTAGE_ID%3E2334267&orderByFields=OUTAGE_ID&resultOffset=0&resultRecordCount=20&outFields=*&f=geojson is 473KB of data.

Right now https://ags.pge.esriemcs.com/arcgis/rest/services/43/outages/MapServer/8/query?where=1%3D1&returnCountOnly=true&f=pjson returns a count of 4605. If each record is 473/20 = 23KB on average, then 4605 records would be about 105,915KB - that's over 100MB!

That's way too much data for me to feel comfortable archiving every 10 minutes. Even 100MB a day feels like it would be uncomfortably large.