openaddresses / machine

Scripts for running OpenAddresses on a complete data set and publishing the results.
http://results.openaddresses.io/
ISC License

Need a scraper for Beacon Schneider #580

Open iandees opened 7 years ago

iandees commented 7 years ago

There are several sources (in Minnesota, Missouri, and Wisconsin) that use Beacon Schneider as their GIS platform. Beacon Schneider doesn't have a great API like Esri does, but it should be possible to scrape them.

Their web app allows you to select data from layers, and some of the maps have an addresses layer. When you use the select tool, it makes an XHR that looks like this:

curl 'https://beacon.schneidercorp.com/api/beaconCore/GetVectorLayer?QPS=zg0f6QnaoIz-ILjoPSh5H9YcV9JnKIDzNg285gFzVrL28shpIWvHaC2_O7y-NcScTgL7ErXoqqD8xqmgIfsdD9VDBBpSSIgIMAaNlF08ZgthtQ2kCj3No9xlIoUzQXfVPwgbY0ViQudzzopRtOquRz-LqgkdUxoYP7O186lM2Oj5Rm777KDRTm-zlQ5DqhwsA7S1bni1WLpjbqXOGLYT2s5Z--ylbHaL2sRioEUw2RZAzPfqGw4N8aE4cd9K8twiz92WT5l3bR2gWtc2icpRcA2' \
    -H 'Content-Type: application/json' \
    --data-binary '{"layerId":13494,"useSelection":false,"ext":{"minx":1409946.25,"miny":292761.4375,"maxx":1411905.625,"maxy":294047.0625},"wkt":null,"spatialRelation":1,"featureLimit":0}'

The response is JSON-wrapped HTML with details for the records enclosed in the bounding box.
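For anyone who would rather script this than paste curl, here is a rough Python sketch of the same POST. The URL and JSON fields are copied from the curl command above; the function name `build_request` and the `(minx, miny, maxx, maxy)` tuple order are my own choices, and the QPS value is a placeholder you would have to copy from your own browser session. It only builds the request and does not send it.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command in this thread.
BASE_URL = 'https://beacon.schneidercorp.com/api/beaconCore/GetVectorLayer'

def build_request(qps, layer_id, bbox):
    ''' Build (but do not send) a GetVectorLayer POST request.

        qps is the opaque QPS query-string value copied from the browser,
        bbox is (minx, miny, maxx, maxy) in the map's own coordinates.
        This helper is hypothetical, not part of any Beacon client library.
    '''
    minx, miny, maxx, maxy = bbox
    body = {
        'layerId': layer_id,
        'useSelection': False,
        'ext': {'minx': minx, 'miny': miny, 'maxx': maxx, 'maxy': maxy},
        'wkt': None,
        'spatialRelation': 1,
        'featureLimit': 0,
    }
    url = BASE_URL + '?' + urllib.parse.urlencode({'QPS': qps})
    # Supplying data makes urllib issue a POST, like curl --data-binary.
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
    )
```

Passing the result to `urllib.request.urlopen()` would actually send it.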

Unknowns:

migurski commented 7 years ago

"bs2geojson"

nvkelso commented 7 years ago

We're starting to see more of these Beacon Schneider systems in the wild.

migurski commented 7 years ago

The sample curl invocation above no longer works. What’s a current county with data published in this format?

iandees commented 7 years ago

Based on that, I imagine the QPS arg is a session token or something...

Here's how I got to this URL:

  1. Go to a Beacon client page like https://beacon.schneidercorp.com/Application.aspx?AppID=26&LayerID=155&PageTypeID=2&PageID=277 (from https://github.com/openaddresses/openaddresses/issues/2661)
  2. Click "Map"
  3. Enable the "Addresses" layer with the checkbox, zoom in close enough to see addresses
  4. Enable your browser developer tools
  5. Click "Selection tools", switch to "Select click/rectangle" mode, then click the "Addresses" layer name so the "i"/info icon is next to it (this is the layer you're selecting)
  6. Drag a box around some addresses and observe the network request

migurski commented 7 years ago

Thanks, makes sense! It’s a bit of a hassle, but it seems possible to do a recursive geographic descent. I fiddled with some of the inputs, and this instance appears to return at most 500 items for this posted JSON:

{
   "layerId":13494,
   "useSelection":false,
   "ext": { 
      "minx":0,
      "miny":0,
      "maxx":40000000,
      "maxy":40000000
   },
   "wkt":null,
   "spatialRelation":1,
   "featureLimit":0
}
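
To make the recursive descent idea concrete, here is a minimal sketch. It assumes a caller-supplied `fetch(bbox)` function (hypothetical) that returns a list of record dicts for a bounding box and silently truncates at the 500-item limit seen above. It also assumes each record has a unique `Key` field to dedupe on, and that a full page always means "there may be more", which is a guess about the server's behavior. `min_size` is an arbitrary floor to stop infinite recursion on dense spots.

```python
def fetch_all(fetch, bbox, limit=500, min_size=1.0):
    ''' Collect every record inside bbox by quartering it whenever
        the response comes back full.

        fetch(bbox) is a hypothetical callable returning a list of dicts;
        bbox is (minx, miny, maxx, maxy). Returns a dict keyed on 'Key'.
    '''
    minx, miny, maxx, maxy = bbox
    records = fetch(bbox)

    too_small = (maxx - minx) <= min_size or (maxy - miny) <= min_size
    if len(records) < limit or too_small:
        # Response was not truncated (or we cannot split further).
        return {record['Key']: record for record in records}

    # Response hit the limit: split into four quadrants and recurse.
    midx, midy = (minx + maxx) / 2, (miny + maxy) / 2
    quadrants = [
        (minx, miny, midx, midy),
        (midx, miny, maxx, midy),
        (minx, midy, midx, maxy),
        (midx, midy, maxx, maxy),
    ]
    found = {}
    for quadrant in quadrants:
        # Records on a shared edge may come back twice; the dict dedupes them.
        found.update(fetch_all(fetch, quadrant, limit, min_size))
    return found
```

Starting from a box like the 0 to 40000000 extent above, this keeps subdividing only where the data is dense, so sparse areas cost a single request.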

migurski commented 7 years ago

Started a thing in https://github.com/openaddresses/pybeacondump.

justinelliotmeyers commented 7 years ago

You guys are geniuses.

migurski commented 7 years ago

Okay so the code above works great on Taney, MO: https://github.com/openaddresses/openaddresses/pull/2844

The Beacon API is a bit half-assed, and uses mystery projected coordinates for bounding boxes and so forth. I'm not bothering to try to convert them because Lat and Long columns exist in the data source. For another county, this might have to change. What’s the next place to try this?

iandees commented 7 years ago

There's a bunch in https://docs.google.com/spreadsheets/d/1HFm0YbFDC5YKkHKFt89EKdhpyaCCIo_CiDFjBNq4PoI/edit#gid=0 (search for "beacon").

Try Page County, IA? https://beacon.schneidercorp.com/Application.aspx?AppID=220&LayerID=3039&PageTypeID=1&PageID=2857

migurski commented 7 years ago

Okay, harder. This one has very little data in the "result HTML" (right pane) though there does seem to be a way to get more by clicking on a point (bottom pane):

[Screenshot, 2017-04-10 7:08 pm]

There is no "Lat" or "Long" in the properties, which suggests we’ll need to figure out the projection used in order to take advantage of the WKT values.

migurski commented 7 years ago

I tried Iowa state plane projections and http://spatialreference.org/ref/epsg/3418/ looks close. Seems necessary to add client-side projection here.

migurski commented 7 years ago

Result geometries look good, but the properties are all blank because the HTML here is formatted differently. I have a feeling these are each the result of a bespoke consulting arrangement with a local tax assessor.

[Image: iowa]

kirkedev commented 7 years ago

Hey @migurski did you figure this out?

I've been playing with their API as well. The projection they're using is State Plane Coordinates, so it differs depending on which state you're in. If you open the JavaScript console in your dev tools, there's a global called mapConfig that gets passed down in a script tag. It has the SRID in it. You'll find your QPS parameter in there too, which does indeed seem to be a session token of some kind.

Example: [image]
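
If you want to grab that from a script instead of the console, something like the following might work. It is only a sketch: it assumes the page source contains a literal `mapConfig = {...}` assignment whose right-hand side is strict JSON, which I have not verified for every Beacon site. The function name `extract_map_config` is made up, and so are the `SRID` and `QPS` key names in the usage line below.

```python
import json
import re

def extract_map_config(html):
    ''' Pull the mapConfig object out of a page's HTML source.

        Assumes an inline "mapConfig = {...}" assignment with a
        JSON-compatible object literal. Returns a dict, or None if
        no such assignment is found or it is not valid JSON.
    '''
    match = re.search(r'\bmapConfig\s*=\s*', html)
    if match is None:
        return None
    try:
        # raw_decode parses one JSON value starting at the given index
        # and ignores whatever follows it (such as the trailing ";").
        config, _end = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        return None
    return config
```

Usage would be along the lines of `extract_map_config(page_source).get('SRID')`, with the real key names checked against what the console shows for your county.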

psyon commented 6 years ago

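Using https://github.com/larsbutler/geomet to load the WKT, the make_feature function becomes:

from geomet import wkt  # third-party package from the repo linked above

def make_feature(record):
    ''' Get a complete GeoJSON feature object for a record.
    '''
    return dict(
        type='Feature',
        id=record.get('Key'),
        # Parse the WKT string into a GeoJSON geometry dict (any geometry type)
        geometry=wkt.loads(record.get('WktGeometry')),
        # extract_properties() is the existing helper defined elsewhere in the script
        properties=extract_properties(record)
        )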

That made it handle the geometries for my counties without issue. Looks like you just had it set to create points prior to that.