openaddresses / machine

Scripts for running OpenAddresses on a complete data set and publishing the results.
http://results.openaddresses.io/
ISC License

OpenAddresses V2 Schema #691

Closed: ingalls closed this 6 years ago

ingalls commented 6 years ago

Add support for the V2 OpenAddresses schema to machine while retaining backward compatibility with V1.

Approach

To Be Written

V2 Schema Discussion Ref: https://github.com/openaddresses/openaddresses-ops/issues/21

DC V2 Schema Branch Ref: https://github.com/openaddresses/openaddresses/pull/3923

cc/ @openaddresses/machinists

ingalls commented 6 years ago

Sample Source File I've been using for tests:

{
    "schema": 2,
    "coverage": {
        "country": "us",
        "state": "dc"
    },
    "layers": {
        "addresses": [{
            "name": "open-data",
            "type": "http",
            "data": "https://s3.amazonaws.com/data.openaddresses.io/cache/uploads/v2_test/Address_Points.zip",
            "compression": "zip",
            "conform": {
                "number": ["ADDRNUM", "ADDRNUMSUF"],
                "street": ["STNAME", "STREET_TYP", "QUADRANT"],
                "type": "shapefile",
                "city": "CITY",
                "region": "STATE",
                "postcode": "ZIPCODE"
            }
        }, {
            "name": "sheriff",
            "type": "http",
            "data": "https://s3.amazonaws.com/data.openaddresses.io/cache/uploads/v2_test/Address_Points.zip",
            "compression": "zip",
            "conform": {
                "number": ["ADDRNUM", "ADDRNUMSUF"],
                "street": ["STNAME", "STREET_TYP", "QUADRANT"],
                "type": "shapefile",
                "city": "CITY",
                "region": "STATE",
                "postcode": "ZIPCODE"
            }
        }],
        "buildings": [{
            "name": "open-data",
            "type": "http",
            "data": "https://s3.amazonaws.com/data.openaddresses.io/cache/uploads/v2_test/Building_Footprints.zip",
            "compression": "zip",
            "conform": {
                "type": "shapefile"
            }
        }],
        "parcels": [{
            "name": "open-data",
            "type": "http",
            "data": "https://s3.amazonaws.com/data.openaddresses.io/cache/uploads/v2_test/Parcel_Lots.zip",
            "compression": "zip",
            "conform": {
                "type": "shapefile"
            }
        }]
    }
}

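To keep V1 support while adding V2, one option is to normalize both layouts into a flat list of (layer, name, config) entries before processing. The sketch below is hypothetical and not machine's actual code: it assumes a V1 file has no "schema" key and treats it as a single "addresses" entry named "default" (an invented placeholder), and it assumes V2 files follow the layout of the sample above.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_layer_sources(source: Dict[str, Any]) -> Iterator[Tuple[str, str, Dict[str, Any]]]:
    """Yield (layer, name, config) for every data source in a source file.

    Assumptions (hypothetical, for illustration only):
      * a file with no "schema" key is a V1 source and is treated as a
        single "addresses" entry named "default";
      * a V2 file keeps its sources under "layers", keyed by layer type,
        each holding a list of named entries.
    """
    schema = source.get("schema", 1)
    if schema == 1:
        # V1: the whole file describes one address source.
        yield "addresses", "default", source
    elif schema == 2:
        # V2: walk every layer type and every named entry within it.
        for layer, entries in source.get("layers", {}).items():
            for entry in entries:
                yield layer, entry["name"], entry
    else:
        raise ValueError(f"Unsupported schema version: {schema!r}")
```

Run against the DC sample, this would yield ("addresses", "open-data"), ("addresses", "sheriff"), ("buildings", "open-data") and ("parcels", "open-data"), so downstream code could key runs on the layer/name pair rather than on the source file alone.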
migurski commented 6 years ago

Could we skip the formatting commits for the time being? They make it hard to see what changes are actually happening.

ingalls commented 6 years ago

@migurski Fixed. My vim setup highlights these in a very obnoxious red, so I've disabled that for now. Ignoring muscle memory is hard :P

migurski commented 6 years ago

Thanks! I’m in favor of making these changes and setting up an editorconfig for this.

migurski commented 6 years ago

@ingalls, here are some answers to the questions you asked this morning when we talked.

Messages:

For the latest successful run on a source, ci.objects.read_latest_run is probably the most relevant function. We store successful and unsuccessful runs together in the DB, but show only successful ones on the front page. For example, mx/agu/statewide broke a few weeks ago: we show that failure in the latest set, but we still make an effort to show its available data on the front page.
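The selection rule described above (failed runs stay in the database, but the front page uses the newest successful one) can be sketched as follows. This is not the real ci.objects.read_latest_run: the Run class, its fields, and latest_successful_run are hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class Run:
    # Hypothetical, simplified stand-in for a row in the runs table.
    source_path: str
    finished: datetime
    successful: bool


def latest_successful_run(runs: Iterable[Run], source_path: str) -> Optional[Run]:
    """Return the most recent successful run for source_path, or None.

    Failed runs are kept in the input but skipped here, so a source that
    broke recently still shows its last good data.
    """
    candidates = [r for r in runs if r.source_path == source_path and r.successful]
    return max(candidates, key=lambda r: r.finished, default=None)
```

With an older successful run and a newer failed run for mx/agu/statewide, this returns the older successful one, which mirrors the front-page behavior described above.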

migurski commented 6 years ago

Also, in case you’re poking around in the database, you’ll see that there are four named queues.

ingalls commented 6 years ago

Tracking

Heading out; things to pick up once I get back:

Errors

  File "/usr/local/src/openaddr/openaddr/ci/work.py", line 147, in do_work
    with open(state_fullpath) as file:
FileNotFoundError: [Errno 2] No such file or directory: ''
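The traceback shows open() being called with an empty state_fullpath, meaning no state file path was produced before do_work tried to read it. A defensive check is one way to turn that crash into a reportable job failure. The read_state helper below is hypothetical and is not part of openaddr/ci/work.py.

```python
import os
from typing import Optional


def read_state(state_fullpath: str) -> Optional[str]:
    """Return the contents of a job's state file, or None if there isn't one.

    Hypothetical helper: an empty path (as in the traceback, where open('')
    raised FileNotFoundError) or a missing file is reported as None so the
    caller can mark the job as failed instead of crashing.
    """
    if not state_fullpath or not os.path.isfile(state_fullpath):
        return None
    with open(state_fullpath) as file:
        return file.read()
```

This only guards the symptom; the underlying question of why the state path is empty for V2 sources would still need tracking down.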

Next Steps

ingalls commented 6 years ago

Per our voice call, I'm going to take a different approach to this now that https://github.com/openaddresses/machine/pull/693 has landed.