openaddresses / openaddresses-ops

Issues-only repo for discussion of operational considerations for OA

Proposal: rename "type" tags in source schema #16

Closed by migurski 4 years ago

migurski commented 7 years ago

We use the term "type" in two places to mean different things in our source schema:

  1. As a core tag:

    A string containing the protocol (One of: http, ftp, ESRI)

  2. As a processing tag:

    The type property stores the format. It can currently be one of gdb, shapefile, shapefile-polygon, csv, geojson, or xml (for GML).

I’d like to propose that we phase out the use of "type" and replace it with two separate tags whose meanings are clearer: "protocol" and "format". We can maintain backward compatibility for some period of time, but ultimately "type" should be deprecated and removed. A source like Juneau, AK currently looks like this:

{
    "coverage": { … },
    "data": "https://ftp.ci.juneau.ak.us/pub/CBJ_GIS_map_layers/juneau_alaska_gis_internet_parcel_layer.zip",
    "type": "http",
    "compression": "zip",
    "conform": {
        "type": "shapefile-polygon",
        "number": { … },
        "street": { … }
    }
}

It would instead look like this:

{
    "coverage": { … },
    "data": "https://ftp.ci.juneau.ak.us/pub/CBJ_GIS_map_layers/juneau_alaska_gis_internet_parcel_layer.zip",
    "protocol": "http",
    "compression": "zip",
    "conform": {
        "format": "shapefile-polygon",
        "number": { … },
        "street": { … }
    }
}
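
For illustration only, the rename could be applied to existing sources with something like the Python sketch below. The function name `upgrade_source` and the placeholder URL are hypothetical and not part of any OpenAddresses code; the sketch assumes a source is a plain dict loaded from JSON, and it leaves a source untouched wherever the new tag is already present.

```python
import copy


def upgrade_source(source):
    """Return a copy of a source dict with "type" renamed to "protocol"/"format".

    Hypothetical helper: a sketch of the proposed rename, not project code.
    """
    upgraded = copy.deepcopy(source)

    # Core tag: top-level "type" (http, ftp, ESRI) becomes "protocol".
    if "type" in upgraded and "protocol" not in upgraded:
        upgraded["protocol"] = upgraded.pop("type")

    # Processing tag: conform "type" (shapefile, csv, ...) becomes "format".
    conform = upgraded.get("conform")
    if isinstance(conform, dict) and "type" in conform and "format" not in conform:
        conform["format"] = conform.pop("type")

    return upgraded


old = {
    "data": "https://example.com/parcels.zip",  # placeholder URL (assumption)
    "type": "http",
    "compression": "zip",
    "conform": {"type": "shapefile-polygon"},
}
new = upgrade_source(old)
```

Because the helper works on a deep copy, the original dict keeps its "type" tags, which may be handy while both spellings are accepted during a transition period.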

NelsonMinar commented 7 years ago

👍

Is protocol even necessary, or can it be inferred from the data URL?

migurski commented 7 years ago

In theory yes, though ESRI URLs will start with "http://" as well.
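
A rough Python sketch of why the URL alone is not enough: inferring the tag from the URL scheme gives the same answer for a plain download and for an ESRI endpoint. The function `guess_protocol` and both URLs are made up for illustration, and the ESRI-style path is only an assumed example of what such an endpoint might look like.

```python
from urllib.parse import urlparse


def guess_protocol(data_url):
    """Guess the protocol tag from the URL scheme alone (hypothetical helper)."""
    scheme = urlparse(data_url).scheme.lower()
    if scheme in ("http", "https"):
        return "http"
    if scheme == "ftp":
        return "ftp"
    return None  # unknown scheme; nothing can be inferred


# Hypothetical URLs: a single-file download and an ESRI-style endpoint.
file_url = "https://example.com/parcels.zip"
esri_url = "https://example.com/arcgis/rest/services/Parcels/MapServer/0"

file_guess = guess_protocol(file_url)
esri_guess = guess_protocol(esri_url)  # same guess as file_url, so ESRI is not detected
```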

NelsonMinar commented 7 years ago

Oh right. That suggests the directive we want here is something like "download single file" vs "scrape ESRI endpoint". I may be overthinking your proposed change.

ingalls commented 4 years ago

This happened! 🎉