openaddresses / openaddresses-ops

Issues-only repo for discussion of operational considerations for OA

Accepted: Authoritative Data #2

Open NelsonMinar opened 9 years ago

NelsonMinar commented 9 years ago

I've written up a proposal about a principle for OpenAddresses. I'd appreciate comments, edits, etc.

I'm motivated by a bunch of conversations I had at SotM US about the differences between OA and OSM. The key insight is that OSM is about letting users create their own map data. OA is about collecting existing address data from authoritative sources. Articulating those differences leads to some useful guidance.

migurski commented 9 years ago

Nice! I think we should do a couple laps figuring out a usable definition of authority for our purposes.

sbma44 commented 9 years ago

I think this is quite good, @NelsonMinar. Two reactions:

First, it might be worth clarifying the difference between authoritative and correct. I'm imagining adding an Open311 dataset of service addresses, for instance, which is not particularly systematic and may be prone to error. But it will come from a government authority.

> We should not make lossy edits if they compromise the authority of the source data, or at least consider publishing the verbatim data along with our edited version.

I think we should distinguish between what OA publishes and what it catalogs. OA's work should absolutely be auditable/reproducible, which I think is part of what you're getting at here. Ideally we cache a copy of everything so that the pristine source data can always be retrieved. But I don't mind lossy transformations in the course of normalizing input into a useful output. (How much to normalize is a separate question; as discussed at SOTMUS, it's probably wise to use a lighter hand here.)

migurski commented 9 years ago

To expand on authoritativeness, I’d be interested to get data from the ultimate source of assignations wherever possible, i.e. the municipality, county, etc. that actually issues them.

iandees commented 9 years ago

Yep, agree on that. For example, I'm really happy that we have Virginia's statewide dataset but I'd also be interested in having the local county's data too. We should always be searching for the local-est data possible.

NelsonMinar commented 9 years ago

Thanks for the responses! Is this generally the right idea? Is it useful?

Agreed that authority and correct are ambiguous. I used them that way deliberately. At first I smugly thought OA is authoritative in a way that OSM is not. But there's no particular reason to think any specific OA record is better; there are plenty of examples where OSM data is better than "official" data. I do think it's unique and important to our project that we're republishing data from many other sources and that the chain of authority on the data is valuable. If someone wants to take a crack at codifying that better than I have, I'd be grateful!

@sbma44 I'm also grateful for your distinction between publishes and catalogs. I think it's valuable for OA to fulfill both roles; I'm just much more confident in our ability to catalog than to publish an edited, cleaned version. Mostly I just want to separate the two tasks clearly; right now they're muddled a bit in the output.