whosonfirst / whosonfirst-sources

Where things come from in Who's On First.

Indicate original source of data (and via what aggregator) #40

Open nvkelso opened 7 years ago

nvkelso commented 7 years ago

Right now we have data from Quattroshapes, which actually originates from multiple different sources. Each source needs to be credited, so we need a consistent WOF property to handle this.

I propose a new property such as src:via (originally src_via): src should state the original source, and the data aggregator should be credited in src:via.

Examples:

nvkelso commented 7 years ago

Related: https://github.com/whosonfirst/whosonfirst-sources/issues/39.

thisisaaronland commented 7 years ago

I would only change this to src:via, or an equivalent prefix + ":" + key pair, to be consistent with everything else.

nvkelso commented 7 years ago

Works for me :)

nvkelso commented 6 years ago

It seems like most of the above applies to the whosonfirst-data repo.

To give credit to our src:via sources, we'll also need to elevate some of the buried remarks (like those for Quattroshapes) so they are listed directly in the main sources README. That way there is one page listing all the sources, which consumers of Who's On First data can link to in their apps to give proper credit where credit is due.

These all need to be printed in a section under https://github.com/whosonfirst/whosonfirst-sources/blob/master/sources/README.md#quattroshapes

After the license bullet point, add a new paragraph with:

This source includes data from the following organizations:

Below that, list the organizations as bullet points in alphabetical order, e.g.:

That list needs to be generated from a new JSON list in the quattroshapes.json source.

Ideally it could contain HTML text with hyperlinks (?), since I think we had problems with Markdown before.
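A minimal Python sketch of that README section, assuming the list lives under the existing "src:via" key as a list of objects with "source_name" and "source_link" fields (the shape shown later in this thread). The entries in `EXAMPLE_SOURCE_JSON` are placeholders, not real quattroshapes.json contents, and `render_credits_html` is a hypothetical helper. It emits HTML, per the suggestion above, and falls back to plain text when a link is empty.

```python
import html
import json

# Hypothetical excerpt of quattroshapes.json: the entries are placeholders,
# and "src:via" being a list of objects is an assumption.
EXAMPLE_SOURCE_JSON = """
{
  "name": "quattroshapes",
  "src:via": [
    {"source_name": "Example Agency B", "source_link": ""},
    {"source_name": "Example Agency A", "source_link": "https://example.org/a"}
  ]
}
"""

INTRO = "This source includes data from the following organizations:"


def render_credits_html(source: dict) -> str:
    """Render the intro paragraph plus an alphabetical HTML bullet list."""
    entries = sorted(source.get("src:via", []), key=lambda e: e["source_name"].lower())
    items = []
    for entry in entries:
        name = html.escape(entry["source_name"])
        link = entry.get("source_link")
        if link:  # empty links ("") fall back to plain text
            items.append(f'<li><a href="{html.escape(link)}">{name}</a></li>')
        else:
            items.append(f"<li>{name}</li>")
    return f"<p>{INTRO}</p>\n<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    print(render_credits_html(json.loads(EXAMPLE_SOURCE_JSON)))
```

Escaping names and links keeps stray characters in source names from breaking the generated page.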

nvkelso commented 6 years ago

The textual description part of this, here in the sources repo, is done.

Leaving this issue open, as there is related follow-up work.

nvkelso commented 6 years ago

For this county in Tanzania:

Let's pretend it has the following properties:

We want to track the sources' sources generically, in a predictable, machine-readable way that doesn't need constant shuffling as default and alt geoms are shuffled around, that doesn't require adding more source JSON files, and that makes use of the existing "src:via" properties we recently added to the sources JSON. In this case Mesoshapes includes data from "TNBS", and let's pretend that Quattroshapes includes data from "statscan".

NOTE: This new property would only be added to WOF records where a source is itself composed of multiple sources (e.g. Mesoshapes, Quattroshapes, and the other *shapes sources); in those cases all the sources would be listed out in the extended format. Records whose sources are not composed of multiple sources would not change.

We propose adding a new "src_via" prefix that accepts the same property names as src, but stores values as a list of lists (versus a string for geom and a list for geom_alt), because any one source can actually be composed of multiple sources:
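A small Python sketch of the shape this implies, assuming the property value is decoded from the record's GeoJSON properties as plain lists of strings. `is_valid_src_via` is a hypothetical helper name; it requires at least one inner list because the NOTE above says the property only appears when composite sources are involved.

```python
from typing import Any


def is_valid_src_via(value: Any) -> bool:
    """Return True if value is a non-empty list of non-empty lists of strings.

    Hypothetical check for the proposed src_via:* shape: one inner list per
    contributing source, each holding that source's upstream entries.
    """
    if not isinstance(value, list) or not value:
        return False
    for inner in value:
        # A bare string (the src:geom style) or an empty list is rejected.
        if not isinstance(inner, list) or not inner:
            return False
        if not all(isinstance(code, str) and code for code in inner):
            return False
    return True
```

A flat list such as `["meso:tza_tnbs", "naturalearth"]` fails this check, since its elements are strings rather than lists.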

Another example:

Then, in the sources repo (this repo), modify meso.json:

From:

"src:via" : {
            "context": "Tanzania",
            "source_link": "",
            "source_name": "Tanzania National Bureau of Statistics (TNBS)",
            "source_note": ""
        },

Add "source_code": "tza_tnbs" to get:

"src:via" : {
            "context": "Tanzania",
            "source_link": "",
            "source_name": "Tanzania National Bureau of Statistics (TNBS)",
            "source_code": "tza_tnbs",
            "source_note": ""
        },
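With a code on each entry, a record's src_via values can be resolved back to a human-readable name for credits. A Python sketch, assuming "src:via" in meso.json is a list of objects like the one above (the trailing comma suggests it sits inside a list, but that is an assumption); `MESO`, `index_via_by_code`, and `resolve` are hypothetical.

```python
from typing import Dict

# Hypothetical excerpt of meso.json, assuming "src:via" holds a list of
# objects like the one above, each with the proposed "source_code" key.
MESO = {
    "src:via": [
        {
            "context": "Tanzania",
            "source_link": "",
            "source_name": "Tanzania National Bureau of Statistics (TNBS)",
            "source_code": "tza_tnbs",
            "source_note": "",
        }
    ]
}


def index_via_by_code(source: Dict) -> Dict[str, Dict]:
    """Map each src:via entry's source_code to the entry itself."""
    index: Dict[str, Dict] = {}
    for entry in source.get("src:via", []):
        code = entry.get("source_code")
        if code:  # skip entries that have not been given a code yet
            index[code] = entry
    return index


def resolve(source: Dict, code: str) -> str:
    """Return the source_name for a code, or the code itself if unknown."""
    entry = index_via_by_code(source).get(code)
    return entry["source_name"] if entry else code
```

Falling back to the raw code keeps credits printable even before every entry has been given a code.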
nvkelso commented 6 years ago

Does this need to be a different structure?

"src_via:geom"={  
   "meso":[  
      "tza_tnbs"
   ],
   "naturalearth":[  
      "naturalearth"
   ],
   "quattroshapes":[  
      "statscan"
   ]
}

And should we riff on "src:via" à la "src_via" instead of "src_src"? (Updated to src_via.)
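For comparison, a Python sketch that converts this mapping into the list-of-lists form discussed in this thread. It assumes that a source listing only itself (naturalearth here) is written as the bare name rather than "naturalearth:naturalearth", which matches the later examples but is an inference; `dict_to_list_of_lists` is a hypothetical helper.

```python
from typing import Dict, List


def dict_to_list_of_lists(via: Dict[str, List[str]]) -> List[List[str]]:
    """Convert the mapping form of src_via:geom into the list-of-lists form.

    Each source becomes one inner list of "source:via" strings. A source that
    only lists itself is written as its bare name (an assumption).
    """
    result: List[List[str]] = []
    for source, codes in via.items():
        inner = [source if code == source else f"{source}:{code}" for code in codes]
        result.append(inner)
    return result


example = {
    "meso": ["tza_tnbs"],
    "naturalearth": ["naturalearth"],
    "quattroshapes": ["statscan"],
}

# dict_to_list_of_lists(example)
# -> [["meso:tza_tnbs"], ["naturalearth"], ["quattroshapes:statscan"]]
```

Python dicts keep insertion order, so the inner lists come out in the same order as the keys.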

stepps00 commented 6 years ago

@nvkelso - the example in https://github.com/whosonfirst/whosonfirst-sources/issues/40#issuecomment-399602996 makes more sense.

nvkelso commented 6 years ago

Flagging @thisisaaronland for comments. We'd like to make this change next week.

thisisaaronland commented 6 years ago

With regards to the source_code key I would change it to source_prefix since that's what it is.

Likewise I would consider changing all the source_* keys to be src:* since the src prefix has historically been used as a pointer to "whosonfirst-sources".
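A Python sketch of what those two renames would do to the meso.json entry above. The resulting names (src:name, src:prefix, and so on) are an assumption about how the suggestion would be applied, not an agreed schema, and `rename_source_keys` is hypothetical. Keys without the source_ prefix, such as context, pass through unchanged.

```python
from typing import Dict

# The meso.json src:via entry from earlier in this thread.
MESO_VIA_ENTRY = {
    "context": "Tanzania",
    "source_link": "",
    "source_name": "Tanzania National Bureau of Statistics (TNBS)",
    "source_code": "tza_tnbs",
    "source_note": "",
}


def rename_source_keys(entry: Dict[str, str]) -> Dict[str, str]:
    """Apply source_code -> source_prefix, then every source_* -> src:*."""
    renamed: Dict[str, str] = {}
    for key, value in entry.items():
        if key == "source_code":
            key = "source_prefix"
        if key.startswith("source_"):
            key = "src:" + key[len("source_"):]
        renamed[key] = value
    return renamed
```

Applied to `MESO_VIA_ENTRY`, source_code ends up as src:prefix and the other source_* keys become src:link, src:name, and src:note.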

src_via seems fine but I am not sure I understand why some of the examples have lists of lists, like this:

"src_via:geom" = [["meso:tza_tnbs"],["naturalearth"],["quattroshapes:statscan"]]

Like why wouldn't it just be:

"src_via:geom" = ["meso:tza_tnbs","naturalearth","quattroshapes:statscan"]
nvkelso commented 6 years ago

"With regards to the source_code key I would change it to source_prefix since that's what it is."

👍

"src_via seems fine but I am not sure I understand why some of the examples have lists of lists, like this:"

That's because some sources are themselves composed of multiple sources, so they need to be lists of lists.
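A Python sketch of reading such a value, where each inner list stays grouped as one contributing source. `parse_src_via` is hypothetical, and "other_agency" is a placeholder standing in for a second upstream source of quattroshapes; it is not real data.

```python
from typing import List, Optional, Tuple

Pair = Tuple[str, Optional[str]]


def parse_src_via(value: List[List[str]]) -> List[List[Pair]]:
    """Split each "source:via" string, keeping one inner list per source.

    A bare entry such as "naturalearth" has no upstream code, so its via part is None.
    """
    parsed: List[List[Pair]] = []
    for inner in value:
        pairs: List[Pair] = []
        for entry in inner:
            source, sep, via = entry.partition(":")
            pairs.append((source, via if sep else None))
        parsed.append(pairs)
    return parsed


# Hypothetical record: quattroshapes is built from two upstream sources.
value = [
    ["meso:tza_tnbs"],
    ["naturalearth"],
    ["quattroshapes:statscan", "quattroshapes:other_agency"],
]
```

Flattened, this record would hold four strings for three contributing sources, so the grouping would have to be rebuilt from the prefixes; the nested form keeps it explicit.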

thisisaaronland commented 6 years ago

Okay.

nvkelso commented 6 years ago

"Likewise I would consider changing all the source_* keys to be src:* since the src prefix has historically been used as a pointer to "whosonfirst-sources"."

@stepps00 ⏫