Open stepps00 opened 7 years ago
Should this be the default (to not include them), but enable including them also possible to get the whole set of data (similar to Null Island flag)?
On Fri, Feb 10, 2017 at 12:11 PM, Stephen Epps notifications@github.com wrote:
Features with an edtf:deprecated date are not showing up in bundles, which is correct. However, features with an edtf:cessation date or mz:is_current value of 0 are showing up in bundles.
Since features with an edtf:cessation date, an edtf:deprecated date, and/or a mz:is_current value of 0 are no longer current on Who's On First, they should not be returned in bundler results.
For example, downloading all neighbourhood descendants in San Jose (link https://whosonfirst.mapzen.com/spelunker/download/85922347/?exclude=nullisland#12/37.3504/-121.9270) returns features with a edtf:cessation date.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/whosonfirst/whosonfirst-www-spelunker/issues/89, or mute the thread https://github.com/notifications/unsubscribe-auth/AA0EOyMrEG03KyNJ-JYLJdst7-XRw6_eks5rbMR0gaJpZM4L9zdD .
We should definitely remove features with a cessation or deprecated date for the default bundles. I'm open to having a check-box to enable bundling "all" features, if that's an option.
This should be a "sensible defaults" thing where unless the user/UI specifies that they want to include historical data the only current things are included.
Things that are deprecated should be excluded by default (which may already be the case... I don't remember?)
I would like to let the Bundler "bake" as-is a week or so, to see what else shakes out during that time, but we will schedule this as a 1.1 thing.
I'm looking into this one, and it does seem like something weird is going on. Still trying to pin down where in the process it's happening.
Here is an example query that illustrates the mz:is_current
behavior. That second result for Berryessa includes edtf:
properties, but not edtf:deprecated
which is set to 2017/02/21
.
curl -XPOST -s "https://whosonfirst-api.mapzen.com/?api_key=(snip)" -d "method=whosonfirst.places.getDescendants&id=85922347&placetype=neighbourhood&extras=edtf:,mz:&per_page=2&exclude=nullisland" | jq .
edtf:deprecated
for some reason?Oh and here are the API results:
{
"places": [
{
"wof:id": 85802971,
"wof:parent_id": "1108782553",
"wof:name": "Alviso",
"wof:placetype": "neighbourhood",
"wof:country": "US",
"wof:repo": "whosonfirst-data",
"edtf:cessation": "uuuu",
"edtf:inception": "uuuu",
"mz:tier_locality": "5",
"mz:is_funky": "0",
"mz:filesize": "49152",
"mz:max_zoom": "18",
"mz:is_landuse_aoi": "0",
"mz:is_hard_boundary": "0",
"mz:categories": [],
"mz:hierarchy_label": "1",
"mz:tier_metro": "1",
"mz:min_zoom": "14",
"mz:is_official": "0",
"mz:uri": "https://whosonfirst.mapzen.com/data/858/029/71/85802971.geojson"
},
{
"wof:id": 85805881,
"wof:parent_id": "1108782553",
"wof:name": "Berryessa",
"wof:placetype": "neighbourhood",
"wof:country": "US",
"wof:repo": "whosonfirst-data",
"edtf:cessation": "2017/02/21",
"edtf:inception": "uuuu",
"mz:tier_locality": "5",
"mz:is_funky": "0",
"mz:filesize": "61440",
"mz:max_zoom": "18",
"mz:is_landuse_aoi": "0",
"mz:is_current": "0",
"mz:is_hard_boundary": "0",
"mz:categories": [],
"mz:hierarchy_label": "1",
"mz:tier_metro": "1",
"mz:min_zoom": "12",
"mz:is_official": "0",
"mz:uri": "https://whosonfirst.mapzen.com/data/858/058/81/85805881.geojson"
}
],
"next_query": "method=whosonfirst.places.getDescendants&id=85922347&placetype=neighbourhood&exclude=nullisland&extras=edtf%3A%2Cmz%3A&per_page=2&cursor=(snip)",
"total": 600,
"page": null,
"pages": 300,
"per_page": 2,
"cursor": "(snip)",
"stat": "ok"
}
I updated Berryessa to use 2017-02-21
(and also mz:is_current
integer 0
instead of string "0"
) and that seems to have fixed the problem (i.e., it doesn't appear in API results, and thus no longer gets bundled).
Checking a few other records, Evergreen, Almaden Valley, and Blossom Valley I see a similar issue with edtf:deprecated
(all set to 2017/02/21
).
Upshot is I think explanation #1 is true: the API is in fact sensitive to mis-encoded EDTF values. Personally I prefer this behavior as-is, since it can help surface issues in the data.
@dphiffer there is now code to explicitly cast mz:is_current
as the correct data type in https://github.com/whosonfirst/py-mapzen-whosonfirst-export/commit/bb760b5be51e8e2c1557136317becc6c3671dbaa
There is also an update to the ES schema to not store those values are strings but it has neither been applied to the existing indices (that on deck...) nor have they been re-indexed yet: https://github.com/whosonfirst/es-whosonfirst-schema/commit/9a6a6959876f6484cf0646b16bdc5d5eeac7f0f7
@dphiffer FYI:
https://whosonfirst.mapzen.com/spelunker/recent/?&is_current=-1 https://whosonfirst.mapzen.com/spelunker/recent/?&is_current=0 https://whosonfirst.mapzen.com/spelunker/recent/?&is_current=1
I still need to add is_current
hooks to the API.
@dphiffer :
curl -X GET 'https://whosonfirst-api.mapzen.com?method=whosonfirst.places.search&api_key=mapzen-xxxxxx&is_current=1&name=wine&placetype=venue&format=json'
{
"places": [
{
"wof:id": 588378691,
"wof:parent_id": "85882195",
"wof:name": "Local Kitchen & Wine Merchant",
"wof:placetype": "venue",
"wof:country": "US",
"wof:repo": "whosonfirst-data-venue-us-ca"
},
{
"wof:id": 572153071,
"wof:parent_id": "85869177",
"wof:name": "Brooklyn Wine Exchange",
"wof:placetype": "venue",
"wof:country": "US",
"wof:repo": "whosonfirst-data-venue-us-ny"
}
],
"next_query": null,
"total": 2,
"page": 1,
"per_page": 100,
"pages": 1,
"cursor": null,
"stat": "ok"
}
The is_current
parameter does not consider anything other than the mz:is_current
property. In theory any record a non-empty / not-uuuu value should have an appropriate is_current property but that may not always be the case as of this writing.
Anyway, that is a separate issue as is handling edtf:
dates (for which I will file a separate ticket).
@dphiffer
curl -s -X GET 'https://whosonfirst-api.mapzen.com?method=whosonfirst.places.search&api_key=mapzen-xxxxxx&is_ceased=1&placetype=country&extras=edtf:&format=json'
{
"cursor": null,
"next_query": null,
"page": 1,
"pages": 1,
"per_page": 100,
"places": [
{
"edtf:cessation": "1991-06~",
"edtf:inception": "1945-11-29",
"wof:country": "XX",
"wof:id": 1126113567,
"wof:name": "Socialist Federal Republic of Yugoslavia",
"wof:parent_id": "102191581",
"wof:placetype": "country",
"wof:repo": "whosonfirst-data"
},
{
"edtf:cessation": "1945-11-29",
"edtf:inception": "1918-12~",
"wof:country": "XX",
"wof:id": 1108955789,
"wof:name": "Kingdom of Yugoslavia",
"wof:parent_id": "102191581",
"wof:placetype": "country",
"wof:repo": "whosonfirst-data"
},
{
"edtf:cessation": "1991-09~",
"edtf:inception": "1991-06~",
"wof:country": "XX",
"wof:id": 1108955787,
"wof:name": "Socialist Federal Republic of Yugoslavia",
"wof:parent_id": "102191581",
"wof:placetype": "country",
"wof:repo": "whosonfirst-data"
},
{
"edtf:cessation": "1992-03~",
"edtf:inception": "1991-09~",
"wof:country": "XX",
"wof:id": 1108955785,
"wof:name": "Socialist Federal Republic of Yugoslavia",
"wof:parent_id": "102191581",
"wof:placetype": "country",
"wof:repo": "whosonfirst-data"
},
{
"edtf:cessation": "2006",
"edtf:inception": "1992-03~",
"wof:country": "XX",
"wof:id": 1108955783,
"wof:name": "Federal Republic of Yugoslavia",
"wof:parent_id": "102191581",
"wof:placetype": "country",
"wof:repo": "whosonfirst-data"
}
],
"stat": "ok",
"total": 5
}
Note: The API does not currently support OR
queries so there is no way to query for the union of is_current=1
or is_ceased=0
(I mean that's the whole purpose of the mz:is_current
property but whatever...) only the interesection. One step at a time...
Features with an
edtf:deprecated
date are not showing up in bundles, which is correct. However, features with anedtf:cessation
date ormz:is_current
value of0
are showing up in bundles.Since features with an
edtf:cessation
date, anedtf:deprecated
date, and/or amz:is_current
value of0
are no longer current in Who's On First, they should not be returned in bundler results.For example, downloading all neighbourhood descendants in San Jose (link) returns features with a
edtf:cessation
date.