Closed by bhousel 4 years ago
When it comes to the name, I think it would be handy to have a way to mark this up as being "unique", after the brand has been added!
For entries like JD Wetherspoon (#3009) and Harvester (#4034), the brand is strong but the pub or restaurant itself is often uniquely named. Rather than removing JD Wetherspoon or Harvester, keep them as they are, because they are strong brands that I feel people will type in first. But allow the NSI to flag to iD that the name is likely not "JD Wetherspoon" and that the venue has its own name, which the mapper should check.
I personally think it's better for a pub to be incorrectly named with the brand name, as that's still how many people might refer to the pub, than to have the brand not be suggested when someone searches. They may then just add the brand themselves, but add it wrong, if that makes sense.
That way, even if the name is wrong because it's named after the brand, at least the incorrect naming would be consistent, and so easier to find and correct. For example, the NSI guide web site could flag a branded entry that has "JD Wetherspoon" as the name, whereas a hand-typed name like "Spoons" would be missed.
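The check proposed above could be sketched roughly as follows. This is only an illustration: the `nameIsOftenUnique` flag, the `needsNameReview` function, and the entry shape are hypothetical names that are not defined anywhere in this thread or in NSI.

```javascript
// Hypothetical NSI-style entry carrying a made-up "nameIsOftenUnique" flag.
const entries = [
  {
    displayName: 'JD Wetherspoon',
    nameIsOftenUnique: true, // assumption: not a real NSI property
    tags: { amenity: 'pub', brand: 'JD Wetherspoon', name: 'JD Wetherspoon' }
  }
];

// Returns true when a mapped feature still carries the brand name as its
// name, for a brand whose venues usually have their own names.
function needsNameReview(featureTags, entries) {
  const entry = entries.find(e => e.tags.brand === featureTags.brand);
  if (!entry || !entry.nameIsOftenUnique) return false;
  return featureTags.name === entry.tags.name;
}

console.log(needsNameReview({ brand: 'JD Wetherspoon', name: 'JD Wetherspoon' }, entries));
console.log(needsNameReview({ brand: 'JD Wetherspoon', name: 'The Example Arms' }, entries));
```

Because the brand name is applied consistently, an exact comparison is enough to find candidates for review; a hand-typed variant would need fuzzy matching instead.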
FYI, there are quite a few Wikidata items now linking to NSI using the NSI identifier property. But the items can be bulk-updated (along with the property constraints) based on whatever decision we arrive at here.
Quick update on some identifiers work that I did last Friday:
- Entries now have an `id` like "simplename-shorthash".
- `displayName` is a separate thing, so we aren't relying so much on names and disambiguators. It can contain anything and can be used as the preset name in JOSM / iD presets.
{
  "brands/shop/craft": [
    {
      "displayName": "A.C. Moore",
      "id": "acmoore-286374",
      "locationSet": {"include": ["us"]},
      "oldid": "shop/craft|A.C. Moore",
      "tags": {
        "brand": "A.C. Moore",
        "brand:wikidata": "Q4647066",
        "brand:wikipedia": "en:A.C. Moore",
        "name": "A.C. Moore",
        "shop": "craft"
      }
    },
    {
      "displayName": "Hobby Lobby",
      "id": "hobbylobby-e90acf",
      "locationSet": {"include": ["in", "us"]},
      "oldid": "shop/craft|Hobby Lobby",
      "tags": {
        "brand": "Hobby Lobby",
        "brand:wikidata": "Q5874938",
        "brand:wikipedia": "en:Hobby Lobby",
        "name": "Hobby Lobby",
        "shop": "craft"
      }
    },
    {
      "displayName": "Hobbycraft",
      "id": "hobbycraft-ed2283",
      "locationSet": {"include": ["gb"]},
      "matchTags": ["shop/art"],
      "oldid": "shop/craft|Hobbycraft",
      "tags": {
        "brand": "Hobbycraft",
        "brand:wikidata": "Q16984508",
        "brand:wikipedia": "en:Hobbycraft",
        "name": "Hobbycraft",
        "shop": "craft"
      }
    },
    ...
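As a rough illustration of the "simplename-shorthash" shape, an id could be built along these lines. This is a sketch under assumptions, not the project's actual code: `simplify` and `makeId` are hypothetical helpers, and both the hash algorithm and the seed string are guesses, so the hash part will not match the ids shown above.

```javascript
const crypto = require('crypto');

// Hypothetical helper: lowercase the name and keep only letters and digits.
function simplify(name) {
  return name.toLowerCase().replace(/[^\p{L}\p{N}]/gu, '');
}

// Hypothetical helper: "simplename-shorthash".
// Assumption: the short hash is the first 6 hex characters of a SHA-256
// digest over some stable seed string (here, the old id).
function makeId(displayName, seed) {
  const hash = crypto.createHash('sha256').update(seed).digest('hex').slice(0, 6);
  return `${simplify(displayName)}-${hash}`;
}

// Prints "acmoore-" followed by 6 hex characters.
console.log(makeId('A.C. Moore', 'shop/craft|A.C. Moore'));
```

Deriving the hash from a stable seed keeps the id unchanged when `displayName` is edited later, which is one reason to keep the two fields separate.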
I'm almost finished with this work. 🎉
There will be a bunch of conflicts to resolve, so I've disabled merging to `master` temporarily until I can reconcile all the recent PRs that have been merged with the new file structure and code. Wish me luck.. 😅
OK - the new files are merged in..
- The branch was renamed from `master` to `main` - update your local forks!
- [...] `dist/*`, so this will remain open until those are settled.
The new code seems to be working pretty ok! This unblocks a bunch of other things that I'll tackle soon.
I'd like to leave this open until all those P8253 Name Suggestion Identifier properties on Wikidata have been updated. I'd ideally like to make this an automatic thing that the `build_wikidata.js` script does for us.
> I'd like to leave this open until all those P8253 Name Suggestion Identifier properties on Wikidata have been updated. I'd ideally like to make this an automatic thing that the `build_wikidata.js` script does for us.
So not bulk uploading it now, to not prevent the testing of such an automation?
> So not bulk uploading it now, to not prevent the testing of such an automation?
I don't understand your question, sorry.. Mostly I just haven't bulk updated the P8253's yet because I ran out of time yesterday to implement it.
> I don't understand your question, sorry..
I meant, that if I, or anybody else, would upload it in bulk now, it might interfere with your idea to update them automatically in the `build_wikidata.js` as you couldn't test it if every identifier was set correctly to the new one already?
> it might interfere with your idea to update them automatically in the `build_wikidata.js` as you couldn't test it if every identifier was set correctly to the new one already?
I don't think it would interfere - it's a one way update from NSI -> Wikidata. If you update them all now, the script would just have less to do later.
I updated the `build_wikidata.js` script to push our new ids to Wikidata and removed the legacy NSI ids.
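The one-way NSI -> Wikidata update described above can be pictured as a simple diff. The sketch below is not taken from `build_wikidata.js`; `planIdentifierUpdates` and its input shapes (a map from QID to a list of identifier strings) are assumptions made for illustration, and it makes no Wikidata API calls.

```javascript
// Hypothetical planner: compare the ids NSI wants on each Wikidata item
// with the P8253 values currently stored there.
function planIdentifierUpdates(nsiIds, wikidataIds) {
  const plan = [];
  for (const qid of Object.keys(nsiIds)) {
    const wanted = new Set(nsiIds[qid]);
    const existing = new Set(wikidataIds[qid] || []);
    const add = [...wanted].filter(id => !existing.has(id));      // new ids to push
    const remove = [...existing].filter(id => !wanted.has(id));   // legacy ids to drop
    if (add.length || remove.length) plan.push({ qid, add, remove });
  }
  return plan;
}

const plan = planIdentifierUpdates(
  { Q4647066: ['acmoore-286374'], Q5874938: ['hobbylobby-e90acf'] },
  { Q4647066: ['shop/craft|A.C. Moore'], Q5874938: ['hobbylobby-e90acf'] }
);
console.log(JSON.stringify(plan));
```

In this example the second item already carries its new id, so it produces no work. That matches the point made earlier in the thread: a manual bulk update would just leave the script with less to do.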
This is done 🎊
Great work @bhousel 👍
Can the new updates integrate into iD easily? Can a new release of the NSI be pushed to it anytime soon?
> Can the new updates integrate into iD easily? Can a new release of the NSI be pushed to it anytime soon?
I don't know, sorry. I was removed from the iD project and no longer maintain it.
Probably the best thing to do is just open an issue on the iD page and see what the answer is.
I'd like to make some changes in the index to better support more kinds of named POIs and more granular locations. This is just a brain dump of some thoughts on how we track brands and the limitations with our current approach.
a long time ago
When this project started many years ago, each entry in NSI was just a unique name that we picked out of the OSM planet file. For example, here's what NSI looked like in 2017:
https://github.com/osmlab/name-suggestion-index/blob/50993d4e7e8885a3b2312a1a633514b9a33db881/name-suggestions.json#L8621-L8623
This says that Target is an entry with the `name` tag set to `Target`, that it has been used about 1000 times, and that it sits under the `shop/department_store` hierarchy. iD would turn this into a preset that assigns the tags `name=Target` and `shop=department_store`.
Limitations of this:
- No `brand:wikidata`
- No `countryCodes` or any concept of where the brand was valid
So we did a lot of work on the NSI in 2018-2019 to arrive at the current format. I did a talk about it!
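To make the 2017-era behaviour concrete, here is a minimal sketch of how a name-only entry could become a preset. The nested `key -> value -> name -> { count }` shape, the `toPresets` function, and the count of 1000 are assumptions for illustration, not a copy of the linked file or of iD's code.

```javascript
// Assumed shape of a 2017-style index: key -> value -> name -> { count }.
const oldIndex = {
  shop: {
    department_store: {
      Target: { count: 1000 } // "about 1000" in the description above
    }
  }
};

// Hypothetical conversion: each name becomes a preset that assigns
// the name tag plus the key=value it was filed under.
function toPresets(index) {
  const presets = [];
  for (const [key, values] of Object.entries(index)) {
    for (const [value, names] of Object.entries(values)) {
      for (const [name, info] of Object.entries(names)) {
        presets.push({ tags: { name, [key]: value }, count: info.count });
      }
    }
  }
  return presets;
}

console.log(JSON.stringify(toPresets(oldIndex)));
```

Nothing in such an entry says which company "Target" is or where it operates, which is exactly the gap the limitations describe.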
today
Each entry in NSI represents a unique "brand" identified by a string like:
`key/value|name~(disambiguator)`
The `disambiguator` part is optional and used in situations where distinct entities use the same literal name.
https://github.com/osmlab/name-suggestion-index/blob/50278e5b7ff226cea96d0bcecdc15442ec0b5c01/brands/shop/department_store.json#L730-L758
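A small parser shows how the parts of such an identifier fit together. This is a sketch only: `parseBrandId` is a hypothetical name, and treating the parentheses around the disambiguator as optional decoration is an assumption. The second example id is made up.

```javascript
// Hypothetical parser for "key/value|name~(disambiguator)".
function parseBrandId(id) {
  const bar = id.indexOf('|');
  if (bar === -1) throw new Error(`expected "|" in id: ${id}`);
  const kv = id.slice(0, bar);
  const slash = kv.indexOf('/');
  if (slash === -1) throw new Error(`expected "key/value" before "|" in id: ${id}`);

  const rest = id.slice(bar + 1);
  const tilde = rest.lastIndexOf('~');
  const name = tilde === -1 ? rest : rest.slice(0, tilde);
  // Assumption: surrounding parentheses are stripped if present.
  const disambiguator = tilde === -1
    ? null
    : rest.slice(tilde + 1).replace(/^\((.*)\)$/, '$1');

  return { key: kv.slice(0, slash), value: kv.slice(slash + 1), name, disambiguator };
}

console.log(parseBrandId('shop/craft|A.C. Moore'));
console.log(parseBrandId('shop/department_store|Example Store~(Example Region)'));
```

The name is part of the identifier itself, so renaming a brand changes its id. That is part of the reliance on names and disambiguators that the newer `id` / `displayName` split moves away from.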
We solved the limitations from long ago, and NSI has really grown!
- We use `brand:wikidata` and fetch a wealth of related data from the Wikidata project (like logos)
- We had `countryCodes` for a while, now `locationSet` which is even more flexible
But now we have a new set of limitations:
- We use `brand` and `brand:wikidata` as the "key" even though other feature types might better be keyed off of `operator` or `network`
- We set a `name` tag on everything, which causes issues for some brands (see Wetherspoons Pub #3009, but also this has caused issues with things like hotels and auto dealerships)
So I'd like to think through how to rework the NSI entries to solve these limitations. More to come later...