osmlab / name-suggestion-index

Canonical common brand names, operators, transit and flags for OpenStreetMap.
https://nsi.guide
BSD 3-Clause "New" or "Revised" License

Rethinking identifiers #3995

Closed bhousel closed 4 years ago

bhousel commented 4 years ago

I'd like to make some changes in the index to better support more kinds of named POIs and more granular locations. This is just a brain dump of some thoughts on how we track brands and the limitations with our current approach.

a long time ago

When this project started many years ago, each entry in NSI was just a unique name that we picked out of the OSM planet file. For example, here's what NSI looked like in 2017: https://github.com/osmlab/name-suggestion-index/blob/50993d4e7e8885a3b2312a1a633514b9a33db881/name-suggestions.json#L8621-L8623 This says that Target is an entry with the name tag set to Target, that it has been used about 1000 times, and that it sits under the shop/department_store hierarchy. iD would turn this into a preset that assigns the tags name=Target and shop=department_store.
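As a rough illustration of the conversion described above, here is a minimal Python sketch. The nested layout of `old_index` and the function name `to_presets` are assumptions made for this example. They are modeled on the description above, not copied from the 2017 file or from iD's code.

```python
# Hypothetical stand-in for the 2017 data: key -> value -> name -> usage info.
# The count is the "about 1000" figure mentioned above, not an exact value.
old_index = {
    "shop": {
        "department_store": {
            "Target": {"count": 1000},
        }
    }
}


def to_presets(index):
    """Turn each name entry into a preset-like dict of OSM tags."""
    presets = []
    for key, values in index.items():
        for value, names in values.items():
            for name in names:
                # The preset only assigns the name and the key=value tag.
                presets.append({"name": name, key: value})
    return presets


presets = to_presets(old_index)
print(presets)
```

Each preset carries nothing but the name and the category tag.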

Limitations of this:

So we did a lot of work on the NSI in 2018-2019 to arrive at the current format. I did a talk about it!

today

Each entry in NSI represents a unique "brand" identified by a string like: key/value|name~(disambiguator) The disambiguator part is optional and is used in situations where distinct entities use the same literal name.
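To make the identifier shape concrete, here is a small Python sketch that splits such a string into its parts. The `BrandId` type and `parse_brand_id` function are hypothetical names for this example, and the parsing rules are inferred only from the pattern shown above. NSI's own code may handle edge cases differently.

```python
from typing import NamedTuple, Optional


class BrandId(NamedTuple):
    key: str
    value: str
    name: str
    disambiguator: Optional[str]


def parse_brand_id(identifier: str) -> BrandId:
    """Split 'key/value|name~(disambiguator)' into its parts."""
    tag, _, rest = identifier.partition("|")
    key, _, value = tag.partition("/")
    name, sep, tail = rest.partition("~(")
    # Only treat the tail as a disambiguator if it is closed by ')'.
    disambiguator = tail[:-1] if sep and tail.endswith(")") else None
    if disambiguator is None:
        name = rest
    return BrandId(key, value, name, disambiguator)


print(parse_brand_id("shop/craft|A.C. Moore"))
# "Example" is a made-up disambiguator, used only to show the optional part.
print(parse_brand_id("shop/department_store|Target~(Example)"))
```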

https://github.com/osmlab/name-suggestion-index/blob/50278e5b7ff226cea96d0bcecdc15442ec0b5c01/brands/shop/department_store.json#L730-L758

We solved the limitations from long ago, and NSI has really grown!

But now we have a new set of limitations:

So I'd like to think through how to rework the NSI entries to solve these limitations. More to come later...

UKChris-osm commented 4 years ago

When it comes to the name, I think it would be handy to have a way to mark this up as being "unique", after the brand has been added!

For entries like JD Wetherspoon (#3009) and Harvester (#4034), the brand is strong but the pub or restaurant itself is often uniquely named. Rather than removing JD Wetherspoon or Harvester, keep them as they are, because they are strong brands that I feel people will type in first. But allow the NSI to flag to iD that the name is likely not "JD Wetherspoon", that the place has its own name, and that this should be checked by the mapper.

I personally think it's better for a pub to be incorrectly named with the brand name, as that's still how many people might refer to the pub, than to have the brand not be suggested when someone searches. They may then just add the brand themselves, but add it wrong, if that makes sense.

Meaning that even if the name is wrong because it's named after the brand, at least that incorrect naming would be consistent, and so easier to find and correct. For example, the NSI guide web site could spot a branded entry with "JD Wetherspoon" as the name and flag it, whereas if someone added the name "Spoons" instead, that would be missed.

1ec5 commented 4 years ago

FYI, there are quite a few Wikidata items now linking to NSI using the NSI identifier property. But the items can be bulk-updated (along with the property constraints) based on whatever decision we arrive at here.

bhousel commented 4 years ago

Quick update on some identifiers work that I did last Friday:

{
  "brands/shop/craft": [
    {
      "displayName": "A.C. Moore",
      "id": "acmoore-286374",
      "locationSet": {"include": ["us"]},
      "oldid": "shop/craft|A.C. Moore",
      "tags": {
        "brand": "A.C. Moore",
        "brand:wikidata": "Q4647066",
        "brand:wikipedia": "en:A.C. Moore",
        "name": "A.C. Moore",
        "shop": "craft"
      }
    },
    {
      "displayName": "Hobby Lobby",
      "id": "hobbylobby-e90acf",
      "locationSet": {"include": ["in", "us"]},
      "oldid": "shop/craft|Hobby Lobby",
      "tags": {
        "brand": "Hobby Lobby",
        "brand:wikidata": "Q5874938",
        "brand:wikipedia": "en:Hobby Lobby",
        "name": "Hobby Lobby",
        "shop": "craft"
      }
    },
    {
      "displayName": "Hobbycraft",
      "id": "hobbycraft-ed2283",
      "locationSet": {"include": ["gb"]},
      "matchTags": ["shop/art"],
      "oldid": "shop/craft|Hobbycraft",
      "tags": {
        "brand": "Hobbycraft",
        "brand:wikidata": "Q16984508",
        "brand:wikipedia": "en:Hobbycraft",
        "name": "Hobbycraft",
        "shop": "craft"
      }
    },
...
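Since each entry above keeps its previous identifier in `oldid`, a lookup from legacy identifiers to new ids can be built directly from the data. The Python sketch below shows one way to do that. The function name `build_id_lookup` is hypothetical, and the sample is a trimmed copy of the first two entries with their tags omitted.

```python
import json

# Trimmed copy of the first two entries shown above (tags omitted).
data = json.loads("""
{
  "brands/shop/craft": [
    {"displayName": "A.C. Moore", "id": "acmoore-286374",
     "oldid": "shop/craft|A.C. Moore"},
    {"displayName": "Hobby Lobby", "id": "hobbylobby-e90acf",
     "oldid": "shop/craft|Hobby Lobby"}
  ]
}
""")


def build_id_lookup(index):
    """Map each legacy identifier to its replacement id."""
    lookup = {}
    for entries in index.values():
        for entry in entries:
            # Entries without an oldid have nothing to migrate.
            if "oldid" in entry:
                lookup[entry["oldid"]] = entry["id"]
    return lookup


lookup = build_id_lookup(data)
print(lookup["shop/craft|Hobby Lobby"])  # hobbylobby-e90acf
```
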
bhousel commented 4 years ago

I'm almost finished with this work. 🎉

There will be a bunch of conflicts to resolve, so I've disabled merging to master temporarily until I can reconcile all the recent PRs that have been merged with the new file structure and code. Wish me luck.. 😅

bhousel commented 4 years ago

OK - the new files are merged in..

bhousel commented 4 years ago

The new code seems to be working pretty ok! This unblocks a bunch of other things that I'll tackle soon.

I'd like to leave this open until all those P8253 Name Suggestion Identifier properties on Wikidata have been updated. I'd ideally like to make this an automatic thing that the build_wikidata.js script does for us.

camelCaseNick commented 4 years ago

> I'd like to leave this open until all those P8253 Name Suggestion Identifier properties on Wikidata have been updated. I'd ideally like to make this an automatic thing that the build_wikidata.js script does for us.

So not bulk uploading it now, to not prevent the testing of such an automation?

bhousel commented 4 years ago

> So not bulk uploading it now, to not prevent the testing of such an automation?

I don't understand your question, sorry.. Mostly I just haven't bulk updated the P8253's yet because I ran out of time yesterday to implement it.

camelCaseNick commented 4 years ago

> I don't understand your question, sorry..

I meant that if I, or anybody else, were to upload it in bulk now, it might interfere with your idea to update them automatically in the build_wikidata.js as you couldn't test it if every identifier was set correctly to the new one already?

bhousel commented 4 years ago

> it might interfere with your idea to update them automatically in the build_wikidata.js as you couldn't test it if every identifier was set correctly to the new one already?

I don't think it would interfere - it's a one way update from NSI -> Wikidata. If you update them all now, the script would just have less to do later.
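The reasoning here can be sketched in a few lines of Python. This is only an illustration of a one-way comparison and not the logic of build_wikidata.js. The function name `plan_updates` and the plain dicts standing in for NSI and Wikidata are assumptions for this example, and no network calls are made.

```python
def plan_updates(nsi_ids, wikidata_ids):
    """Return the Wikidata items whose stored id differs from NSI's.

    nsi_ids:      QID -> id that NSI currently assigns (source of truth)
    wikidata_ids: QID -> id currently stored on the Wikidata item, if any
    """
    return {
        qid: new_id
        for qid, new_id in nsi_ids.items()
        if wikidata_ids.get(qid) != new_id
    }


nsi_ids = {"Q4647066": "acmoore-286374", "Q5874938": "hobbylobby-e90acf"}

# Suppose Hobby Lobby was already updated by hand,
# while A.C. Moore still carries its legacy identifier.
wikidata_ids = {
    "Q4647066": "shop/craft|A.C. Moore",
    "Q5874938": "hobbylobby-e90acf",
}

pending = plan_updates(nsi_ids, wikidata_ids)
print(pending)  # {'Q4647066': 'acmoore-286374'}
```

Items that already hold the new id drop out of the plan, so a manual bulk update beforehand only shrinks the remaining work.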

bhousel commented 4 years ago

I updated the build_wikidata.js script to push our new ids to Wikidata and removed the legacy NSI ids. This is done 🎊

UKChris-osm commented 4 years ago

Great work @bhousel 👍

Can the new updates integrate into iD easily? Can a new release of the NSI be pushed to it anytime soon?

bhousel commented 4 years ago

> Can the new updates integrate into iD easily? Can a new release of the NSI be pushed to it anytime soon?

I don't know, sorry. I was removed from the iD project and no longer maintain it.

Identitaet commented 4 years ago

Probably the best thing to do is just to open an issue on the iD page and see what the answer is.