osmlab / name-suggestion-index

Canonical common brand names, operators, transit and flags for OpenStreetMap.
https://nsi.guide
BSD 3-Clause "New" or "Revised" License

Pub names #4926

UKChris-osm closed 3 years ago

UKChris-osm commented 3 years ago

If I have this correct, the NSI started with just names, and no brand usage?

pub.json seems to have a great deal of British Pub names currently in it, such as Cross Keys, Cricketers, Red Lion, Victoria, etc.

This isn't unexpected, as I'm sure the planet scan picked these up, but as they are just names, and not necessarily brands, what would be the best way to handle these?

Remove the "brand" tag from each and keep the name, or filter them all out? The reason I ask is because I see no reason why they shouldn't be suggested as a name if someone starts typing "The Red" and is suggested "The Red Lion", but would this work if no brand / operator data exists either?

bhousel commented 3 years ago

Good questions! I changed some of this a few days ago (see #4906 and #4924) and need to update the contributing guide.

> If I have this correct, the NSI started with just names, and no brand usage?
>
> pub.json seems to have a great deal of British Pub names currently in it, such as Cross Keys, Cricketers, Red Lion, Victoria, etc.

Yes, the scripts originally just looked in the name tag, but we can collect and compare other tags now.
Also, we had lists of which k/v pairs we would collect, and amenity/pub was probably not in the original list.

> This isn't unexpected, as I'm sure the planet scan picked these up, but as they are just names, and not necessarily brands, what would be the best way to handle these?

Now that we can attach properties to categories (#4906), I've added per-category exclude lists, so amenity/pub can work like the fast food or restaurant categories.

In data/brands/amenity/fast_food.json we now have this:

    "exclude": {
      "generic": [
        "^(bistro|buffet|büfé|fast food|food court|kantine|frituur|imbiss|kiosk|lanchonete)$",
        "^(pizz(eri)?a|fish (and|&) chips|tacos)$",
        "^(бистро|пиццерия|столовая|ша(ве|у)рма)$",
        "^caf[eé](t[eé]r[ií]a)?$",
        "^d[oö]ner( kebab)?$",
        "^fri[tz]erie$",
        "^istanbul( kebab)?$",
        "^kebab( house)?$",
        "^snack(s)?( bar)?$",
        "^sushi\\s?(bar|house)?$",
        "^ラーメン(屋|店)?$"
      ],
      "named": [
        "^(ali baba|antalya|asia[ -](bistro|imbiss|wok)|(berlin|city) döner|city (grill|pizza)|kebabai|kfc/taco bell)$",
        "^(marmaris|pizza (house|time))$"
      ]
    }

These exclude lists (#4924) are lists of regular expressions.

They work like this:

So for pubs with common names that aren't brands, we can add them to the "named" section to exclude them from the index.
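To make the idea concrete, here is a minimal sketch of how a list of exclude patterns might be checked against a candidate name. It is not the NSI's actual code: the `classify` function, the case-insensitive matching, and the whole `PUB_EXCLUDE` entry are assumptions made for illustration. Only the `FAST_FOOD_EXCLUDE` patterns are copied from the fast_food example above.

```python
import re

# A few patterns copied from the fast_food.json example above.
FAST_FOOD_EXCLUDE = {
    "generic": [r"^caf[eé](t[eé]r[ií]a)?$", r"^kebab( house)?$"],
    "named": [r"^(marmaris|pizza (house|time))$"],
}

# Hypothetical entry for pubs. These patterns are NOT taken from the
# real pub.json; they only show what a "named" exclusion could look like.
PUB_EXCLUDE = {
    "generic": [r"^pub$"],
    "named": [r"^(the )?(cricketers|cross keys|red lion|victoria)$"],
}


def classify(name, exclude):
    """Return 'generic' or 'named' if the name matches a pattern in that
    section of the exclude list, otherwise 'keep'."""
    candidate = name.strip()
    for section in ("generic", "named"):
        for pattern in exclude.get(section, []):
            # Case-insensitive matching is an assumption of this sketch.
            if re.search(pattern, candidate, flags=re.IGNORECASE):
                return section
    return "keep"


if __name__ == "__main__":
    print(classify("Cafeteria", FAST_FOOD_EXCLUDE))   # generic
    print(classify("The Red Lion", PUB_EXCLUDE))      # named
    print(classify("Red Lion Brewery", PUB_EXCLUDE))  # keep
```

Because each pattern is anchored with `^` and `$`, only whole-name matches are excluded, so a longer name that merely contains "Red Lion" would still be kept in this sketch.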

> The reason I ask is because I see no reason why they shouldn't be suggested as a name if someone starts typing "The Red" and is suggested "The Red Lion", but would this work if no brand / operator data exists either?

This is one of those things that sounds like a good idea, and iD has had it for a while, but it is causing a lot of problems, so I'm trying to move away from autocompleting names:

- https://github.com/openstreetmap/iD/issues/8304
- https://github.com/openstreetmap/iD/issues/8271
- https://github.com/openstreetmap/iD/issues/6055

I think there's still a lot of value in having wikidata-backed brand presets for users to choose from, and in having the validator suggest these where it makes sense, but autocompleting names is causing more problems and surprising users.