osmlab / name-suggestion-index

Canonical common brand names, operators, transit and flags for OpenStreetMap.
https://nsi.guide
BSD 3-Clause "New" or "Revised" License
Add `$` to some generic names that lack it #6098

Open JesseWeinstein opened 2 years ago

JesseWeinstein commented 2 years ago

There are two lines in Cyrillic in data/operators/amenity/hospital.json and data/brands/amenity/hospital.json that lack a $ at the end. They are among the very few patterns in the NSI without one, and it looks like they've just stayed this way since they were first introduced back in this commit https://github.com/osmlab/name-suggestion-index/commit/f1df4ea6cf1a537cf309878850aa4c09afd759bb in 2019. I don't know the language, so it might be harmless -- but it also might be better to add the $ just for safety.

        "^инфекционн(ая|ое) (больница|отделение)",
        "^кожно-?венерологический диспансер",

JesseWeinstein commented 2 years ago

There is also стоматолог in data/brands/amenity/dentist.json, which was added (maybe intentionally without a $) in https://github.com/osmlab/name-suggestion-index/commit/4b9ecd41ad9b1510d9db1c5e39e7e8fd2dca8e50

JesseWeinstein commented 2 years ago

And in data/brands/shop/convenience.json and data/brands/shop/kiosk.json there's the line: "^მარკეტი( \\(market\\))?" which goes back all the way to https://github.com/osmlab/name-suggestion-index/commit/0cd082592d3976abf35fb10082149ed6090eb664 in 2018 -- I can't trace it further than that. We ... probably don't want to exclude all Georgian convenience stores or kiosks whose name merely starts with the Georgian word for market. But maybe we do?
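
To make the concern concrete, here is a minimal sketch of how the pattern behaves with and without a trailing $. It assumes the patterns are applied as case-insensitive regex searches against the name, which is an assumption about how the NSI uses them; `is_excluded` is a hypothetical helper and the name "მარკეტი 24" is made up for illustration.

```python
import re

# The pattern after JSON decoding: "\\(" in the file is "\(" in the regex.
UNANCHORED = r"^მარკეტი( \(market\))?"
ANCHORED = UNANCHORED + "$"


def is_excluded(pattern: str, name: str) -> bool:
    """Hypothetical helper: True if the pattern matches the name.

    Assumption: patterns are used as case-insensitive searches.
    """
    return re.search(pattern, name, re.IGNORECASE) is not None


# The last name is invented for illustration only.
names = ["მარკეტი", "მარკეტი (market)", "მარკეტი 24"]

for name in names:
    print(name, is_excluded(UNANCHORED, name), is_excluded(ANCHORED, name))
```

Without the $, all three names match, because the optional group can simply match nothing and the rest of the name is ignored. With the $, only the first two match.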

JesseWeinstein commented 2 years ago

It looks like the exclusion of all Russian-language pharmacy names that start with аптека is intentional, as the trailing $ was explicitly removed in https://github.com/osmlab/name-suggestion-index/commit/38383e659731f93c15e8a594a1dbbd10b900b54e .

JesseWeinstein commented 2 years ago

The one other line missing a terminal $ is "^magazin\\s?(alimentar|mixt|non-stop)?" in data/brands/shop/convenience.json added in https://github.com/osmlab/name-suggestion-index/commit/760a3b00261cd964288cf9a8939619564c9b0b73 . Warning about anything starting with magazin doesn't seem ideal, but I don't speak the language, so I'm not sure.
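
For anyone wanting to repeat this search, a rough sketch of the check follows. It takes a plain list of pattern strings rather than assuming any particular file layout, and `lacks_end_anchor` is a hypothetical helper, not something from the NSI codebase. It only looks at the last character, so a pattern with alternation at the top level (for example `^a$|^b`) would need a closer look by hand.

```python
def lacks_end_anchor(pattern: str) -> bool:
    """Return True if the regex does not end with an unescaped `$`."""
    if not pattern.endswith("$"):
        return True
    # Count the backslashes directly before the final `$`. An odd count
    # means the `$` is escaped and stands for a literal dollar sign.
    body = pattern[:-1]
    backslashes = len(body) - len(body.rstrip("\\"))
    return backslashes % 2 == 1


# Patterns quoted in this thread, written as decoded regex strings,
# plus one terminated pattern for contrast.
patterns = [
    r"^инфекционн(ая|ое) (больница|отделение)",
    r"^кожно-?венерологический диспансер",
    r"^მარკეტი( \(market\))?",
    r"^magazin\s?(alimentar|mixt|non-stop)?",
    r"^инфекционное отделение$",
]

unterminated = [p for p in patterns if lacks_end_anchor(p)]
for p in unterminated:
    print(p)
```

Run over the five patterns above, this reports the first four and skips the one that already ends in $.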

Dimitar5555 commented 2 years ago

"^инфекционн(ая|ое) (больница|отделение)"

This pattern matches "infectious diseases hospital" or "infectious diseases department". Hospitals almost always have names, but I don't know about departments. We can split it into two lines:

"^инфекционная больница",
"^инфекционное отделение$",

"^кожно-?венерологический диспансер",

This one means "Dermatological (and Venereological) Dispensary". Dispensaries also have names so no change is required.
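
The effect of the proposed split can be checked with a short sketch. It again assumes case-insensitive regex searches, and `matches_any` is a hypothetical helper; the sample names are invented for illustration.

```python
import re

ORIGINAL = [r"^инфекционн(ая|ое) (больница|отделение)"]
PROPOSED = [r"^инфекционная больница", r"^инфекционное отделение$"]


def matches_any(patterns, name):
    """Hypothetical helper: True if any pattern matches the name.

    Assumption: patterns are used as case-insensitive searches.
    """
    return any(re.search(p, name, re.IGNORECASE) for p in patterns)


# Invented sample names, for illustration only.
names = [
    "Инфекционное отделение",
    "Инфекционное отделение №2",
    "Инфекционная больница №1",
]

for name in names:
    print(name, matches_any(ORIGINAL, name), matches_any(PROPOSED, name))
```

Under these assumptions the original pattern matches all three names, while the split version no longer matches a department name with a suffix such as "№2". Hospital names with a suffix still match, because that line keeps no $.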

JesseWeinstein commented 2 years ago

Thanks -- the question is whether every name that starts with one of these phrases is a bad name that should be flagged. That doesn't seem to be correct for these two.