osm-search / Nominatim

Open Source search based on OpenStreetMap data
https://nominatim.org
GNU General Public License v3.0
2.98k stars 701 forks

Add USPS Standard Suffix Abbreviation #3414

Open mhsr21 opened 1 month ago

mhsr21 commented 1 month ago

Added the USPS Standard Suffix Abbreviations for postal addressing (https://pe.usps.com/text/pub28/28apc_002.htm).

mtmail commented 1 month ago

Some entries have the same source and destination text, e.g.

mhsr21 commented 1 month ago

@mtmail Some of the duplicates were present before my contribution. Should I remove them altogether?

mtmail commented 1 month ago

@mhsr21 It would be great if you could remove the other duplicates, too. I count 42, and 41 of those are in the variants-en.yaml file.

cat settings/icu-rules/variants-* | perl -ne '/^\s+-\s+(.+?)\s+->\s+(.+)/ && $1 eq $2 && print' | wc -l
      42
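
For readers less familiar with Perl, the same check can be sketched in Python. This is only an illustration of what the one-liner tests: the function name `identity_rules` and the sample lines are hypothetical and not taken from the repository files.

```python
import re

# Same pattern as the Perl one-liner: an indented list item "- source -> target".
RULE = re.compile(r"^\s+-\s+(.+?)\s+->\s+(.+)")

def identity_rules(lines):
    """Return the rule lines whose source and target text are identical."""
    hits = []
    for line in lines:
        m = RULE.match(line.rstrip("\n"))
        if m and m.group(1) == m.group(2):
            hits.append(line.strip())
    return hits

# Hypothetical sample lines, for illustration only.
sample = [
    "    - Avenue -> Ave",
    "    - Ave -> Ave",
    "    - Street -> St",
    "not a rule line",
]
print(identity_rules(sample))
```

To run it on the real files, one could pass the lines of each `settings/icu-rules/variants-*` file to `identity_rules` and sum the lengths of the results.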

mhsr21 commented 1 month ago

@mtmail Removed all duplicates in variants-en.yaml, plus the remaining one in variants-fr.yaml.

mhsr21 commented 1 month ago

Thanks for going through this. I agree that we should have official abbreviations in this list.

On a more general note, the US abbreviation list has always been far too long. In particular, it sometimes proposes 3 or 4 variants for the same word, which increases the size of the index. Would it make sense to restrict ourselves to the official abbreviations only, or are the other ones just as frequently used?
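
One way to gauge how many variants each word carries would be to group the rules by their source text. The sketch below is a rough illustration, not part of Nominatim: it assumes the simple `source -> target` rule form, assumes that commas separate alternatives on either side of the arrow, and uses a hypothetical function name and made-up sample lines.

```python
import re
from collections import defaultdict

# Indented list item of the form "- source -> target".
RULE = re.compile(r"^\s+-\s+(.+?)\s+->\s+(.+)")

def variants_per_word(lines):
    """Map each source word to the set of distinct targets listed for it."""
    targets = defaultdict(set)
    for line in lines:
        m = RULE.match(line.rstrip("\n"))
        if not m:
            continue
        # Assumption: commas separate alternatives on either side of the arrow.
        for src in m.group(1).split(","):
            for dst in m.group(2).split(","):
                targets[src.strip()].add(dst.strip())
    return dict(targets)

# Hypothetical sample lines, for illustration only.
sample = [
    "    - Avenue -> Ave",
    "    - Avenue -> Av,Aven",
    "    - Street -> St",
]
counts = {word: len(t) for word, t in variants_per_word(sample).items()}
print(counts)
```

Words with a count of 3 or more would be the first candidates to trim down to the official abbreviation.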

I only added the official abbreviations (the rightmost column on the linked website). Edit: I also fixed the typo.