pelias / openaddresses

Pelias import pipeline for OpenAddresses.
MIT License

Portuguese/Spanish street name normalization #493

Open missinglink opened 2 years ago

missinglink commented 2 years ago

Following on from https://github.com/pelias/openaddresses/pull/477, we could probably tackle some Portuguese/Spanish street prefix/suffix contractions.

As mentioned in https://github.com/pelias/parser/issues/155#issuecomment-973143976, the pt/countrywide source of OpenAddresses contains contractions such as this one (R GODINHO DE FARIA):

grep -i 'R Godinho De Faria' pt_addresses.csv | head -n1
pt.ine.add.PTCONT.3542119,R GODINHO DE FARIA,926,4465-151,SÃO MAMEDE DE INFESTA,-8.611103975015157,41.19918649220984

orangejulius commented 2 years ago

Yeah, if we can do this reliably, one-character abbreviations would be a great candidate for expansion and normalization at import time.

I think r->Rua is fairly unambiguous in Portugal, and would be a great place to start.
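
As a rough illustration of the r->Rua idea, here is a minimal Python sketch. It is not taken from the Pelias codebase: the `PT_PREFIXES` table and the `expand_pt_street_prefix` function are hypothetical names, and the sketch assumes the abbreviation only ever appears as the first token of the street field, as in the pt/countrywide row above.

```python
import re

# Hypothetical lookup table (an assumption for illustration, not Pelias data).
# Only the single-letter "R" is mapped, since that is the case discussed here.
PT_PREFIXES = {"R": "Rua"}

# First whitespace-delimited token, followed by at least one more token.
_LEADING_TOKEN = re.compile(r"^(\S+)\s+(?=\S)")


def expand_pt_street_prefix(street: str) -> str:
    """Expand a leading one-letter Portuguese street prefix, if recognised."""
    match = _LEADING_TOKEN.match(street)
    if match is None:
        return street  # single token or empty string: nothing to expand

    # Tolerate a trailing dot, e.g. "R." as well as "R".
    token = match.group(1).rstrip(".").upper()
    expansion = PT_PREFIXES.get(token)
    if expansion is None:
        return street  # unknown prefix: leave the name untouched

    # Keep all-caps source data all-caps so later casing steps see uniform input.
    if street.isupper():
        expansion = expansion.upper()
    return expansion + " " + street[match.end():]


print(expand_pt_street_prefix("R GODINHO DE FARIA"))
```

Names whose first token is longer than the abbreviation (for example one that merely starts with the letter R) are left unchanged, because the whole token has to match an entry in the table.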

Hopefully there aren't too many tricky ones. Carrer (Catalan) and Calle (Spanish) are both abbreviated as c, and sorting them out might be tough unless each form is strictly confined to certain regions.
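
One way to picture the c ambiguity is an expansion that only fires when the import already knows which language applies to a record. The Python sketch below assumes a hypothetical per-record `language` hint ("es" or "ca"); where that hint would come from (for example the source path or an admin lookup) is left open, and `C_BY_LANGUAGE` and `expand_c_prefix` are illustrative names, not existing Pelias functions.

```python
from typing import Optional

# Hypothetical mapping from a language hint to the expansion of "C".
C_BY_LANGUAGE = {"es": "Calle", "ca": "Carrer"}


def expand_c_prefix(street: str, language: Optional[str] = None) -> str:
    """Expand a leading "C" / "C." only when the language hint is known."""
    head, sep, rest = street.partition(" ")
    rest = rest.strip()

    # Require a standalone "C" token followed by the rest of the name.
    if not sep or not rest or head.rstrip(".").upper() != "C":
        return street

    expansion = C_BY_LANGUAGE.get(language or "")
    if expansion is None:
        return street  # ambiguous without a hint: leave the record as-is

    return f"{expansion} {rest}"


print(expand_c_prefix("C Mayor", "es"))
print(expand_c_prefix("C. de Balmes", "ca"))
print(expand_c_prefix("C Mayor"))
```

Leaving the name untouched when no hint is available keeps the import conservative: a wrong expansion is harder to undo downstream than a missing one.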