openstreetmap / trac-tickets

Archived Trac Tickets

When searching for street names in Spanish, stop words should be ignored. #4895

Open openstreetmap-trac opened 3 years ago

openstreetmap-trac commented 3 years ago

Reporter: Emilio Gomez [Submitted to the original trac issue database at 1.47pm, Tuesday, 9th July 2013]

The Royal Spanish Academy states that the preposition '''de''' (of, in English) should never be omitted from the names of streets, avenues and promenades, unless the name is an adjective. Examples of the first case are "Calle '''de''' Espronceda", "Plaza '''de''' Colón", "Avenida '''de''' América" and "Paseo '''de''' Gracia"; examples of the second case are "Calle Mayor" and "Plaza Nueva".

Although the correct way to name a street is to put the preposition '''de''' (of) after the type of road (examples: Calle '''de''' Alcalá, Avenida '''de''' Pérez Galdós, Plaza '''de''' España, etc.), it is very common to omit it when shortening the name. Search engines must take this peculiarity into account.

Nominatim does not currently take this into account, so the search engine fails to find many streets that actually exist in OpenStreetMap. Examples:

This also makes Nominatim unhelpful for geocoding addresses in this language.

Note that in Spanish, besides the preposition '''de''', another common construction is to add '''el''', '''la''', '''los''' or '''las''' (all of which translate to '''the''') right after '''de''': Carretera '''del''' faro, Calle '''de los''' Caídos '''de la''' División Azul, Calle '''de las''' Descalzas. Note that "'''de el'''" contracts to "'''del'''" in most cases.
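A minimal sketch of the idea, assuming a search key built by case-folding a name, splitting it into words and dropping the stop words named in this ticket (de, del, el, la, los, las). The function name `normalize_street_name` and the "never return an empty key" safeguard are hypothetical illustrations, not Nominatim code.

```python
import re
import unicodedata

# Stop words taken from the examples in this ticket: the preposition "de",
# its contraction "del", and the articles that may follow "de".
SPANISH_STREET_STOP_WORDS = frozenset({"de", "del", "el", "la", "los", "las"})


def normalize_street_name(name: str, stop_words=SPANISH_STREET_STOP_WORDS) -> tuple[str, ...]:
    """Return a search key: case-folded words with stop words removed."""
    # NFC so that precomposed and decomposed accented letters compare equal.
    text = unicodedata.normalize("NFC", name).casefold()
    # Words are runs of letters/digits, apostrophes and hyphens.
    tokens = re.findall(r"[\w'-]+", text)
    kept = tuple(t for t in tokens if t not in stop_words)
    # Hypothetical safeguard: never reduce a name to an empty key.
    return kept or tuple(tokens)
```

With this key, "Calle de Alcalá" and "Calle Alcalá" both become `("calle", "alcalá")`, and "Carretera de el faro" and "Carretera del faro" both become `("carretera", "faro")`, while "Calle Mayor" is left as `("calle", "mayor")`.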

[http://snowball.tartarus.org/algorithms/spanish/stop.txt This list] contains stop words that should be ignored in searches, including the prepositions above.
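A stop list in this format could be loaded with a small parser. This sketch assumes that everything after a `|` on a line is a comment and that the remaining whitespace-separated words are stop words; `SAMPLE` is a hand-written excerpt in that format, not the contents of the linked file, and both function names are hypothetical.

```python
# Hand-written sample in the Snowball stop-list layout (not the real file).
SAMPLE = """\
 | Illustrative excerpt in the Snowball stop-list format
de             |  of, from
la             |  the
del            |  de + el
"""


def parse_snowball_stop_list(text: str) -> set[str]:
    """Collect stop words, treating text after '|' as a comment."""
    words = set()
    for line in text.splitlines():
        content = line.split("|", 1)[0]  # drop the comment part, if any
        words.update(w.casefold() for w in content.split())
    return words


def strip_stop_words(query: str, stop_words: set[str]) -> list[str]:
    """Split a query into case-folded words and drop the stop words."""
    return [w for w in query.casefold().split() if w not in stop_words]


stop = parse_snowball_stop_list(SAMPLE)
```

Here `stop` is `{"de", "la", "del"}`, and `strip_stop_words("Plaza de España", stop)` returns `["plaza", "españa"]`.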

openstreetmap-trac commented 3 years ago

Author: Emilio Gomez [Added to the original trac issue at 7.14pm, Wednesday, 10th July 2013]

Sorry, the links were reversed:

openstreetmap-trac commented 3 years ago

Author: Cyrille37 [Added to the original trac issue at 5.40pm, Thursday, 15th August 2013]

Hi,

The same situation occurs with French. Many people stop using osm.org because it cannot find many places, and this "de" (of) problem is among the top 10 reasons.

French example: searching for "Prieuré de Saint-Cosme, La Riche" finds the correct place, but searching for "Prieuré Saint-Cosme, La Riche" returns no result.
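The French case can be sketched as exact matching on a word sequence, with and without stop-word removal. `FRENCH_STOP_WORDS` is a hypothetical list for this example only, and elided forms such as "d'" are deliberately not handled; the point is only that removing the same stop words from both the stored name and the query makes the two spellings meet.

```python
import re
from collections.abc import Collection

# Hypothetical French stop list, for this example only.
FRENCH_STOP_WORDS = frozenset({"de", "du", "des", "la", "le", "les"})


def search_key(text: str, stop_words: Collection[str]) -> tuple[str, ...]:
    """Case-folded words of `text`, without stop words."""
    tokens = re.findall(r"[\w'-]+", text.casefold())
    return tuple(t for t in tokens if t not in stop_words)


def find(query: str, names: list[str], stop_words: Collection[str] = frozenset()) -> list[str]:
    """Return the stored names whose key equals the query's key."""
    key = search_key(query, stop_words)
    return [n for n in names if search_key(n, stop_words) == key]


names = ["Prieuré de Saint-Cosme, La Riche", "Rue Nationale, Tours"]
```

Without a stop list, `find("Prieuré Saint-Cosme, La Riche", names)` returns `[]`; with `FRENCH_STOP_WORDS` it returns `["Prieuré de Saint-Cosme, La Riche"]`. Note that "La" in "La Riche" is also dropped, but on both sides, so the match still works.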

Cheers. Cyrille.

openstreetmap-trac commented 3 years ago

Author: lonvia [Added to the original trac issue at 3.41pm, Monday, 30th September 2013]

Nominatim does eliminate a few common stop words, mostly articles. Extending this list is not simple, for two reasons. First, eliminating stop words from one language can cause heaps of trouble in another language (or, in the case of Nominatim, it causes tons of problems with ref values). Second, we can't just add new stop words to an existing database without transforming all existing names. This is not unsolvable, but it needs more thought.
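The ref problem can be illustrated with a hypothetical example: we assume a Spanish stop list that, like the Snowball list, contains the preposition "a", and a German motorway ref such as "A 7". Stripping stop words from every value turns the ref into just "7"; one possible (hypothetical) policy is to strip stop words only from `name` values and index refs verbatim. The tag names and both key functions are illustrations, not Nominatim's behaviour.

```python
import re
from collections.abc import Collection

# Assumption: a Spanish stop list that includes the preposition "a".
SPANISH_STOP_WORDS = frozenset({"a", "de", "del", "el", "la", "los", "las"})


def tokens(text: str) -> list[str]:
    return re.findall(r"[\w'-]+", text.casefold())


def naive_key(value: str, stop_words: Collection[str]) -> tuple[str, ...]:
    # Strips stop words from every value, refs included.
    return tuple(t for t in tokens(value) if t not in stop_words)


def guarded_key(value: str, tag: str, stop_words: Collection[str]) -> tuple[str, ...]:
    # Hypothetical policy: only "name" values lose stop words;
    # any other tag (e.g. "ref") is indexed verbatim apart from case folding.
    if tag != "name":
        return tuple(tokens(value))
    kept = tuple(t for t in tokens(value) if t not in stop_words)
    return kept or tuple(tokens(value))  # never return an empty key
```

Here `naive_key("A 7", SPANISH_STOP_WORDS)` gives `("7",)`, which would collide with every other road numbered 7, while `guarded_key("A 7", "ref", SPANISH_STOP_WORDS)` keeps `("a", "7")`. The second reason also shows up here: any key stored with an older stop list no longer matches queries normalized with a new one, so every existing name would have to be re-keyed.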

If you want to extend the list of stop words for your own installation, extend the list of replacements in the code [https://github.com/twain47/Nominatim/blob/master/module/nominatim.c#L251 around here].
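A replacement list of this kind might look roughly like the sketch below. It is a hypothetical Python analogue, not the contents of `nominatim.c`: it assumes names are lower-cased and padded with spaces, so that a pattern such as `" de "` only matches a whole word, and it lists longer patterns first so that `" de la "` is removed as a unit. Whatever the real table looks like, the same replacements have to be applied to the stored names and to the query.

```python
# Hypothetical replacement table (pattern, replacement); longest first.
REPLACEMENTS = [
    (" de los ", " "),
    (" de las ", " "),
    (" de la ", " "),
    (" del ", " "),
    (" de ", " "),
]


def apply_replacements(name: str, replacements=REPLACEMENTS) -> str:
    """Lower-case, pad with spaces and apply replacements until stable."""
    text = " " + " ".join(name.casefold().split()) + " "
    changed = True
    # Repeat because str.replace does not see overlapping matches
    # (e.g. " de de "); each pass shortens the text, so the loop ends.
    while changed:
        changed = False
        for pattern, replacement in replacements:
            new_text = text.replace(pattern, replacement)
            if new_text != text:
                text, changed = new_text, True
    return text.strip()
```

For example, `apply_replacements("Calle de los Caídos de la División Azul")` returns `"calle caídos división azul"`, and `apply_replacements("Plaza de España")` returns `"plaza españa"`.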