osmandapp / OsmAnd

Address: Friedrich Wilhelm Weberstrasse Munster #14750

Open anten1222 opened 2 years ago

anten1222 commented 2 years ago

This is just an example address; any other navigation system finds it immediately. Why not OsmAnd?!

Rowin63 commented 2 years ago

Just as a hint: with "Weber" and "Straße" written separately, the address is found ;-)

[Screenshot: Screenshot_20220705-095656_OsmAnd+]

scaidermern commented 2 years ago

Probably related to #5409.

vshcherb commented 2 years ago

The problem is spacing: "Weberstrasse" vs "Weber-strasse" vs "Weber strasse".

"Friedrich-Wilhelm-Weber-Strasse Munster " - OK
"Friedrich Wilhelm Weber Strasse Munster " - OK
"Wilhelm-Weber-Strasse Munster " - OK
"Wilhelm Weber Strasse " - OK
"Wilhelm Weber Strasse" - OK
"Wilhelm Weberstrasse " - Nothing
"Weber Strasse " - Too much noise (not found)
"Friedrich Wilhelm Weberstrasse Munster " - Nothing

sonora commented 1 year ago

Please note that per German orthography (§ 50 at https://www.duden.de/sprachwissen/rechtschreibregeln/strassennamen) the separations by "-" are mandatory in this case (Street names derived from combined first and last name require separation and hyphens). One might actually argue that it is no bug to not show a result in this case.

scaidermern commented 1 year ago

> Please note that per German orthography (§ 50 at https://www.duden.de/sprachwissen/rechtschreibregeln/strassennamen) the separations by "-" are mandatory in this case (Street names derived from combined first and last name require separation and hyphens). One might actually argue that it is no bug to not show a result in this case.

I think this is irrelevant here. Many German users will ignore this rule by entering the street name without the hyphens. They still expect the search to succeed.

I'm sure users from other countries will use other simplifications when searching for their streets.

sonora commented 1 year ago

Yeah, possibly. That would imply applying a "known words" approach to our address search: detecting whether anything we have in https://github.com/osmandapp/OsmAnd/blob/67de987df07da068d67acc338bb946df4cc07349/OsmAnd-java/src/main/java/net/osmand/binary/CommonWords.java#L6 is part of what the user typed, and using it to prepare the canonical search string.

It can lead to quite unwanted effects, because a character sequence is often part of another word without the corresponding word actually being meant ("cold" contains "old").
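To make the pitfall concrete, here is a minimal Python sketch. The word list is invented for the example and is not the content of CommonWords.java, and both helper names are hypothetical. It contrasts a naive "occurs anywhere" check with a more conservative "split only a known suffix" check.

```python
# Hypothetical word list for illustration only; NOT taken from CommonWords.java.
KNOWN_WORDS = ["strasse", "old"]


def contains_known_word(token):
    """Naive check: report every known word that occurs anywhere in the token."""
    token = token.lower()
    return [w for w in KNOWN_WORDS if w in token and w != token]


def split_known_suffix(token, suffixes=("strasse",)):
    """More conservative: only split a known word off the END of a token."""
    lower = token.lower()
    for suffix in suffixes:
        if lower.endswith(suffix) and len(lower) > len(suffix):
            return [token[: -len(suffix)], token[-len(suffix):]]
    return [token]


print(contains_known_word("cold"))          # false positive: ['old']
print(contains_known_word("Weberstrasse"))  # intended hit: ['strasse']
print(split_known_suffix("Weberstrasse"))   # ['Weber', 'strasse']
print(split_known_suffix("cold"))           # ['cold'], left untouched
```

Restricting the split to a small set of suffixes avoids the "cold"/"old" case, but it is still a heuristic: any word that merely happens to end in a listed suffix would be split as well.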

And:

A typical fuzzy search problem. Online search engines have huge CPU capacity and index data about user behavior, click counts etc. at hand. In the OsmAnd scenario the challenge is big enough to just produce all 'correct' results for what the user actually specified, within an acceptable time frame and in a useful order. Let alone working with variations of what the user could have meant...

It is not trivial to solve, and strongly related to #5921, #6643, #6233, #3086, and many others.

scaidermern commented 1 year ago

I'm wondering how search speed will be impacted by using something simple like Levenshtein distance to ignore minor typos, missing hyphens and so on.
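For reference, a minimal Python sketch of the Levenshtein distance itself (textbook dynamic programming, not anything taken from the OsmAnd code base). It shows that the spellings discussed in this issue are only one or two edits apart.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("weberstrasse", "weber strasse"))                  # 1
print(levenshtein("weberstrasse", "weber-strasse"))                  # 1
print(levenshtein("wilhelm weberstrasse", "wilhelm-weber-strasse"))  # 2
```

A single comparison costs on the order of len(a) * len(b) steps, so the open question is how many index entries it would have to be run against.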

sonora commented 1 year ago

I would hope to be wrong, but quite significantly, I'm afraid.

Levenshtein leads to the field of "approximate string matching" (https://en.m.wikipedia.org/wiki/Approximate_string_matching).

To a first approximation I guess the effort increases at least proportionally (n-fold) with the number n of string variations to check against the search index. So if you have good heuristic knowledge about which string permutations to focus on (e.g. you know frequent typos or character swaps), n can maybe be kept small.

The problem seems to be that in an offline scenario, perhaps without such knowledge, simply including all possible variants may immediately make n a rather big number, even for a small Levenshtein, Hamming, etc. distance...
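A rough way to see how fast n grows is to enumerate every string within one edit of a single token. The Python sketch below does that under stated assumptions: the alphabet (lowercase ASCII plus space and hyphen) is my choice, the function name is hypothetical, and adjacent swaps are included as well, which strictly speaking makes it a Damerau-Levenshtein neighbourhood.

```python
import string

# Assumed alphabet: lowercase ASCII letters plus space and hyphen.
ALPHABET = string.ascii_lowercase + " -"


def variants_distance_1(word):
    """All strings reachable by one delete, adjacent swap, replace or insert."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    swaps = {left + right[1] + right[0] + right[2:]
             for left, right in splits if len(right) > 1}
    replaces = {left + c + right[1:]
                for left, right in splits if right for c in ALPHABET}
    inserts = {left + c + right for left, right in splits for c in ALPHABET}
    return (deletes | swaps | replaces | inserts) - {word}


candidates = variants_distance_1("weberstrasse")
print(len(candidates))                # several hundred for one 12-letter token
print("weber strasse" in candidates)  # True: the variant we actually wanted
```

With a larger alphabet (umlauts, ß, other scripts) or distance 2, the candidate set grows much further, which is the n-fold effort described above.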

scaidermern commented 1 year ago

So which options exist?

This would resolve these kinds of problems for most users. Users with online access can use online search with fuzzy matching. Users without Internet can fall back to a slow(er) offline fuzzy search if their keyword can't be found by the regular search function. Normal search speed won't be affected, as fuzzy search is only used on request.

Not a nice solution but at least it allows users to find what they are looking for. Not being able to find something is worse, I guess.
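The "only on request" fallback could look roughly like the Python sketch below. Everything in it is an assumption for illustration: the toy street list, the function name and the cutoff are made up, and Python's standard difflib merely stands in for whatever fuzzy matcher would actually be used (it ranks by a similarity ratio, not by Levenshtein distance).

```python
import difflib

# Toy street index for illustration; a real index is far larger.
STREETS = [
    "friedrich-wilhelm-weber-strasse",
    "wilhelm-weber-strasse",
    "weberstrasse",
]


def search(query, allow_fuzzy=False, cutoff=0.8):
    """Exact lookup first; fuzzy matching only if nothing was found AND requested."""
    q = query.strip().lower()
    exact = [s for s in STREETS if s == q]
    if exact or not allow_fuzzy:
        return exact
    # Slow path: compare the query against every entry and keep close matches.
    return difflib.get_close_matches(q, STREETS, n=3, cutoff=cutoff)


print(search("Friedrich Wilhelm Weberstrasse"))                    # []
print(search("Friedrich Wilhelm Weberstrasse", allow_fuzzy=True))
```

The regular path is unchanged, so its speed is unaffected; the expensive comparison against the whole list only runs when the user explicitly asks for it after an empty result.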