Open anten1222 opened 2 years ago
Just as a hint: enter "Weber" and "Straße" separately, and the address is found ;-)
Probably related to #5409.
Problem is spacing: "Weberstrasse" vs. "Weber-strasse" vs. "Weber strasse".

- "Friedrich-Wilhelm-Weber-Strasse Munster " - OK
- "Friedrich Wilhelm Weber Strasse Munster " - OK
- "Wilhelm-Weber-Strasse Munster " - OK
- "Wilhelm Weber Strasse " - OK
- "Wilhelm Weber Strasse" - OK
- "Wilhelm Weberstrasse " - nothing
- "Weber Strasse " - too much noise (not found)
- "Friedrich Wilhelm Weberstrasse Munster " - nothing
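The failing cases above all come down to a glued "…strasse" suffix. A minimal sketch of what a query canonicalizer could do (the class and method names here are hypothetical, not part of OsmAnd):

```java
import java.util.Locale;

public class StreetNameNormalizer {
    // Hypothetical canonicalizer: lowercases, maps "ß" -> "ss",
    // treats hyphens as spaces, and splits a glued "…strasse"
    // suffix into its own token, so all spelling variants of the
    // same street name collapse to one search string.
    static String normalize(String query) {
        String s = query.trim()
                        .toLowerCase(Locale.GERMAN)
                        .replace('-', ' ')
                        .replace("ß", "ss");
        // split a glued suffix: "weberstrasse" -> "weber strasse"
        s = s.replaceAll("(\\w)(strasse)\\b", "$1 $2");
        // collapse repeated whitespace
        return s.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        // Both variants collapse to "friedrich wilhelm weber strasse"
        System.out.println(normalize("Friedrich Wilhelm Weberstrasse"));
        System.out.println(normalize("Friedrich-Wilhelm-Weber-Straße"));
    }
}
```

This is only a sketch for the single suffix "strasse"; a real implementation would need the full list of generic street-name words.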
Please note that per German orthography (§ 50 at https://www.duden.de/sprachwissen/rechtschreibregeln/strassennamen) the separations by "-" are mandatory in this case (Street names derived from combined first and last name require separation and hyphens). One might actually argue that it is no bug to not show a result in this case.
I think this is irrelevant here. Many German users will ignore this rule by entering the street name without the hyphens. They still expect the search to succeed.
I'm sure users from other countries will use other simplifications when searching for their streets.
Yeah, possibly. That would imply applying a "known words" approach to our address search: detect whether something from https://github.com/osmandapp/OsmAnd/blob/67de987df07da068d67acc338bb946df4cc07349/OsmAnd-java/src/main/java/net/osmand/binary/CommonWords.java#L6 is part of what the user typed, and use it to prepare the canonical search string.
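The known-words idea could look roughly like this. Note this is an illustration only: the word set below is a made-up stand-in, not the actual contents of CommonWords.java, and the class/method names are hypothetical.

```java
import java.util.Set;

public class KnownWordSplit {
    // Illustrative stand-in for a "known words" list; the real
    // CommonWords.java contains a much larger, frequency-based set.
    private static final Set<String> KNOWN_WORDS =
            Set.of("strasse", "weg", "platz", "gasse", "allee");

    // If a typed token ends in a known generic word, split it off so
    // the canonical search string matches how the street is indexed.
    static String canonicalize(String token) {
        String lower = token.toLowerCase();
        for (String w : KNOWN_WORDS) {
            if (lower.endsWith(w) && lower.length() > w.length()) {
                return lower.substring(0, lower.length() - w.length())
                        + " " + w;
            }
        }
        return lower;
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("Weberstrasse")); // weber strasse
        System.out.println(canonicalize("Weber"));        // weber
    }
}
```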
That can lead to quite unwanted effects, because a character sequence is often part of another word without the shorter word actually being contained in meaning ("cold" contains "old").
And:
A typical fuzzy search problem. Online search engines have huge CPU capacity and index data about user behavior, click counts etc. at hand. In the OsmAnd scenario the challenge is big enough to just produce all 'correct' results for what the user actually specified, within an acceptable time frame and in a useful order. Let alone working with variations of what the user could have meant...
It is not trivial to solve, and strongly related to #5921, #6643, #6233, #3086, and many others.
I'm wondering how search speed will be impacted by using something simple like Levenshtein distance to ignore minor typos, missing hyphens and so on.
I would hope to be wrong, but quite significantly, I'm afraid.
Levenshtein leads to the field of "approximate string matching" (https://en.m.wikipedia.org/wiki/Approximate_string_matching).
To a first approximation I guess the effort increases at least proportionally (n-fold) with the number n of string variations to check against the search index. So if you have good heuristic knowledge about which string permutations to focus on (e.g. you know frequent typos or character swaps), n can maybe be kept small.
The problem seems to be that in an offline scenario, perhaps without such good heuristic knowledge, simply including all possible variants even for just a small Levenshtein, Hamming, etc. distance means n may immediately become a rather big number...
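For reference, the standard dynamic-programming Levenshtein distance costs O(m·n) time per string pair, which is why applying it naively against every entry in an offline index gets expensive. A self-contained two-row sketch:

```java
public class Levenshtein {
    // Classic two-row dynamic-programming edit distance.
    // O(m*n) time and O(n) space per comparison.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] curr = new int[b.length() + 1];
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(prev[j] + 1,      // deletion
                                 curr[j - 1] + 1), // insertion
                        prev[j - 1] + cost);       // substitution
            }
            prev = curr;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // The missing-space case from this issue is distance 1:
        System.out.println(distance("weberstrasse", "weber strasse")); // 1
    }
}
```

Note that the glued-suffix cases in this issue are all within distance 1 of the indexed spelling, so even a threshold of 1 would catch them.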
So what options exist?
An on-request fuzzy search would resolve this kind of problem for most users. Users with online access can use online search with fuzzy matching. Users without Internet can fall back to a slow(er) offline fuzzy search if their keyword can't be found by the regular search function. Normal search speed won't be affected, as fuzzy search is only used on request.
Not a nice solution but at least it allows users to find what they are looking for. Not being able to find something is worse, I guess.
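The two-pass fallback described above could be sketched like this. Everything here is illustrative: the in-memory list stands in for OsmAnd's real binary index, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class FallbackSearch {
    // Exact pass first; the slower fuzzy pass runs only when the
    // exact pass finds nothing, so the normal path pays no extra cost.
    static List<String> search(List<String> index, String query, int maxDist) {
        List<String> exact = new ArrayList<>();
        for (String name : index) {
            if (name.equalsIgnoreCase(query)) {
                exact.add(name);
            }
        }
        if (!exact.isEmpty()) {
            return exact; // regular path, no fuzzy cost paid
        }
        List<String> fuzzy = new ArrayList<>();
        for (String name : index) {
            if (editDistance(name.toLowerCase(), query.toLowerCase()) <= maxDist) {
                fuzzy.add(name);
            }
        }
        return fuzzy;
    }

    // Standard two-row Levenshtein distance.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            int[] curr = new int[b.length() + 1];
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1),
                                   prev[j - 1] + cost);
            }
            prev = curr;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        List<String> index = List.of("Weber-Strasse", "Wilhelm-Weber-Strasse");
        // Exact hit takes the fast path; a glued spelling still matches
        // via the fuzzy fallback.
        System.out.println(search(index, "Weber-Strasse", 2));
        System.out.println(search(index, "Weberstrasse", 2));
    }
}
```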
This is just an example address; any other navigation system will find the address immediately. Why not OsmAnd?!