Open openstreetmap-trac opened 3 years ago
Author: spod [Added to the original trac issue at 2.36pm, Friday, 30th July 2010]
BTW: This problem occurred when using the Safari browser with the default encoding of "Western (ISO Latin 1)". I tried it in Firefox with English encoding and the same problem happened. Setting the default encoding to "Japanese (Shift-JIS)" in Safari made no difference.
Author: twain [Added to the original trac issue at 3.37pm, Friday, 30th July 2010]
Replying to [ticket:3149 spod]:
Searching for either from the OSM page or on the Nominatim site does not return the city node (331385074).
It seems to return various other items which do contain the substring "", but not the city node, which is strange. If it can find the substring "" within these items, you would expect it to be able to find the substring "" in the city node's name:ja which is "".
This is a combination of two problems. First, I seem to be missing handlers for some strings so gets converted to by the code that handles abbreviations.
Would you be able to give me a literal translation of to help me understand what is happening? Is in a different character set or something?
Searching for "", which is the exact name:ja tag, seems to work. The "" means city, but wouldn't normally be used in a search (just like you wouldn't search for "Sheffield city", but just "Sheffield", in the UK).
This is the second part of the problem. I agree with the above but you will see that Sheffield isn't labelled as "Sheffield City" in osm:
http://www.openstreetmap.org/browse/node/422162
From a data point of view the extra seems wrong - it is already tagged 'place=city'.
?
-- Brian
Author: spod [Added to the original trac issue at 3.36pm, Saturday, 31st July 2010]
Thanks for the investigation.
The literal translation of "" is "Fukuoka RiverRain" which is the name of a shopping complex.
"" is kanji. "" is katakana (it's actually a phonetic character set generally used for 'foreign' words in Japanese). They are not different character sets as such, just different categories of Japanese "letters". In Unicode, the kanji (actually unified Chinese/Japanese etc symbols) code table starts from 4E00 and the katakana code table starts from 30A0, so I guess they are "separated" in a code sense (if the software is using Unicode encoding at the point of searching).
The inclusion of the "" (city) in the name tag is the convention in Japan: http://wiki.openstreetmap.org/wiki/Japan_tagging Not sure why they did this - I wasn't part of the discussions!
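To make the code-range point concrete, here is a minimal sketch that classifies a character by its Unicode block. The function name `script_of` is hypothetical, and the block end points (309F, 30FF, 9FFF) come from the Unicode block definitions rather than from this ticket. It says nothing about how Nominatim itself classifies characters.

```python
def script_of(ch: str) -> str:
    """Return a rough script label for a single character (illustrative only)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:   # Hiragana block
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:   # Katakana block
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:   # CJK Unified Ideographs (kanji)
        return "kanji"
    return "other"


# U+30A2 is a katakana letter, U+901A is a kanji, U+308A is a hiragana letter.
for ch in ("\u30a2", "\u901a", "\u308a", "A"):
    print(f"U+{ord(ch):04X} -> {script_of(ch)}")
```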
Author: spod [Added to the original trac issue at 2.58am, Sunday, 1st August 2010]
Some more info I thought of: If the software is using a change of code page to indicate "the start of a new word", then that's not always correct in Japanese. In the RiverRain example it does indicate the start of a new and separate word, but especially with Hiragana (another Japanese code page, starting at 3040) it is possible to have a single word containing kanji and hiragana.
e.g. (Oyafuko-dori), a road in Fukuoka city (way 43105756). The " " is kanji and the "" ("ri") is hiragana. Nominatim doesn't seem to return that way when searching for either the whole Japanese name, or any substring of it. Searching for the "English" name does work. I'll add that to the test cases page as well.
Author: spod [Added to the original trac issue at 3.21am, Sunday, 1st August 2010]
To clarify my last point:
The Japanese name (Oyafuko-dori) consists of 2 "words" (""/oyafuko and ""/dori/"street"), with the second word being a mixture of kanji (""/do) and hiragana (""/ri), i.e. it's a single, unsplittable word containing kanji and hiragana, which doesn't make sense if parsed by splitting it at the point where it changes from kanji to hiragana.
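A rough sketch of why this matters, assuming (and it is only an assumption) that a tokenizer starts a new word whenever the Unicode block changes. The helper names are hypothetical, and the sample string is the two-character word for "street" (kanji U+901A followed by hiragana U+308A), written with escapes.

```python
def script_of(ch: str) -> str:
    """Rough script label by Unicode block (illustrative only)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"
    return "other"


def split_on_script_change(text: str) -> list[tuple[str, str]]:
    """Group consecutive characters of the same script into (script, run) pairs."""
    runs: list[tuple[str, str]] = []
    for ch in text:
        label = script_of(ch)
        if runs and runs[-1][0] == label:
            # Same script as the previous character: extend the current run.
            runs[-1] = (label, runs[-1][1] + ch)
        else:
            # Script changed: a naive tokenizer would start a new "word" here.
            runs.append((label, ch))
    return runs


# One word, but a script-change splitter breaks it into two pieces.
street = "\u901a\u308a"
print(split_on_script_change(street))
```

With this naive rule the single word comes back as a kanji run plus a hiragana run, so a search index built from those pieces would never contain the whole word.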
Reporter: spod [Submitted to the original trac issue database at 10.07pm, Wednesday, 28th July 2010]
Searching for either from the OSM page or on the Nominatim site does not return the city node (331385074).
It seems to return various other items which do contain the substring "", but not the city node, which is strange. If it can find the substring "" within these items, you would expect it to be able to find the substring "" in the city node's name:ja which is "".
http://nominatim.openstreetmap.org/search?q= should return node 331385074.
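For reproducing this outside a browser (and so ruling out the browser's default encoding), a small sketch of building the search URL with the query percent-encoded as UTF-8. The helper name `build_search_url` is hypothetical, and the sample query is just the two-character word for "street" (U+901A, U+308A), not the city name from this report.

```python
from urllib.parse import urlencode

# Base URL as given in this ticket.
BASE_URL = "http://nominatim.openstreetmap.org/search"


def build_search_url(query: str) -> str:
    """Return the search URL with `query` percent-encoded as UTF-8."""
    return BASE_URL + "?" + urlencode({"q": query}, encoding="utf-8")


print(build_search_url("\u901a\u308a"))
```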
Searching for "", which is the exact name:ja tag, seems to work. The "" means city, but wouldn't normally be used in a search (just like you wouldn't search for "Sheffield city", but just "Sheffield", in the UK).
I tried searching for the Japanese city name of Yokohama () but that failed as well, so maybe a problem for all Japanese cities (except Tokyo, which is actually "incorrect" in that the OSM name:ja tag is "" when it should really be "", so ignore Tokyo in any tests to confirm whether it is fixed!).