inaturalist / iNaturalistAPI

Node.js API for iNaturalist.org
https://api.inaturalist.org/

taxon autocomplete broken for some characters #3

Closed by kueda 9 years ago

kueda commented 9 years ago

http://apibeta.inaturalist.org/taxa/autocomplete?q=%E5%8B%95%E7%89%A9%E7%95%8C
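For readers who cannot read percent-encoding at a glance: the `q` parameter in that URL is the UTF-8 percent-encoded form of 動物界. A quick way to check this in Node.js, using only the built-in `decodeURIComponent` and `encodeURIComponent`:

```javascript
// The q parameter from the URL above, exactly as it appears in the report.
const encoded = "%E5%8B%95%E7%89%A9%E7%95%8C";

// Decode the UTF-8 percent-encoding back into the original characters.
const decoded = decodeURIComponent(encoded);
console.log(decoded); // 動物界

// Round-trip: re-encoding the decoded string yields the same query value.
console.log(encodeURIComponent(decoded) === encoded); // true
```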

Seems like it might only be Kanji characters, b/c the following all work

pleary commented 9 years ago

Staging has an updated version of the code using a new Japanese analyzer (https://github.com/elastic/elasticsearch-analysis-kuromoji). The analyzer is loaded as a plugin, and Elasticsearch must be restarted in order to use it. This would mean a small amount of downtime in production to roll it out.
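As a rough illustration of what wiring the kuromoji analyzer into an index can look like, here is a minimal sketch of index settings built as a plain JavaScript object. The `kuromoji_tokenizer` tokenizer and `kuromoji_baseform` token filter are provided by the plugin. The analyzer name `ja_autocomplete`, the helper names, and the field definition are assumptions made for illustration, not the configuration actually used on staging.

```javascript
// Hypothetical helper: builds an "analysis" settings section that defines a
// custom analyzer on top of the kuromoji plugin's tokenizer.
function buildKuromojiSettings() {
  return {
    analysis: {
      analyzer: {
        // "ja_autocomplete" is an assumed name, not one taken from this thread.
        ja_autocomplete: {
          type: "custom",
          tokenizer: "kuromoji_tokenizer", // provided by the kuromoji plugin
          filter: ["kuromoji_baseform", "lowercase"]
        }
      }
    }
  };
}

// Hypothetical field definition that points at the analyzer above.
// Elasticsearch releases from the time of this thread used type "string";
// current releases use "text".
function buildJapaneseNameField(fieldType = "text") {
  return { type: fieldType, analyzer: "ja_autocomplete" };
}

console.log(JSON.stringify(buildKuromojiSettings(), null, 2));
```

Because the tokenizer only exists once the plugin is installed and the node restarted, creating an index with settings like these would fail on a cluster without the plugin, which matches the downtime concern above.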

kueda commented 9 years ago

Mostly working. Some weird exceptions

Is it only performing complete matches, i.e. there is no tokenization happening, not even on spaces and parens? I was surprised that 南方蘆蜂 did not match http://www.inaturalist.org/taxa/471606-Ceratina-cognata even though it has the name 南方蘆蜂 (南方蘆蜂)

pleary commented 9 years ago

I made a few changes. I was only using the filter for names with language Japanese, but it works for Chinese characters too, so now I'm using a regex to determine if a string should use the filter. The node app is using the same regex at query time to know what field to query. Do the results look any better now? The results might be a little more forgiving, so it would be worth re-testing the ones that worked before.
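A minimal sketch of the approach described above: one regex decides whether a string should go through the Japanese analyzer, and the same regex picks the field to query at query time. The character ranges, the field names, and the function names here are assumptions for illustration; the actual regex and index fields are not shown in this thread.

```javascript
// Hypothetical field names; the real taxon index may use different ones.
const DEFAULT_FIELD = "names.name_autocomplete";
const CJK_FIELD = "names.name_autocomplete_ja";

// Assumed pattern: Hiragana (U+3040-309F), Katakana (U+30A0-30FF),
// and CJK Unified Ideographs (U+4E00-9FFF).
const CJK_PATTERN = /[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]/;

// True if the string contains at least one character in the ranges above.
function usesCjkAnalyzer(text) {
  return CJK_PATTERN.test(text);
}

// Pick the field to query, using the same test that would be applied
// when deciding how to index a name.
function autocompleteField(q) {
  return usesCjkAnalyzer(q) ? CJK_FIELD : DEFAULT_FIELD;
}

// Build a simple Elasticsearch match query against the chosen field.
function buildAutocompleteQuery(q) {
  return { query: { match: { [autocompleteField(q)]: q } } };
}

console.log(autocompleteField("南方蘆蜂"));          // names.name_autocomplete_ja
console.log(autocompleteField("Ceratina cognata")); // names.name_autocomplete
```

Sharing one pattern between indexing and querying keeps the two sides consistent: a name indexed into the Japanese-analyzed field is always searched through that same field.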

kueda commented 9 years ago

Much better, looks like all the examples I posted now work. Rad.

pleary commented 9 years ago

We had about 8 minutes of downtime tonight while I installed the kuromoji plugin and created the new fields for the taxon index. I then merged and released the node and Rails changes, and rebuilt the taxon ES index. Everything is live now and seems to be working.

kueda commented 9 years ago

Everything seems to be working for me. Yays. I contacted http://www.inaturalist.org/people/sunnetchan so hopefully he can do some more testing.