Closed wwoast closed 4 years ago
Some animals will need some of their `othernames` moved into `nicknames` for this feature to work. Example: punk butt is not an accurate alternate form of Will Smith :laughing:
So, there still need to be functions that check Hiragana/Katakana equivalence -- not all pandas have both the hiragana and katakana versions of their name in the `othernames`.
https://github.com/wwoast/redpanda-lineage/commit/7dee5e432149336983d0c0034bdf702a2876e4e2 implements the hiragana to katakana functions, and vice-versa. I'm less interested in super-smart behavior than just a basic swapping.
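For readers who want a picture of what "basic swapping" can look like, here is a minimal Python sketch. It is not the code from the commit above; the function names are made up for illustration. It relies only on the fact that the Unicode Katakana block sits 0x60 code points above the Hiragana block.

```python
# Illustrative sketch only; names here are hypothetical, not from the repo.
KANA_OFFSET = 0x60                       # Katakana block = Hiragana block + 0x60
HIRAGANA_RANGE = range(0x3041, 0x3097)   # ぁ .. ゖ
KATAKANA_RANGE = range(0x30A1, 0x30F7)   # ァ .. ヶ


def hiragana_to_katakana(text: str) -> str:
    """Shift each hiragana character into the katakana block."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if ord(ch) in HIRAGANA_RANGE else ch
        for ch in text
    )


def katakana_to_hiragana(text: str) -> str:
    """Shift each katakana character into the hiragana block."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET) if ord(ch) in KATAKANA_RANGE else ch
        for ch in text
    )


print(hiragana_to_katakana("たいよう"))
print(katakana_to_hiragana("タイヨウ"))
```

Characters outside the two ranges (Latin letters, kanji, the long-vowel mark ー) pass through unchanged, which matches the "nothing super-smart" goal.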
Now Hiragana searches for names not even in RPF will return if the corresponding Katakana name is in the dataset, and vice-versa. https://github.com/wwoast/redpanda-lineage/commit/45a76a2776be9dbf05891f8cedc3a861f59c4154
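A rough sketch of how a search could fall back to the swapped spelling, assuming the dataset is a plain mapping from an animal id to its set of names. That shape, the ids, and the function names are assumptions for illustration, not the structure used in the commit.

```python
from __future__ import annotations


def swap_kana(text: str) -> str:
    """Swap hiragana <-> katakana and leave every other character alone."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x3041 <= cp <= 0x3096:      # hiragana -> katakana
            out.append(chr(cp + 0x60))
        elif 0x30A1 <= cp <= 0x30F6:    # katakana -> hiragana
            out.append(chr(cp - 0x60))
        else:
            out.append(ch)
    return "".join(out)


def search_names(query: str, dataset: dict[str, set[str]]) -> list[str]:
    """Return ids of animals whose names contain the query or its kana swap."""
    candidates = {query, swap_kana(query)}
    return sorted(aid for aid, names in dataset.items() if candidates & names)


# Hypothetical sample data: only one kana form is stored per animal.
dataset = {
    "1": {"タイヨウ", "Taiyo"},
    "2": {"らて"},
}
print(search_names("たいよう", dataset))
```

The query is tried in both scripts, so a stored katakana name is found from a hiragana query and the other way around, without adding the missing form to the data.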
It will be trickier to implement the fully associative name matching. Imagine two animals with sets of names/`othernames` which have a single match somewhere. Each time I have an exact match for an animal, I need to repeat the search with the full set of names from the matching animal across all of RPF. Otherwise, I won't find all animals with one of those names/`othernames` in their own name sets.
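The repeated search described above amounts to a transitive closure over name sets. A minimal sketch, assuming a hypothetical dataset that maps an animal id to its combined names and `othernames`; none of these identifiers come from the RPF code.

```python
from __future__ import annotations

from collections import deque


def associative_search(query: str, dataset: dict[str, set[str]]) -> set[str]:
    """Return every animal reachable from the query through shared names."""
    known_names = {query}       # every name searched or queued so far
    matched: set[str] = set()   # animal ids already found
    pending = deque([query])    # names still to be searched

    while pending:
        name = pending.popleft()
        for animal_id, names in dataset.items():
            if animal_id in matched or name not in names:
                continue
            matched.add(animal_id)
            # Queue this animal's unseen names so the search repeats with them.
            for other in names - known_names:
                known_names.add(other)
                pending.append(other)
    return matched


# Hypothetical sample data: A and C share no name, but B links them.
dataset = {
    "A": {"Taiyo", "タイヨウ"},
    "B": {"タイヨウ", "太陽"},
    "C": {"太陽", "Sun"},
    "D": {"Latte"},
}
print(sorted(associative_search("Taiyo", dataset)))
```

This scans the whole dataset once per name, which is fine for a sketch. A name-to-animals index would avoid the repeated scans if the dataset grows.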
In practice this hiragana/katakana swap has been good enough.
This idea started by wanting to support search equivalence for names in either Hiragana or Katakana -- so that searching for たいよう gives you results for タイヨウ as well.
But the better way to implement this is to have cross-language name equivalence. Effectively, any time a panda has `othernames`, I want to consider them equivalent for the sake of searching. So all searches for タイヨウ return hits that match any of タイヨウ's `othernames`
(which will include たいよう, Taiyo, Taiyou, and others).
One exception to this should be searching for Kanji names. In CJK, if you use a kanji name in a search, you're looking for a very specific animal since it's less likely that kanji names will be shared between animals. I think the right behavior is to search the `names` and `othernames`, and return all hits for kanji names with similar pronunciations. In other words, if the set of pandas share some non-kanji name, but they have a kanji name result somewhere in `names` or `othernames`, they'll be returned as a match.
Since the same kanji can be pronounced differently depending on the name, matching same-pronunciation kanji is equivalent to searching a kanji name and seeing two matching hits that have unique pronunciations. So this seems like more natural behavior than searching for exact kanji matches only.
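One possible reading of the equivalence rule and its kanji exception, as a Python sketch. Everything here is an assumption for illustration: the dataset shape (animal id to combined `names` and `othernames`), the sample entries, the function names, and the choice to treat only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) as kanji.

```python
from __future__ import annotations


def contains_kanji(name: str) -> bool:
    """True if any character is in the basic CJK Unified Ideographs block."""
    return any(0x4E00 <= ord(ch) <= 0x9FFF for ch in name)


def equivalent_search(query: str, dataset: dict[str, set[str]]) -> set[str]:
    """One-hop equivalence: match any name of any animal the query hits."""
    expanded = {query}
    for names in dataset.values():
        if query in names:
            expanded |= names   # treat the hit's names as equivalent

    query_is_kanji = contains_kanji(query)
    if query_is_kanji:
        # Kanji query: expand only through pronunciations (non-kanji names).
        expanded = {n for n in expanded if n == query or not contains_kanji(n)}

    hits = set()
    for animal_id, names in dataset.items():
        if not (expanded & names):
            continue
        # Kanji query: only keep animals that have a kanji name themselves.
        if query_is_kanji and not any(contains_kanji(n) for n in names):
            continue
        hits.add(animal_id)
    return hits


# Hypothetical sample data.
dataset = {
    "A": {"タイヨウ", "たいよう", "Taiyo", "太陽"},
    "B": {"Taiyo", "大洋"},
    "C": {"Taiyo"},
    "D": {"ラテ"},
}
print(sorted(equivalent_search("タイヨウ", dataset)))
print(sorted(equivalent_search("太陽", dataset)))
```

With this reading, a kana or romaji query returns every animal sharing any equivalent name, while a kanji query returns only animals that share a pronunciation and also carry a kanji name of their own. Whether that filter is the intended behavior is exactly the open question here.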
I'll have to ask my Japanese friends what they think :heart: