Open djahandarie opened 3 years ago
Thanks for the feedback!
"One reason I use corpora is to find the right reading for a non-dictionary word."
Ah, you mean words that are likely to carry furigana because they aren't in a dictionary, so native readers would need the reading too? Can you give me a couple of examples for reference and testing?
In this initial version, I coded things quickly. The plan was to strip furigana completely and then maybe revisit it later and handle it properly. I see now that a lot of it is in there anyway, but not all, per your example. So I'll bump that up the list.
And once that's done, I can see adding a checkbox per your suggestion.
For some examples, 夜闇 is listed as やあん in the dictionary, but this is often intended to be read as よやみ. 絹服 is unlisted in the dictionary, and furigana.info only shows けんぷく, but this is often read きぬふく. Then you have things like 蛇王, which could be read へびおう or じゃおう, and it'd be interesting to see the distribution. 豹頭 is often read ひょうとう, but it'd be nice to see if anyone ever gives it ひょうあたま. Basically any novel/rare compound is kinda flexible in its reading, and it's useful to be able to look up what authors tend to intend.
First off, love the project, this is a wonderful idea.
One reason I use corpora is to find the right reading for a non-dictionary word.
Right now it's hard to use massif for this purpose, so it'd be nice if massif had a checkbox to only show results with furigana. (Something like the percentages on furigana.info would be a bonus, but honestly not that important because I like to look through all the results individually anyways.)
P.S. I've noticed that sometimes massif doesn't show the furigana for compound words. E.g., search "枯れた魔術師": the originals have furigana on all the hits, but massif doesn't show it.
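For reference, here is a rough sketch of what the furigana-only filter and the per-reading percentages could look like. It is not how massif works internally. It assumes the source sentences still carry furigana as HTML ruby markup (for example `<ruby>夜闇<rt>よやみ</rt></ruby>`), and the names `RUBY`, `with_furigana`, and `reading_distribution` are made up for this example.

```python
import re
from collections import Counter

# Assumption: furigana is stored as HTML ruby markup. The optional <rb> and
# <rp> tags that some sites emit are tolerated but not required.
RUBY = re.compile(
    r"<ruby>(?:<rb>)?(?P<base>[^<]+)(?:</rb>)?"
    r"(?:<rp>[^<]*</rp>)?<rt>(?P<reading>[^<]*)</rt>"
    r"(?:<rp>[^<]*</rp>)?</ruby>"
)


def readings_in(sentence, word):
    """Return every furigana reading attached to `word` in one sentence."""
    return [
        m.group("reading")
        for m in RUBY.finditer(sentence)
        if m.group("base") == word
    ]


def with_furigana(sentences, word):
    """Keep only the sentences where `word` appears with furigana."""
    return [s for s in sentences if readings_in(s, word)]


def reading_distribution(sentences, word):
    """Map each observed reading of `word` to its share of annotated hits."""
    counts = Counter(r for s in sentences for r in readings_in(s, word))
    total = sum(counts.values())
    return {r: n / total for r, n in counts.items()} if total else {}


# Hypothetical sample data, not real search results.
sample = [
    "<ruby>夜闇<rt>よやみ</rt></ruby>に紛れる",
    "<ruby><rb>夜闇</rb><rp>(</rp><rt>よやみ</rt><rp>)</rp></ruby>の中",
    "<ruby>夜闇<rt>やあん</rt></ruby>が迫る",
    "夜闇が深い",
]
hits = with_furigana(sample, "夜闇")
shares = reading_distribution(sample, "夜闇")
```

This only matches when the ruby base is exactly the search word, so a compound whose furigana is split across several ruby elements would need extra handling.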