Closed hirokiky closed 3 years ago
I am also working on this subject, and I agree with you that the best way is to introduce BaseForm information, even though it is required only by Japanese. How about something like this?
LGTM! Thank you very much for the valuable contribution @hirokiky 🙏
Thanks for the quick review!
JapaneseBrokenExpression causes false positives for potential verbs. It reports an error for valid forms such as "食べられる", "見られる", "寝られる", and so on.
What I did:
About special case
The tokenizer parses "見れる" as a single token. This might depend on the dictionaries used by Kuromoji or similar.
This issue looks similar: https://github.com/takuyaa/kuromoji.js/issues/28
I think we need to add other special cases (if there are any).
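A minimal sketch of how such special cases could be kept in one place. The class name `RaNukiSpecialCases` and the method `isSingleTokenRaNuki` are hypothetical and are not taken from this pull request; only 見れる comes from the observation above, and any further entries would have to be confirmed against the dictionary in use.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class RaNukiSpecialCases {
    // Surfaces that the tokenizer returns as a single token and that
    // therefore have to be matched as a whole.
    // Only 見れる was observed so far; other entries are an open question.
    private static final Set<String> SINGLE_TOKEN_SURFACES =
            new HashSet<>(Arrays.asList("見れる"));

    // Returns true when the surface is one of the listed single-token forms.
    public static boolean isSingleTokenRaNuki(String surface) {
        return SINGLE_TOKEN_SURFACES.contains(surface);
    }
}
```

Keeping the list in a set means a newly found case is a one-line change, and the regular form 見られる is not matched.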
About baseForm of tokens
It would be better to use the baseForm of each token in this logic; that would be the best way.
But to get the baseForm, we would need to change TokenElement and NeologdJapaneseTokenizer. It looks like other Validators won't use the baseForm of each token, and it's only necessary for Japanese. So in this pull request, I avoided changing them.
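To illustrate why the baseForm would help, here is a rough sketch under stated assumptions. The `Token` type below is hypothetical and is not the current TokenElement API, and the tokenization in the usage note (an ichidan stem followed by a separate suffix token) is assumed rather than verified against the tokenizer. The idea is that matching the suffix by its base form would also cover inflected surfaces such as "れ" in "食べれない".

```java
import java.util.List;

public class BaseFormSketch {
    // Hypothetical token: surface text, part-of-speech tags, and base form.
    public static final class Token {
        public final String surface;
        public final List<String> tags;
        public final String baseForm;

        public Token(String surface, List<String> tags, String baseForm) {
            this.surface = surface;
            this.tags = tags;
            this.baseForm = baseForm;
        }
    }

    // Returns true when an ichidan verb token is directly followed by a
    // token whose base form is "れる" (the regular suffix would be "られる").
    public static boolean hasRaNuki(List<Token> tokens) {
        for (int i = 1; i < tokens.size(); i++) {
            Token prev = tokens.get(i - 1);
            Token cur = tokens.get(i);
            boolean ichidanVerb =
                    prev.tags.contains("動詞") && prev.tags.contains("一段");
            if (ichidanVerb && "れる".equals(cur.baseForm)) {
                return true;
            }
        }
        return false;
    }
}
```

With tokens built by hand, "食べ" + "れ" + "ない" would be flagged, while "食べ" + "られる" and the godan passive "書か" + "れる" would not. A check on the surface "れる" alone would miss the first case.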
Screenshots
Before
After
Note
I'm not good at Java, so feel free to change my code and syntax as you like.