meisyal / sastrawi-ruby

Sastrawi (Indonesian stemmer) bindings for Ruby.
https://rubygems.org/gems/sastrawi
MIT License

Unit tests are failing #6

Open meisyal opened 3 years ago

meisyal commented 3 years ago

As of commit d0c6ae4, we still have six failing unit tests. These unit tests use the default dictionary (Kateglo). Let's break them down one by one:

  1. Unit test failed to stem "-lah, -kah, -tah, -pun" suffixes

    This happened because the stemmer did not stem the word "apatah". "apatah" exists in the default dictionary, so it is treated as a root word and is returned without being stemmed.

  2. Unit test failed to stem "-i, -kan, -an" suffixes

    This happened because the stemmer did not stem the word "belikan". The cause is the same as in the previous point.

  3. Unit test failed to stem loop last return of enhanced confix stripping

    This happened because the stemmer did not stem the words "menerangi", "berimanlah", and "memuaskan". Further investigation is needed to find the cause.

  4. Unit test failed to stem modified enhanced confix stripping with infix

    This happened because the stemmer did not stem the words "rerata" and "lelembut". Both words exist in the default dictionary.

  5. Unit test failed to remove prefix recursively

    This happened because the stemmer did not stem the word "kesepersepuluhnya". Further investigation is needed to find the cause.

  6. Unit test failed to stem adopted foreign suffixes

    This happened because the stemmer did not stem the words "idealis" and "idealisme". Both exist in the default dictionary.
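
To illustrate why a derived word that is listed in the dictionary comes back unchanged, here is a minimal sketch. `ToyStemmer` is a hypothetical class written only for this comment; it is not the gem's implementation and handles nothing but the four particle suffixes. The only behavior it borrows from the description above is the assumption that a dictionary hit is treated as a root word before any stripping happens.

```ruby
require 'set'

# Hypothetical toy stemmer, NOT the sastrawi gem's implementation.
class ToyStemmer
  # Particle suffixes only: -lah, -kah, -tah, -pun.
  PARTICLES = /(lah|kah|tah|pun)\z/

  def initialize(root_words)
    @dictionary = Set.new(root_words)
  end

  def stem(word)
    # A word found in the dictionary is treated as a root word,
    # so it is returned before any suffix is stripped.
    return word if @dictionary.include?(word)

    stripped = word.sub(PARTICLES, '')
    # Accept the stripped form only if it is a known root word.
    @dictionary.include?(stripped) ? stripped : word
  end
end

# Assumed dictionary content for the sketch: "apatah" is listed,
# "manatah" and "siapatah" are not.
stemmer = ToyStemmer.new(%w[apa apatah mana siapa])

stemmer.stem('apatah')   # => "apatah" (dictionary hit, left as is)
stemmer.stem('manatah')  # => "mana"
stemmer.stem('siapatah') # => "siapa"
```

Under this assumption, any test word that also appears in the dictionary can never be reduced further, which matches points 1, 2, 4, and 6.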

meisyal commented 3 years ago

To fix unit test number 1, we will replace the word "apatah" with a word such as "manatah" or "siapatah". Neither of these words exists in the default dictionary. Then, we will move "apatah" to the unit test that uses a custom dictionary.

meisyal commented 3 years ago

For unit test number 2, we will do the same as in the previous comment. "belikan" will be replaced with another word that doesn't exist in the default dictionary, for example "abaikan", "hijaukan", or "ramaikan".

"belikan" has two meanings. The first is to buy something and the second is a field in the forest. This Ruby gem cannot distinguish between homographs. This limitation should be documented later.

meisyal commented 3 years ago

For unit tests number 4 and 6, we will also move the words "rerata", "lelembut", "idealis", and "idealisme" to the unit test that uses a custom dictionary.
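
A rough sketch of why moving these words to a custom-dictionary test should help. `toy_stem` is a hypothetical helper made up for this comment, not the gem's API, and both dictionaries below are assumed stand-ins: the custom one lists only root words, while the "default-like" one also lists the derived forms.

```ruby
require 'set'

# Hypothetical helper, NOT part of the sastrawi gem.
# It tries one foreign-suffix rule and one infix rule, and accepts a
# candidate only if it is in the given dictionary.
def toy_stem(word, dictionary)
  # A dictionary hit is treated as a root word and returned unchanged.
  return word if dictionary.include?(word)

  candidates = [
    word.sub(/(isme|is)\z/, ''),                  # idealisme, idealis -> ideal
    word.sub(/\A([^aiueo])(er|el)(.+)\z/, '\1\3') # rerata -> rata, lelembut -> lembut
  ]
  candidates.find { |c| c != word && dictionary.include?(c) } || word
end

# Assumed dictionaries for the sketch.
custom_dictionary = Set.new(%w[ideal rata lembut])
default_like      = custom_dictionary | %w[idealis idealisme rerata lelembut]

toy_stem('rerata', default_like)         # => "rerata" (listed, so left as is)
toy_stem('rerata', custom_dictionary)    # => "rata"
toy_stem('lelembut', custom_dictionary)  # => "lembut"
toy_stem('idealisme', custom_dictionary) # => "ideal"
```

With a custom dictionary that omits the derived forms, the stripping rules get a chance to run, so the test exercises the rule rather than the dictionary lookup.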

meisyal commented 3 years ago

Progress checklist: