Open herpiko opened 8 years ago
https://raw.githubusercontent.com/herpiko/uji-sastrawi/master/reversed.json
Instead of testing against several random articles, this JSON file (crawled from badanbahasa.kemdikbud.go.id/kbbi/index.php) could be used as a test reference.
{ "kata" : "perlistrikan", "awalan" : "l", "lema" : "listrik", "penggalan" : [ "per", "lis", "trik", "an" ] }
`kata` should be stemmed to `lema`.
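A minimal sketch of how such a check could look, assuming the entries have already been loaded into a list of dicts shaped like the example above. `check_stemmer` and `naive_stem` are hypothetical names for illustration; `naive_stem` is only a toy stand-in for the real stemmer, which would be passed in as the `stem` callable instead.

```python
from typing import Callable, Iterable, List, Tuple


def check_stemmer(
    entries: Iterable[dict], stem: Callable[[str], str]
) -> List[Tuple[str, str, str]]:
    """Return (kata, expected lema, actual stem) for every mismatch."""
    failures = []
    for entry in entries:
        kata, lema = entry["kata"], entry["lema"]
        actual = stem(kata)
        if actual != lema:
            failures.append((kata, lema, actual))
    return failures


def naive_stem(word: str) -> str:
    """Toy stand-in (assumption): strips one 'per' prefix and one 'an' suffix."""
    if word.startswith("per"):
        word = word[len("per"):]
    if word.endswith("an"):
        word = word[: -len("an")]
    return word


# The entry quoted above, used as a one-item reference set.
sample = [
    {
        "kata": "perlistrikan",
        "awalan": "l",
        "lema": "listrik",
        "penggalan": ["per", "lis", "trik", "an"],
    }
]

failures = check_stemmer(sample, naive_stem)
print(f"{len(failures)} mismatches")
```

Returning the list of mismatches, rather than stopping at the first one, makes it possible to report an overall accuracy figure across the whole reference file.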
But this JSON still needs to be cleaned; some entries contain an invalid character such as `?`:
{ "kata" : "taubat ? tobat", "awalan" : "t", "lema" : "taubat", "penggalan" : [ "tau", "bat ? tobat" ] }
I guess the words that are split by `?` are still debatable.
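One way to handle this, sketched under the assumption that a literal `?` in `kata`, `lema`, or any `penggalan` part is the only marker of a questionable entry, is to set those entries aside rather than guess which spelling is intended. `split_clean` is a hypothetical helper name.

```python
from typing import Iterable, List, Tuple


def split_clean(entries: Iterable[dict]) -> Tuple[List[dict], List[dict]]:
    """Separate entries into (clean, suspect) based on a '?' in any text field."""
    clean, suspect = [], []
    for entry in entries:
        fields = [entry["kata"], entry["lema"], *entry["penggalan"]]
        if any("?" in field for field in fields):
            suspect.append(entry)
        else:
            clean.append(entry)
    return clean, suspect


# The two entries quoted in this issue.
entries = [
    {
        "kata": "perlistrikan",
        "awalan": "l",
        "lema": "listrik",
        "penggalan": ["per", "lis", "trik", "an"],
    },
    {
        "kata": "taubat ? tobat",
        "awalan": "t",
        "lema": "taubat",
        "penggalan": ["tau", "bat ? tobat"],
    },
]

clean, suspect = split_clean(entries)
print(len(clean), "clean,", len(suspect), "suspect")
```

Keeping the suspect entries in a separate list means they can be reviewed by hand later instead of being silently dropped.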
@herpiko I think this is interesting. What do we need to do to start integrating this testing script into our code?