mohangk / c_sastrawi

A port of PHP Sastrawi to C

Test against reversed dictionary #3

Open herpiko opened 8 years ago

herpiko commented 8 years ago

https://raw.githubusercontent.com/herpiko/uji-sastrawi/master/reversed.json

Instead of testing against several random articles, this JSON file (crawled from badanbahasa.kemdikbud.go.id/kbbi/index.php) could be used as a test reference.

{
    "kata" : "perlistrikan",
    "awalan" : "l",
    "lema" : "listrik",
    "penggalan" : [
        "per", "lis", "trik", "an"
    ]
}

Each kata should be stemmed to its lema.
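A minimal sketch of what such a check could look like in C, assuming the JSON has already been parsed into an array of kata/lema pairs. The `stem_fn` signature, `count_mismatches`, and the `identity_stem` placeholder are hypothetical names for illustration and are not the c_sastrawi API; a real stemmer would have to be wrapped to match the signature.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One record from reversed.json, reduced to the two fields the check needs. */
struct entry {
    const char *kata;  /* inflected word */
    const char *lema;  /* expected stem */
};

/* Hypothetical stemmer signature (an assumption, not the c_sastrawi API):
 * writes the stem of `word` into `out`, NUL-terminated, at most out_size bytes. */
typedef void (*stem_fn)(const char *word, char *out, size_t out_size);

/* Placeholder stemmer that returns the word unchanged. */
void identity_stem(const char *word, char *out, size_t out_size)
{
    snprintf(out, out_size, "%s", word);
}

/* Returns how many entries the stemmer does NOT map to their lema. */
size_t count_mismatches(const struct entry *entries, size_t n, stem_fn stem)
{
    size_t failures = 0;
    char buf[128];

    for (size_t i = 0; i < n; i++) {
        buf[0] = '\0';
        stem(entries[i].kata, buf, sizeof buf);
        if (strcmp(buf, entries[i].lema) != 0)
            failures++;
    }
    return failures;
}
```

Calling `count_mismatches(entries, n, identity_stem)` with the placeholder counts every entry whose kata differs from its lema; swapping in a wrapper around the real stemmer would turn the count into a failure total for the test run.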

But this JSON still needs to be cleaned; some entries contain invalid characters such as ?:

{
    "kata" : "taubat ? tobat",
    "awalan" : "t",
    "lema" : "taubat",
    "penggalan" : [
        "tau", "bat ? tobat"
    ]
}

I guess the words that are split by ? are still debatable.
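One possible way to handle this, sketched under the assumption that such entries are simply skipped rather than repaired: reject any kata that contains a ? before it reaches the test. `is_clean_kata` is a hypothetical helper name, and checking only for ? is an assumption based on the single example above; other invalid characters may exist in the file.

```c
#include <string.h>

/* Returns 1 if the kata can be used as a test case, 0 if it contains
 * the '?' separator seen in entries such as "taubat ? tobat".
 * Assumption: '?' is the only marker of a debatable entry. */
int is_clean_kata(const char *kata)
{
    return strchr(kata, '?') == NULL;
}
```

With this filter, "perlistrikan" would be kept and "taubat ? tobat" would be skipped.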

mohangk commented 8 years ago

@herpiko I think this is interesting. What do we need to do to start integrating this testing script into our code?