Closed kpister closed 3 years ago
Hi @kpister, thanks for the question. The reported scores for sentence segmentation are F1 scores computed with the official evaluation script of the CoNLL 2018 Shared Task, which is publicly available here. To run the evaluation script, the system and gold sentence segmentations should be organized in the CoNLL-U format. The Universal Dependencies datasets can be downloaded here. Thanks.
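For intuition, the sentence-segmentation metric scores a predicted sentence as correct only when its span exactly matches a gold sentence. Below is a minimal, simplified sketch of that F1 computation over character-offset spans (the real CoNLL 2018 script additionally aligns tokens and ignores whitespace differences, so this is an illustration, not the official implementation):

```python
# Simplified sketch of sentence-segmentation F1: a predicted sentence counts
# as correct only if its (start, end) character span exactly matches a gold
# sentence span. The official CoNLL 2018 script is more careful (it aligns
# by non-whitespace characters), but the F1 arithmetic is the same.

def sentence_f1(gold_spans, pred_spans):
    """gold_spans / pred_spans: lists of (start, end) character offsets."""
    gold, pred = set(gold_spans), set(pred_spans)
    correct = len(gold & pred)
    if not gold or not pred:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: gold has two sentences; the system merged them into one.
gold = [(0, 12), (13, 25)]
print(sentence_f1(gold, [(0, 25)]))          # 0.0 -- no exact span match
print(sentence_f1(gold, [(0, 12), (13, 25)]))  # 1.0 -- perfect segmentation
```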
Thanks for the response.
Very neat project! I am trying to understand the quality of trankit's sentence segmentation. I see the evals here: https://trankit.readthedocs.io/en/latest/performance.html#universal-dependencies-v2-5, but the results aren't very clear to me. Each column is simply a percentage: are these accuracy scores or F1 scores?
Additionally, I'd like to run an eval against NLTK's Punkt sentence segmentation model to see which I should use. Is the code that generates your evals public?
Thanks.
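Since the evaluation script consumes CoNLL-U files, comparing another segmenter (e.g. Punkt) with the same script requires serializing its output to that format first. A minimal stdlib-only sketch, assuming each predicted sentence is already a list of tokens (the tokenization step itself is out of scope here):

```python
# Hedged sketch: serialize predicted sentence segmentations to minimal
# CoNLL-U, the input format of the CoNLL 2018 evaluation script. Columns
# other than ID and FORM are "_" placeholders, which is sufficient for the
# sentence-segmentation metric (it does not read lemmas, tags, or heads).

def to_conllu(sentences):
    """sentences: list of token lists, one inner list per predicted sentence."""
    lines = []
    for sent in sentences:
        lines.append("# text = " + " ".join(sent))
        for i, tok in enumerate(sent, start=1):
            # 10 tab-separated columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
            lines.append("\t".join([str(i), tok] + ["_"] * 8))
        lines.append("")  # a blank line terminates each sentence block
    return "\n".join(lines)

print(to_conllu([["Hello", "world", "."], ["Bye", "."]]))
```

Writing the system and gold segmentations out this way lets both be fed to the official script unchanged, so trankit and any baseline are scored under identical conditions.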