Closed kpister closed 3 years ago
Hi @kpister, thanks for the question. The reported scores for sentence segmentation are F1 scores computed with the official evaluation script of the CoNLL 2018 Shared Task, which is publicly available here. To run the evaluation script, the system and gold sentence segmentations should be organized in the CoNLL-U format. The Universal Dependencies datasets can be downloaded here. Thanks.
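For intuition, the sentence-segmentation metric scores a predicted sentence as correct only when its span exactly matches a gold sentence. Below is a minimal, simplified sketch of that F1 computation over character-offset spans (the real CoNLL 2018 script additionally aligns tokens and ignores whitespace differences, so this is an illustration, not the official implementation):

```python
# Simplified sketch of sentence-segmentation F1: a predicted sentence counts
# as correct only if its (start, end) character span exactly matches a gold
# sentence span. The official CoNLL 2018 script is more careful (it aligns
# by non-whitespace characters), but the F1 arithmetic is the same.

def sentence_f1(gold_spans, pred_spans):
    """gold_spans / pred_spans: lists of (start, end) character offsets."""
    gold, pred = set(gold_spans), set(pred_spans)
    correct = len(gold & pred)
    if not gold or not pred:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: gold has two sentences; the system merged them into one.
gold = [(0, 12), (13, 25)]
print(sentence_f1(gold, [(0, 25)]))          # 0.0 -- no exact span match
print(sentence_f1(gold, [(0, 12), (13, 25)]))  # 1.0 -- perfect segmentation
```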
Thanks for the response.
Very neat project! I am trying to understand the quality of trankit's sentence segmentation. I see the evals here: https://trankit.readthedocs.io/en/latest/performance.html#universal-dependencies-v2-5, but the results aren't very clear to me. Each column is simply a percentage: are these accuracy scores or F1 scores?
Additionally, I'd like to run an eval against NLTK's Punkt sentence segmentation model to see which I should use. Is the code that generates your evals public?
Thanks.
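Since the evaluation script consumes CoNLL-U files, comparing another segmenter (e.g. Punkt) with the same script requires serializing its output to that format first. A minimal stdlib-only sketch, assuming each predicted sentence is already a list of tokens (the tokenization step itself is out of scope here):

```python
# Hedged sketch: serialize predicted sentence segmentations to minimal
# CoNLL-U, the input format of the CoNLL 2018 evaluation script. Columns
# other than ID and FORM are "_" placeholders, which is sufficient for the
# sentence-segmentation metric (it does not read lemmas, tags, or heads).

def to_conllu(sentences):
    """sentences: list of token lists, one inner list per predicted sentence."""
    lines = []
    for sent in sentences:
        lines.append("# text = " + " ".join(sent))
        for i, tok in enumerate(sent, start=1):
            # 10 tab-separated columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
            lines.append("\t".join([str(i), tok] + ["_"] * 8))
        lines.append("")  # a blank line terminates each sentence block
    return "\n".join(lines)

print(to_conllu([["Hello", "world", "."], ["Bye", "."]]))
```

Writing the system and gold segmentations out this way lets both be fed to the official script unchanged, so trankit and any baseline are scored under identical conditions.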