Closed: santhoshtr closed this issue 10 months ago
Which one are you using? I'm using spaCy in https://github.com/dpriskorn/riksdagen_sentences
It seems your segmenter is way faster than spaCy and also more accurate. :)
sentencex is currently used for segmentation. See https://diff.wikimedia.org/2023/10/23/sentencex-empowering-nlp-with-multilingual-sentence-extraction/ for background. spaCy does not support many languages.
The current sentence segmenter used in this project is a very minimal one. It has several limitations.
Replace it with https://github.com/santhoshtr/sentencesegmenter
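
To illustrate the kind of limitation a very minimal segmenter tends to have, here is a rough sketch. It is not this project's actual code: `naive_segment` is a hypothetical function that splits on sentence-ending punctuation followed by whitespace, which is an assumption about what "minimal" means here.

```python
import re


def naive_segment(text: str) -> list[str]:
    """Hypothetical minimal segmenter (an assumption, not the project's code).

    Splits wherever '.', '!' or '?' is followed by whitespace.
    """
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


if __name__ == "__main__":
    sample = "Dr. Smith arrived at 5 p.m. yesterday. He left soon after."
    # The sample has two sentences, but the abbreviations "Dr." and "p.m."
    # are treated as sentence ends, so it is split into four pieces.
    for piece in naive_segment(sample):
        print(repr(piece))
```

A rule like this has no notion of abbreviations or language-specific punctuation, which is the sort of gap a dedicated segmentation library is meant to close.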