nipunsadvilkar / pySBD

🐍💯pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that works out-of-the-box.
MIT License

XXXX et al. [2004] error #86

Closed GabrielLin closed 3 years ago

GabrielLin commented 3 years ago

Describe the bug: Incorrect segmentation. A single sentence containing "et al. [2004]" gets split.

To Reproduce

import pysbd
# A single sentence that contains the citation pattern "et al. [2004]"
text = "Yan et al. [2004] analysed SSH variations in northwest Europe and suggested that SSH changes are related to changes in heat content and heat fluxes."
# clean=False keeps the input text unmodified
seg = pysbd.Segmenter(language="en", clean=False)
print(seg.segment(text))

This is a whole sentence and should not be segmented.

nipunsadvilkar commented 3 years ago

Hey @GabrielLin, this would be considered an edge case and should be handled on the consumer's end. If pysbd breaks somewhere or returns a damaged sentence, then it's an issue to be resolved on the developer's end.

For cases like the one above, you can consume pysbd's output and write your own rules on top of it. I hope this helps!
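As a rough illustration of that suggestion, here is a minimal sketch of a consumer-side rule that rejoins segments split right after "et al." when the next segment starts with a bracketed year. The function name `merge_et_al`, both regular expressions, and the sample input list are assumptions made for this example; they are not part of pysbd, and the sample list is only an assumed shape of a wrongly split result, not verified pysbd output.

```python
import re

# Hypothetical rule (not part of pysbd): a segment ending in "et al." that is
# followed by a segment starting with a bracketed year is treated as one sentence.
_ENDS_WITH_ET_AL = re.compile(r"\bet al\.\s*$")
_STARTS_WITH_YEAR = re.compile(r"^\s*[\[(]\d{4}[\])]")


def merge_et_al(segments):
    """Rejoin segments that were split between "et al." and a year citation."""
    merged = []
    for seg in segments:
        if (
            merged
            and _ENDS_WITH_ET_AL.search(merged[-1])
            and _STARTS_WITH_YEAR.search(seg)
        ):
            # Segments that keep their trailing whitespace can be concatenated
            # directly; otherwise insert a single space.
            sep = "" if merged[-1][-1].isspace() else " "
            merged[-1] = merged[-1] + sep + seg
        else:
            merged.append(seg)
    return merged


# Illustrative input only: an assumed wrongly split result.
parts = ["Yan et al. ", "[2004] analysed SSH variations in northwest Europe."]
print(merge_et_al(parts))
```

In practice you would pass the list returned by `seg.segment(text)` to `merge_et_al`. The year check keeps the rule narrow, so a sentence that genuinely ends in "et al." is not merged with an unrelated following sentence.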

GabrielLin commented 3 years ago

OK, thanks. I hope a function can be added to pySBD so that it can handle custom rules like this.