Open ymoslem opened 2 years ago
Describe the bug
The Arabic segmenter splits a sentence at the Arabic comma (،), which is not a sentence boundary.
To Reproduce
Steps to reproduce the behavior:
    import pysbd
    text = "هذه تجربة، للغة العربية"
    seg = pysbd.Segmenter(language="ar", clean=True)
    print(seg.segment(text))
Output: ['هذه تجربة،', 'للغة العربية']
Expected behavior
The text should not be split on the Arabic comma. Expected output: ['هذه تجربة، للغة العربية']
Additional context
I fixed this locally by editing pysbd/lang/arabic.py and deleting ، from SENTENCE_BOUNDARY_REGEX.
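To illustrate why removing the comma from the boundary pattern changes the result, here is a minimal standalone sketch. The two patterns and the `split` helper are hypothetical simplifications written for this example. They are not copied from pysbd and only mimic the idea of a character class of sentence-ending punctuation.

```python
import re

# Hypothetical, simplified boundary patterns (assumptions, not pysbd's actual regex).
# Each matches lazily up to a terminator, or falls back to the rest of the text.
WITH_COMMA = r".*?[.!?؟،]|.+$"     # Arabic comma treated as a sentence terminator
WITHOUT_COMMA = r".*?[.!?؟]|.+$"   # Arabic comma removed from the terminator class


def split(text, pattern):
    """Return the non-empty, whitespace-stripped segments matched by pattern."""
    return [m.strip() for m in re.findall(pattern, text) if m.strip()]


text = "هذه تجربة، للغة العربية"
print(split(text, WITH_COMMA))     # two segments, split after the comma
print(split(text, WITHOUT_COMMA))  # one segment, the whole sentence
```

With the comma in the class, the lazy match stops at ، and the remainder becomes a second segment. Without it, no terminator is found and the whole text is returned as one sentence, which matches the expected behavior described above.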