nipunsadvilkar / pySBD

🐍💯pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection module that works out-of-the-box.
MIT License

Combination of single quotes prevents sentence segmentation #122

Open guydepauw opened 1 year ago

guydepauw commented 1 year ago

Describe the bug: A text containing a particular combination of single quotes is not segmented; the whole text comes back as a single sentence.

To Reproduce: Segment the input text below with pySBD.

Input text: Come work for us in 'S-Hertogenbosch. To ensure products meet specifications and standards, you will perform in-process inspection. The goal will be to make sure that production procedures will be carried on smoothly to maximize efficiency and profits. where will you work. COMPANY is a global leader in high-end server technology and innovation of IT products. There are also options to work abroad! apply.Are you interested in the position of production operator? Then apply directly via the ''apply'' button below.

Expected behavior: The text is split into separate sentences. Expected output (list of expected sentences):

["Come work for us in 'S-Hertogenbosch. ", 'To ensure products meet specifications and standards, you will perform in-process inspection. ', 'The goal will be to make sure that production procedures will be carried on smoothly to maximize efficiency and profits. ', 'where will you work. ', 'COMPANY is a global leader in high-end server technology and innovation of IT products. ', 'There are also options to work abroad! apply.Are you interested in the position of production operator? ', "Then apply directly via the ''apply'' button below."]

Actual output:

["Come work for us in 'S-Hertogenbosch. To ensure products meet specifications and standards, you will perform in-process inspection. The goal will be to make sure that production procedures will be carried on smoothly to maximize efficiency and profits. where will you work. COMPANY is a global leader in high-end server technology and innovation of IT products. There are also options to work abroad! apply.Are you interested in the position of production operator? Then apply directly via the ''apply'' button below."]
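The report above could be reproduced with a script along the following lines. This is a minimal sketch, assuming pySBD is installed and that the reporter used the English segmenter with `clean=False` (the issue does not state the exact settings). The names `TEXT` and `reproduce` are illustrative and do not come from the issue.

```python
# Input text from the report, kept verbatim (including the missing space
# in "apply.Are" and the doubled single quotes around "apply").
TEXT = (
    "Come work for us in 'S-Hertogenbosch. "
    "To ensure products meet specifications and standards, "
    "you will perform in-process inspection. "
    "The goal will be to make sure that production procedures will be "
    "carried on smoothly to maximize efficiency and profits. "
    "where will you work. "
    "COMPANY is a global leader in high-end server technology and "
    "innovation of IT products. "
    "There are also options to work abroad! "
    "apply.Are you interested in the position of production operator? "
    "Then apply directly via the ''apply'' button below."
)


def reproduce(text: str = TEXT) -> list:
    """Segment the text with pySBD and return the list of sentences."""
    # Imported lazily so that the module loads even without pySBD installed.
    import pysbd

    # Assumption: English rules, no cleaning; the issue does not say which
    # settings were used.
    segmenter = pysbd.Segmenter(language="en", clean=False)
    return segmenter.segment(text)


if __name__ == "__main__":
    sentences = reproduce()
    # According to the report, only one sentence is returned here.
    print(len(sentences))
    for sentence in sentences:
        print(repr(sentence))
```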

Additional context: Removing the first single quote (the one in 'S-Hertogenbosch), or replacing each pair of single quotes ('') with a double quote ("), resolves the issue.
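Until the underlying rule is fixed, either workaround described above could be applied as a preprocessing step before segmentation. The sketch below is only an illustration: both helper names are hypothetical, neither is part of pySBD, and both alter the input text, so the returned sentences will no longer match the original character for character.

```python
def replace_doubled_single_quotes(text: str) -> str:
    """Replace each pair of single quotes ('') with one double quote (")."""
    return text.replace("''", '"')


def drop_first_single_quote(text: str) -> str:
    """Remove only the first single quote that occurs in the text."""
    return text.replace("'", "", 1)


if __name__ == "__main__":
    sample = "Come work for us in 'S-Hertogenbosch. Use the ''apply'' button."
    print(replace_doubled_single_quotes(sample))
    print(drop_first_single_quote(sample))
```

The preprocessed text can then be passed to `pysbd.Segmenter(language="en", clean=False).segment(...)`. Replacing the doubled quotes is probably the less destructive of the two options, since it leaves the place name 'S-Hertogenbosch untouched.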