textlint-rule / sentence-splitter

Split {Japanese, English} text into sentences.
https://sentence-splitter.netlify.app/
MIT License
117 stars, 14 forks

Unable to break after full stop #18

Closed (alxtsg closed this issue 1 year ago)

alxtsg commented 4 years ago

Hi. I have this piece of text and I want to split it into sentences:

An example of a bot that reverts vandalism on Wikipedia is ClueBot NG. ClueBot NG can revert edits, often within minutes, if not seconds.

Expected result:

Actual result:

I am using sentence-splitter version 3.1.0. Any idea why the text is not split after the first full stop?

azu commented 4 years ago

I think this is a bug in AbbrMarker: https://github.com/azu/sentence-splitter/blob/master/src/parser/AbbrMarker.ts

AbbrMarker aims to support full stops (periods) in abbreviations, but it may produce a false positive for "NG".
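To illustrate how an abbreviation check can swallow a real sentence boundary, here is a minimal, self-contained sketch. It is not the actual AbbrMarker code: `looksLikeAbbreviation` and `naiveSplit` are hypothetical names, and the rule "a short, all-uppercase word before a full stop is an abbreviation" is an assumption made only for this example.

```typescript
// Hypothetical heuristic, NOT the real AbbrMarker logic: a full stop is treated
// as part of an abbreviation when the word before it is 1-3 uppercase letters.
function looksLikeAbbreviation(word: string): boolean {
    return /^[A-Z]{1,3}$/.test(word);
}

// Naive splitter: break after a full stop followed by whitespace, unless the
// word before the full stop looks like an abbreviation.
function naiveSplit(text: string): string[] {
    const sentences: string[] = [];
    let start = 0;
    const boundary = /\.\s+/g;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(text)) !== null) {
        const before = text.slice(start, match.index);
        const lastWord = before.split(/\s+/).pop() ?? "";
        if (looksLikeAbbreviation(lastWord)) {
            // False-positive risk: "NG." at the end of a sentence is skipped here.
            continue;
        }
        sentences.push(text.slice(start, match.index + 1));
        start = match.index + match[0].length;
    }
    // Whatever remains after the last accepted boundary is the final sentence.
    if (start < text.length) {
        sentences.push(text.slice(start));
    }
    return sentences;
}

const issueText =
    "An example of a bot that reverts vandalism on Wikipedia is ClueBot NG. " +
    "ClueBot NG can revert edits, often within minutes, if not seconds.";
const result = naiveSplit(issueText);
console.log(result.length); // 1 with this heuristic: the boundary after "NG." is missed
```

With this assumed rule, text such as "Hello world. Second one." still splits into two sentences, so the failure only appears when a sentence happens to end in a short uppercase word.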

torbsorb commented 2 years ago

We also experience this issue with:

├── textlint-rule-en-max-word-count@2.0.0
└── textlint@12.2.1

It seems to fail any time the sentence ends with a capital letter. Simple reproducer:

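import { split } from "sentence-splitter";

// The first sentence ends with a capital letter followed by a full stop ("onE."),
// which is the case that is not split.
const sentences1 = split(`Sentence onE. Sentence two.`);
// Print the returned nodes as indented JSON for inspection.
console.log(JSON.stringify(sentences1, null, 4));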

Actual paragraph that causes issues:

Zivid Studio is the graphical user interface (GUI) for the Zivid SDK.
This allows the user to explore the functionality of the Zivid Cameras and the capturing of high definition 3D point clouds.

This was not a problem with:

├── textlint-rule-en-max-word-count@1.1.0
└── textlint@12.1.0
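To check the reproducer above without reading the full JSON dump, one option is to count only the sentence nodes. The sketch below is self-contained and does not import the library. `SplitNode` and `sentenceTexts` are hypothetical names, the node shape (a string `type` plus the original text in `raw`) is an assumption about what `split` returns, and the sample array is hand-written rather than captured from a real run.

```typescript
// Minimal structural type for the nodes printed by the reproducer.
// Assumption: each node has a string `type` and the original text in `raw`.
interface SplitNode {
    type: string;
    raw: string;
}

// Hypothetical helper: keep only sentence nodes and return their text.
function sentenceTexts(nodes: SplitNode[]): string[] {
    return nodes
        .filter((node) => node.type === "Sentence")
        .map((node) => node.raw);
}

// Hand-written stand-in for a correctly split result, not real library output.
const sample: SplitNode[] = [
    { type: "Sentence", raw: "Sentence onE." },
    { type: "WhiteSpace", raw: " " },
    { type: "Sentence", raw: "Sentence two." },
];

console.log(sentenceTexts(sample)); // [ 'Sentence onE.', 'Sentence two.' ]
```

With the real library, passing the result of `split(...)` to such a helper (possibly with a type cast) would presumably give one entry on affected versions and two once the text is split correctly.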

azu commented 1 year ago

This is fixed in https://github.com/azu/sentence-splitter/releases/tag/v3.2.3

https://sentence-splitter.netlify.app/#An%20example%20of%20a%20bot%20that%20reverts%20vandalism%20on%20Wikipedia%20is%20ClueBot%20NG.%20ClueBot%20NG%20can%20revert%20edits%2C%20often%20within%20minutes%2C%20if%20not%20seconds.