textlint-rule / sentence-splitter

Split {Japanese, English} text into sentences.
https://sentence-splitter.netlify.app/
MIT License

Splitter does not ignore certain cases of URLs/paths #22

Closed: mghill closed this issue 3 years ago.

mghill commented 3 years ago

Brought over from https://github.com/IQTLabs/textlint-rule-one-sentence-per-line/issues/3:

The example I came across was in a Markdown document, where textlint-rule-one-sentence-per-line (which uses sentence-splitter) flagged an issue in the following snippet:

The Backoffice application will be available at:
https://{Application Load Balancer DNS NAME}/login?continue=/backoffice

It wanted to split on the ? character in the example URL. Further testing showed that ordinary prose inside code was not skipped either. I would expect a rule to ignore things that are addresses or paths, and to ignore things in code.
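To make the expected behaviour concrete, here is a minimal TypeScript sketch of a splitter that refuses to break inside URLs or backtick-delimited inline code. It is a hypothetical illustration, not sentence-splitter's actual algorithm: the names `splitSentences`, `protectedRanges` and `isProtected`, the `https?://\S+` URL pattern, and the simplification that every `.`, `?` or `!` ends a sentence are all assumptions made for this example.

```typescript
// Hypothetical sketch: a naive splitter that treats ".", "?" and "!" as
// sentence boundaries, except inside "protected" spans (URLs and inline code).
// This is NOT sentence-splitter's algorithm; it only illustrates the idea.

type Range = { start: number; end: number }; // end is exclusive

// Collect character ranges that must never be split.
function protectedRanges(text: string): Range[] {
  const patterns = [/https?:\/\/\S+/g, /`[^`]*`/g]; // assumed URL and inline-code patterns
  const ranges: Range[] = [];
  for (const pattern of patterns) {
    for (const match of text.matchAll(pattern)) {
      const start = match.index ?? 0;
      ranges.push({ start, end: start + match[0].length });
    }
  }
  return ranges;
}

// True when the character at `index` falls inside any protected range.
function isProtected(index: number, ranges: Range[]): boolean {
  return ranges.some((r) => index >= r.start && index < r.end);
}

function splitSentences(text: string): string[] {
  const ranges = protectedRanges(text);
  const sentences: string[] = [];
  let start = 0;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    // Only unprotected terminators end a sentence.
    if ((ch === "." || ch === "?" || ch === "!") && !isProtected(i, ranges)) {
      const sentence = text.slice(start, i + 1).trim();
      if (sentence.length > 0) sentences.push(sentence);
      start = i + 1;
    }
  }
  // Keep any trailing text that has no terminator.
  const rest = text.slice(start).trim();
  if (rest.length > 0) sentences.push(rest);
  return sentences;
}
```

Because the URL pattern stops at whitespace, a placeholder that contains spaces, such as the `{Application Load Balancer DNS NAME}` template in the report, would not be fully protected by this sketch; a real implementation would need a smarter tokenizer.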

azu commented 3 years ago

Thanks for the report!

"I would expect a rule to ignore things that are addresses or paths, and to ignore things in code."

Agreed. Splitting on ? inside code is a bug.

We probably need to add handling for Code nodes to the parsing logic: https://github.com/azu/sentence-splitter/blob/41b72341b3edf7ae853bcdd74146a66fe72ea41f/src/sentence-splitter.ts#L204-L210
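One way to read this suggestion is that Code nodes should be treated as opaque units rather than as text to scan for sentence-ending punctuation. The TypeScript sketch below assumes a simplified, hypothetical node shape loosely modeled on textlint's `Str` (plain text) and `Code` (inline code) node types; `splitParagraph` is an illustrative name and not part of sentence-splitter's API.

```typescript
// Hypothetical, simplified node shapes, loosely modeled on textlint's
// "Str" (plain text) and "Code" (inline code) node types.
type InlineNode =
  | { type: "Str"; value: string }
  | { type: "Code"; value: string };

// Split a paragraph's inline children into sentences. Terminators are only
// honoured inside Str nodes; a Code node is copied into the current sentence
// verbatim, so punctuation inside it can never end a sentence.
function splitParagraph(children: InlineNode[]): string[] {
  const sentences: string[] = [];
  let current = "";

  // Push the accumulated sentence (if non-empty) and start a new one.
  const flush = () => {
    const trimmed = current.trim();
    if (trimmed.length > 0) sentences.push(trimmed);
    current = "";
  };

  for (const node of children) {
    if (node.type === "Code") {
      // Opaque: keep the code span intact, including any "?" or ".".
      current += "`" + node.value + "`";
      continue;
    }
    for (const ch of node.value) {
      current += ch;
      if (ch === "." || ch === "?" || ch === "!") flush();
    }
  }
  flush();
  return sentences;
}
```

Keeping the code span's text in the output preserves the sentence's content while guaranteeing that punctuation inside the span never acts as a boundary.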

Edit: The problem is in the rule implementation; see https://github.com/IQTLabs/textlint-rule-one-sentence-per-line/issues/3#issuecomment-739925995

azu commented 3 years ago

This issue is related to https://github.com/IQTLabs/textlint-rule-one-sentence-per-line/issues/3. I've explained the cause in https://github.com/IQTLabs/textlint-rule-one-sentence-per-line/issues/3#issuecomment-739925995, so I'm closing this issue.