gonnavis closed this 1 year ago
I found there are two reasons:
1. It always splits at `.`, even when the `.` is part of a "line index" such as `1.`, `2.`, `3.`, `4.`
2. It does not split at `\n`, even with `` sentenceSplitter.split(value, {SeparatorParser: {separatorCharacters: ['\n', `\\n`]}}); ``
`/^\d+\. /` appears to need to be interpreted as syntax with special meaning.
Some words, such as `Mr.`, are already handled so that they do not act as sentence breakers.
I feel that `1.` will need similar treatment.
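To make the idea concrete, here is a minimal sketch of treating a line index like `1.` as a non-breaking period, in the same spirit as `Mr.`. It is a naive stand-alone splitter, not sentence-splitter's actual parser; `isLineIndexPeriod` and `naiveSplit` are hypothetical names used only for illustration.

```typescript
// Hypothetical helpers, not part of sentence-splitter's API.
// A line index is digits followed by ". " at the start of a line.
const LINE_INDEX = /^\d+\. /;

// Reports whether the "." at `index` closes a line index such as "1.".
function isLineIndexPeriod(text: string, index: number): boolean {
    // Find the start of the line that contains `index`.
    const lineStart = text.lastIndexOf("\n", index - 1) + 1;
    const match = LINE_INDEX.exec(text.slice(lineStart));
    // match[0] is e.g. "1. ", so its period sits two characters before its end.
    return match !== null && lineStart + match[0].length - 2 === index;
}

// Naive period-based splitter that skips periods belonging to a line index.
function naiveSplit(text: string): string[] {
    const sentences: string[] = [];
    let start = 0;
    for (let i = 0; i < text.length; i++) {
        if (text[i] === "." && !isLineIndexPeriod(text, i)) {
            sentences.push(text.slice(start, i + 1).trim());
            start = i + 1;
        }
    }
    const rest = text.slice(start).trim();
    if (rest.length > 0) {
        sentences.push(rest);
    }
    return sentences;
}
```

With this sketch, `naiveSplit("The choices are:\n1. Throws an error.\n2. Gives a warning.")` keeps `1.` and `2.` attached to the text that follows them. Because the pattern is anchored to the start of a line, a `1.` in the middle of a line would still be treated as a sentence end.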
📝 In Markdown, `1.` is list syntax.
`splitAST` creates sentences from ASTs produced by the Markdown parser, so there should be no problem there. `split` treats the input as plain text, which causes this issue.
Hello @azu, I have another question: why is `\n` ignored entirely?
Input: three lines of sentences without an ending `.` but with an ending line break `\n`:
We are talking about pens
He said "This is a pen. I like it"
I could relate to that statement.
Output, only one sentence:
We are talking about pens\nHe said \"This is a pen. I like it\"\nI could relate to that statement.
Is this repo only focusing on a single paragraph?
Oh, `\n` is already parsed, but it seems to have lower priority than the separator `.`.
I achieved it by calling `close` with `newline`.
But I still can't understand why it does not split on `newLine` (`\n`) first by default.
Is it to avoid wrongly splitting a sentence that contains a stray line break? For example:
This is an
apple.
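The trade-off behind this question can be sketched with a small pre-processing step that breaks on `\n` before any sentence splitting. This is only an illustration under that assumption; `splitLinesFirst` is a hypothetical helper, not something sentence-splitter provides.

```typescript
// Hypothetical pre-processing step, not sentence-splitter's behaviour:
// break the input into non-empty lines before any sentence splitting happens.
function splitLinesFirst(text: string): string[] {
    return text
        .split("\n")
        .map((line) => line.trim())
        .filter((line) => line.length > 0);
}

// The three-line input from the earlier comment: each line comes back separately.
const threeLines =
    'We are talking about pens\nHe said "This is a pen. I like it"\nI could relate to that statement.';
const separated = splitLinesFirst(threeLines);

// A hard-wrapped sentence: splitting on "\n" first cuts it in two.
const wrapped = splitLinesFirst("This is an\napple.");
```

Here `separated` holds the three lines as separate entries, while `wrapped` holds `"This is an"` and `"apple."` as two fragments. That second case is the risk of making newline splitting the default.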
Hello @azu, I made a PR to avoid splitting at line indexes: https://github.com/textlint-rule/sentence-splitter/pull/36
Thanks! I'll look at it on the weekend.
Describe the bug
Sentences with "line indexes" split unexpectedly.
Text
Actual Result
For the fourth choice question on the JavaScript quiz, the choices are:\n\n1.
Throws an error\n2.
Expected Result
For the fourth choice question on the JavaScript quiz, the choices are:
1. Throws an error
2. Ignores the statements
3. Gives a warning
4. None of the above
Feel free to let me know if you need help with the correct answer or if you have any other question!