Closed alaa-eddine closed 3 years ago
Hello devs :) Is there any update on this? If you can give me some hints about the part of the code that parses numbers and sentences, I can try to fix it.
Hi there! 👋 Sorry for the wait!
This comes from: https://github.com/wooorm/parse-latin/blob/6e606a372cdec62e1a71cbf2cfb4d5ca40797622/lib/plugin/merge-prefix-exceptions.js#L12
Even in running text, people often use numbers followed by a dot for “lists”: it could be one of two things: 1. this, or: 2. that.
In those cases, the number + dot is not a break between sentences. So the behaviour is intentional, but I can see a case for either option.
Hey @wooorm, thank you for your answer. I see the problem here, but the way it's implemented makes it difficult to fix without forking the repo and its dependencies. Would it be possible to add a way in retextjs to override those exceptions?
How would you suggest to fix it? Because then it would break the other cases: numbers followed by periods in text?
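To make the trade-off concrete, here is a minimal, dependency-free sketch. It is not parse-latin's code: `splitSentences` and its `numberException` option are hypothetical names, and the one-to-three-digit pattern is an assumption based on the behaviour reported in this issue.

```javascript
// Hypothetical sketch, not parse-latin's implementation: a naive sentence
// splitter with an optional "number + dot" exception.
function splitSentences(text, options = {}) {
  // Assumption: the exception covers numbers of one to three digits.
  const { numberException = true } = options;
  const sentences = [];
  let start = 0;
  // Candidate break: a dot, then whitespace, then some further character.
  const candidate = /\.\s+(?=\S)/g;
  let match;
  while ((match = candidate.exec(text)) !== null) {
    // The word immediately before the dot.
    const before = text.slice(start, match.index).split(/\s+/).pop();
    if (numberException && /^[0-9]{1,3}$/.test(before)) {
      continue; // treat "30." like a list marker, not a sentence end
    }
    sentences.push(text.slice(start, match.index + 1));
    start = match.index + match[0].length;
  }
  // Whatever remains after the last break is the final sentence.
  if (start < text.length) sentences.push(text.slice(start));
  return sentences;
}

const list = "It could be one of two things: 1. this, or: 2. that.";
console.log(splitSentences("Hello 30. Hello world."));
console.log(splitSentences("Hello 3030. Hello world."));
console.log(splitSentences(list));
console.log(splitSentences(list, { numberException: false }));
```

With the exception on, the list example stays one sentence but "Hello 30. Hello world." is not split; with it off, "Hello 30." splits correctly but the list breaks at "1." and "2.". An override like the one requested would let callers pick which failure they prefer.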
Closing: natural language is really hard to classify with rules, and I can’t see how this could be fixed.
Subject of the issue
I stumbled upon a strange case where retext fails to detect some sentences ending with a number (but not all).
Your environment
Steps to reproduce
I try to parse the following string: "Hello 30. Hello world."
Expected behaviour
output :
Actual behaviour
output :
Please note that if I test the same code with the string "Hello 3030. Hello world.", it works just fine.
It seems to happen with numbers of fewer than four digits.