Open rsimon opened 5 years ago
I would insert paragraph tags wherever a period is followed by one or more newline characters. Regex: \.\n+ (the dot has to be escaped, otherwise it matches any character). That should cover most cases.
I made a first, incomplete pass at this, which splits on \n\n only.
The issue I have is: I need to know how long the delimiting pattern is, so that I can correctly align the annotations within each paragraph. (Annotations on plaintext content are standoff markup inside Recogito, with every annotation recording its offset from the beginning of the text.)
If you have any ideas on how to achieve this, let me know. I think there's something called "Lookahead"/"Lookbehind" which might achieve this (see this thread).
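For what it's worth, here is a minimal sketch of how the delimiter length could be tracked, written in Python purely for illustration (it is not Recogito's code, and the names `DELIMITER`, `split_paragraphs` and `locate` are made up). It assumes the delimiter is "one or more newlines preceded by a period" and uses a lookbehind so the period stays inside the paragraph. Iterating over the regex matches, rather than calling a plain split, gives the start and end of each delimiter, so every paragraph can remember where it begins in the original text.

```python
import re

# Assumed delimiter: one or more newlines directly preceded by a period.
# The lookbehind (?<=\.) checks for the period without consuming it.
DELIMITER = re.compile(r"(?<=\.)\n+")

def split_paragraphs(text):
    """Return (start_offset, paragraph) pairs; offsets index into the original text."""
    paragraphs = []
    start = 0
    for match in DELIMITER.finditer(text):
        paragraphs.append((start, text[start:match.start()]))
        # match.end() - match.start() is the delimiter length,
        # so the next paragraph starts exactly at match.end().
        start = match.end()
    if start < len(text):
        paragraphs.append((start, text[start:]))
    return paragraphs

def locate(paragraphs, offset):
    """Map a global annotation offset to (paragraph_index, local_offset)."""
    for index, (start, body) in enumerate(paragraphs):
        if start <= offset < start + len(body):
            return index, offset - start
    return None  # the offset falls inside a delimiter or past the end

text = "First one.\n\nSecond here.\nThird."
paragraphs = split_paragraphs(text)
print(paragraphs)
print(locate(paragraphs, 19))
```

Because each paragraph keeps its absolute start offset, an annotation offset can be converted to a paragraph-local one by subtraction, no matter how many newline characters each delimiter contained.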
\n (or \n\n?)