pelagios / recogito2

Semantic Annotation Without the Pointy Brackets
Apache License 2.0

Text-to-TEI export enhancements #635

Open rsimon opened 5 years ago

rsimon commented 5 years ago
GusRiva commented 5 years ago

I would insert paragraph tags wherever a period is followed by one or more newline characters. Regex: \.\n+ (the period has to be escaped, since a bare . matches any character). That should cover most cases.

rsimon commented 5 years ago

I made a first/incomplete pass at this, which splits on \n\n only.

The issue I have is this: I need to know how long the delimiting pattern is, so that I can correctly align the annotations within each paragraph. (Annotations on plaintext content are standoff markup inside Recogito, with every annotation recording its offset from the beginning of the text.)
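One way to get the delimiter length is to iterate over regex matches instead of splitting, since each match reports where it starts and ends. The following is a minimal Python sketch for illustration only (Recogito itself is not written in Python); the names split_paragraphs and localize are hypothetical, and the pattern \.\n+ is the one suggested earlier in this thread, not necessarily what the exporter uses.

```python
import re

# Assumed delimiter: a period followed by one or more newlines.
DELIMITER = re.compile(r"\.\n+")

def split_paragraphs(text):
    """Return (start, end) character spans of the paragraphs in `text`."""
    spans = []
    start = 0
    for m in DELIMITER.finditer(text):
        end = m.start() + 1          # keep the period inside the paragraph
        spans.append((start, end))
        start = m.end()              # m.end() - m.start() is the delimiter length
    if start < len(text):
        spans.append((start, len(text)))
    return spans

def localize(offset, spans):
    """Map a document-level offset to (paragraph index, offset in paragraph)."""
    for i, (start, end) in enumerate(spans):
        if start <= offset < end:
            return i, offset - start
    return None  # the offset falls inside a delimiter or past the end

text = "First para.\n\nSecond para mentions Rome.\nThird."
spans = split_paragraphs(text)
para, local = localize(text.index("Rome"), spans)
```

Because the spans are kept as absolute offsets, a standoff annotation only needs its paragraph's start subtracted, however many newlines separated the paragraphs.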

If you have any ideas on how to achieve this, let me know. I think regex "lookahead"/"lookbehind" assertions might do it (see this thread).
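As a rough illustration of the lookbehind idea: a lookbehind assertion checks that a period precedes the newlines without consuming it, so the match covers only the newline run and its length is exactly the delimiter length. This is a Python sketch under the assumption that paragraphs end with a period; paragraph_breaks is a hypothetical helper name, and the equivalent pattern would need to be checked against whichever regex engine the exporter actually uses.

```python
import re

# (?<=\.) is a lookbehind: it requires a preceding period but does not
# include it in the match, so only the newline run is matched.
PARAGRAPH_BREAK = re.compile(r"(?<=\.)\n+")

def paragraph_breaks(text):
    """Return (position, length) for each paragraph delimiter in `text`."""
    return [(m.start(), m.end() - m.start())
            for m in PARAGRAPH_BREAK.finditer(text)]

sample = "One.\n\nTwo, no break here\nstill two.\nThree."
breaks = paragraph_breaks(sample)

# Splitting on the same pattern leaves the periods in the paragraphs.
paragraphs = PARAGRAPH_BREAK.split(sample)
```

The newline after "here" is not treated as a paragraph break because no period precedes it, which is the behaviour the period-plus-newline heuristic is meant to give.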