Open: rufuspollock opened this issue 8 years ago.
Yes, if you don't see a use for them, just drop the §.
@pauloborges, note this: we can include it in the cleaning step. I would suggest that at the start we be cautious and only strip § from the start and end of lines.
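A minimal sketch of that cautious first pass, assuming the cleaning works line by line on plain text. The function name `strip_section_marks` is hypothetical and not part of the existing cleaning code. Any § in the middle of a line is left untouched.

```python
SECTION_MARK = "§"


def strip_section_marks(text: str) -> str:
    """Remove § only at the start and end of each line (hypothetical helper)."""
    cleaned_lines = []
    for line in text.splitlines():
        # str.strip with an argument removes runs of exactly those characters
        # from both ends of the line; interior § characters are kept.
        cleaned_lines.append(line.strip(SECTION_MARK))
    return "\n".join(cleaned_lines)


print(strip_section_marks("§Can § be deleted?§\n§next line"))
```

Because interior § survive, this pass can be tightened later once we know what those marks mean.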
@rgrp, I did some experiments with the § characters and noticed a pattern. For example, a . followed by a § means a newline in the original document. The isolated § characters, however, have no meaning that I can see for now.
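One way to reproduce this kind of experiment is to count how many § directly follow a full stop and how many are isolated. This is a sketch, not the script used in the experiments above; the name `classify_section_marks` is hypothetical, and it assumes there is no space between the "." and the §.

```python
import re
from collections import Counter


def classify_section_marks(text: str) -> Counter:
    """Count § that follow a full stop versus all other (isolated) § (hypothetical helper)."""
    counts = Counter()
    for match in re.finditer("§", text):
        start = match.start()
        # Assumption: a newline marker is written as ".§" with nothing in between.
        if start > 0 and text[start - 1] == ".":
            counts["after_full_stop"] += 1
        else:
            counts["isolated"] += 1
    return counts


print(classify_section_marks("First sentence.§Second § part.§"))
```

Running it over the whole corpus would show how common the unexplained isolated marks really are.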
Information about newlines is actually interesting, and I definitely see a use for it (if it has been collected correctly, which I am not 100% sure of).
I opened a new issue, #13, to discuss this.
Currently we're creating a new paragraph inside a ::BODY:: tag every time we find a .§ sequence, and deleting the remaining § characters.
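The current rule can be sketched like this, assuming the body is available as one string. `split_paragraphs` is a hypothetical name, and how the resulting paragraphs are written inside the ::BODY:: tag is not shown in the thread, so the sketch just returns a list. Collapsing runs of whitespace after deleting isolated § is an extra assumption on my part.

```python
PARAGRAPH_BREAK = ".§"


def split_paragraphs(body: str) -> list[str]:
    """Split on '.§' and drop any leftover § (hypothetical helper)."""
    pieces = body.split(PARAGRAPH_BREAK)
    paragraphs = []
    for i, piece in enumerate(pieces):
        # split() consumed the full stop; restore it on every piece but the last.
        if i < len(pieces) - 1:
            piece += "."
        # Remaining § are "isolated" marks with no known meaning: delete them,
        # then collapse the whitespace they leave behind.
        piece = " ".join(piece.replace("§", "").split())
        if piece:
            paragraphs.append(piece)
    return paragraphs


print(split_paragraphs("First sentence.§Second § part.§"))
```

Note that this keeps the newline information (as paragraph boundaries) while discarding the isolated marks, which matches the behaviour described above.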
This issue is about discussing ways to clean the texts typographically.
§ character
The texts have a lot of § characters, e.g. our sample has:
Can § be deleted? /cc @tommv