rufuspollock / climate-negotiations

Information on the UNFCCC climate negotiations using the Earth Negotiations Bulletin from the IISD
https://rufuspollock.github.io/climate-negotiations/

Typographic tidying of text (questions) #10

Open rufuspollock opened 8 years ago

rufuspollock commented 8 years ago

This issue is for discussing ways the texts can be cleaned up typographically.

§ character

We have a lot of § characters, e.g. our sample has:

::H1::§ § WORKING GROUP I§

Can § be deleted?

/cc @tommv

tommv commented 8 years ago

Yes if you don't see a use for them, just drop the §

rufuspollock commented 8 years ago

@pauloborges note this - we can include this in the cleaning. I would suggest that, to start with, we are cautious and only strip § from the start and end of lines.
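A minimal sketch of what that cautious stripping could look like, assuming lines may begin with a `::TAG::` marker as in the `::H1::` sample above. The function name `strip_edge_marks` and the tag pattern are hypothetical, not taken from the repo's cleaning code:

```python
import re

# Assumption: a line may start with a marker such as ::H1:: or ::BODY::.
# The marker is kept as-is and only the text after it is cleaned.
TAG_PREFIX = re.compile(r"^(::[A-Z0-9]+::)?(.*)$", re.DOTALL)


def strip_edge_marks(line: str) -> str:
    """Drop § (plus spaces and tabs) from the start and end of the text only."""
    tag, text = TAG_PREFIX.match(line).groups()
    # str.strip with a character set only touches the two ends,
    # so any § inside the text is left alone for now.
    return (tag or "") + text.strip("§ \t")
```

With this, the sample heading becomes `::H1::WORKING GROUP I`, while a § in the middle of a line is untouched.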

pauloborges commented 8 years ago

@rgrp, I did some experiments with the § and I noticed a pattern. For example, if a . is followed by a §, that means a newline in the original document. But the isolated §'s, for now, don't seem to carry any meaning.

tommv commented 8 years ago

Information about newlines is actually interesting and I definitely see a use for it (if it has been collected correctly - which I am not 100% sure of).

I opened a new issue to discuss this: #13

pauloborges commented 8 years ago

Currently we're creating new paragraphs inside a ::BODY:: tag every time we find a sequence and deleting the remaining §.
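A rough sketch of that behaviour, assuming the sequence in question is the `.` followed by `§` pattern described earlier in this thread. The function name `split_paragraphs` is hypothetical and this is not the actual cleaning code:

```python
import re


def split_paragraphs(body: str) -> list[str]:
    """Split ::BODY:: text into paragraphs and drop the leftover § characters."""
    # Assumption: a "." followed (optionally after whitespace) by "§"
    # marks a newline in the original document.
    parts = re.split(r"(?<=\.)\s*§", body)
    # Any § that remains is treated as meaningless: replace it with a
    # space, then collapse runs of whitespace.
    cleaned = (re.sub(r"\s+", " ", p.replace("§", " ")).strip() for p in parts)
    return [p for p in cleaned if p]
```

For example, `split_paragraphs("First point.§ Second § point.§")` gives two paragraphs, `First point.` and `Second point.`.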