rufuspollock / climate-negotiations

Information on the UNFCCC climate negotiations using the Earth Negotiations Bulletin from the IISD
https://rufuspollock.github.io/climate-negotiations/

Text Chunking and Newlines #13

Open tommv opened 8 years ago

tommv commented 8 years ago

This is important but difficult to explain, so ask me if I am not clear.

Two 'granularities' are relevant when chunking the ENB reports:

  1. The first and coarser level is that of 'sections' corresponding to different tracks or formats
  2. The second and finer level is that of 'paragraphs' corresponding to different topics

Generally, ENB writers

While tagging the tracks and formats (more urgent), we should therefore take into consideration only the titles followed by a new line.

When we move on to topic tagging (less urgent), we should take all titles into consideration.
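A minimal sketch of the distinction above, in Python. It assumes a "title" is an upper-case phrase ending in a colon at the start of a line; that pattern, the function name `classify_titles`, and the sample text are all made up for illustration and may not match the actual ENB formatting.

```python
import re

# Assumption: a title is an upper-case phrase ending in a colon at the start
# of a line. The real ENB convention may differ.
TITLE = re.compile(r"^(?P<title>[A-Z][A-Z0-9 ,&/'()-]+):[ \t]*(?P<rest>.*)$")


def classify_titles(text):
    """Return (section_titles, all_titles) found at the start of lines.

    section_titles: titles followed directly by a newline (tracks/formats).
    all_titles: every title, including those followed by text (topics).
    """
    sections, all_titles = [], []
    for line in text.splitlines():
        match = TITLE.match(line.strip())
        if not match:
            continue
        title = match.group("title").strip()
        all_titles.append(title)
        if not match.group("rest"):  # nothing after the colon on this line
            sections.append(title)
    return sections, all_titles


# Hypothetical sample text, not taken from an actual report.
sample = (
    "COP PLENARY:\n"
    "ADAPTATION FUND: Parties discussed funding.\n"
    "Some ordinary text.\n"
    "CONTACT GROUPS:\n"
    "FINANCE: Delegates met.\n"
)
sections, all_titles = classify_titles(sample)
print(sections)
print(all_titles)
```

Track/format tagging would use only the first list, and topic tagging would use the second.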

Hope this helps.

rufuspollock commented 8 years ago

@tommv could you provide an example here from one of the reports - and link to the report online or in the repo?