knaw-huc / globalise-tools

tools for globalise tasks
Apache License 2.0

Use manually added paragraph data in conll export #1

Closed brambg closed 1 year ago

brambg commented 1 year ago

Currently, the sentence division in the CoNLL export is based on the sentences recognized by spaCy. Make a new CoNLL export that uses the paragraph endings defined in globalise-word-joins-MH.csv as boundaries and ignores the sentences deduced by spaCy.
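
A minimal sketch of the idea, not the project's actual code: it assumes the paragraph endings from the CSV have already been loaded into a set of token indices (the layout of globalise-word-joins-MH.csv is not shown here, so that loading step is left out), and it writes a simplified two-column CoNLL-style output (ID and form only). The names `tokens_to_conll` and `paragraph_end_indices` are hypothetical.

```python
from typing import List, Set


def tokens_to_conll(tokens: List[str], paragraph_end_indices: Set[int]) -> str:
    """Render tokens as CoNLL-style lines, with one block per paragraph.

    paragraph_end_indices holds the 0-based positions of tokens that end a
    paragraph (assumed to be derived from the manually added CSV data).
    Sentence boundaries from any NLP pipeline are not consulted at all.
    """
    lines: List[str] = []
    token_id = 1  # token IDs restart at 1 in every block
    for index, token in enumerate(tokens):
        lines.append(f"{token_id}\t{token}")
        token_id += 1
        if index in paragraph_end_indices:
            lines.append("")  # a blank line closes the current block
            token_id = 1
    # Close a trailing block that has no explicit paragraph ending
    if lines and lines[-1] != "":
        lines.append("")
    return "\n".join(lines)


# Placeholder tokens; the paragraph ends after the third token (index 2)
example = tokens_to_conll(["a", "b", "c", "d", "e"], {2})
```

Because the blank-line separator is driven only by the supplied indices, swapping spaCy's sentence boundaries for the manual paragraph data would come down to changing which set of indices is passed in.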

brambg commented 1 year ago

We're not using CoNLL as an import/export format anymore.