ufal / ParCzech

ParCzech is a project on compiling Czech parliamentary data into annotated corpora.
https://ufal.mff.cuni.cz/parczech

Download files to valid TEI (not TEI-like format) #13

Closed: matyaskopp closed this issue 4 years ago

matyaskopp commented 4 years ago

Download files to valid TEI, then annotate them and make them processable by TEITOK.
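The title contrasts valid TEI with a "TEI-like" format. The sketch below is a rough illustration only and is not the ParCzech converter. It wraps a downloaded transcript in the minimal TEI P5 skeleton: a `teiHeader` with `fileDesc/titleStmt/title`, `publicationStmt` and `sourceDesc`, plus `text/body`, all in the TEI namespace. It then reports which of those parts a file lacks. The helper names `wrap_transcript` and `missing_parts` and the placeholder header texts are hypothetical. The check only tests this core structure. Actual validity means validating against a TEI schema (e.g. `tei_all`) with a RELAX NG validator.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements with a default xmlns instead of an ns0: prefix.
ET.register_namespace("", TEI_NS)

# Core parts every TEI P5 document needs (paths relative to the <TEI> root).
REQUIRED_PATHS = [
    "teiHeader/fileDesc/titleStmt/title",
    "teiHeader/fileDesc/publicationStmt",
    "teiHeader/fileDesc/sourceDesc",
    "text/body",
]


def q(tag):
    """Return the tag qualified with the TEI namespace, in ElementTree syntax."""
    return f"{{{TEI_NS}}}{tag}"


def wrap_transcript(title, source_url, paragraphs):
    """Hypothetical helper: wrap plain transcript paragraphs in minimal TEI."""
    tei = ET.Element(q("TEI"))
    file_desc = ET.SubElement(ET.SubElement(tei, q("teiHeader")), q("fileDesc"))
    title_stmt = ET.SubElement(file_desc, q("titleStmt"))
    ET.SubElement(title_stmt, q("title")).text = title
    pub_stmt = ET.SubElement(file_desc, q("publicationStmt"))
    ET.SubElement(pub_stmt, q("p")).text = "Unpublished draft"  # placeholder
    source_desc = ET.SubElement(file_desc, q("sourceDesc"))
    ET.SubElement(source_desc, q("p")).text = f"Downloaded from {source_url}"
    body = ET.SubElement(ET.SubElement(tei, q("text")), q("body"))
    for para in paragraphs:
        ET.SubElement(body, q("p")).text = para
    return ET.tostring(tei, encoding="unicode")


def missing_parts(xml_text):
    """Hypothetical helper: list the core TEI parts absent from a document."""
    root = ET.fromstring(xml_text)
    ns = {"tei": TEI_NS}
    missing = []
    if root.tag != q("TEI"):
        # Wrong root name or no TEI namespace: a typical "TEI-like" file.
        missing.append("TEI root")
    for path in REQUIRED_PATHS:
        tei_path = "/".join(f"tei:{step}" for step in path.split("/"))
        if root.find(tei_path, ns) is None:
            missing.append(path)
    return missing
```

For example, `missing_parts(wrap_transcript("Example sitting", "https://example.org/transcript", ["First speech."]))` returns an empty list. A file with only `<TEI><text><body/></text></TEI>` (no namespace, no header) is flagged on every check.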

stranak commented 4 years ago

Yes, so by my count future releases should include:

  1. raw original transcripts as downloaded
  2. valid TEI files, ideally including all the annotation we have, i.e. morphology, entities, audio alignment, and whatever else is available
  3. the TEI files above with no semantic changes, only converted to TEITOK's XML format.

matyaskopp commented 4 years ago

https://github.com/ufal/ParCzech/commit/a2fe5137214ff86c21a4a75e5a49e08625e2e9f4