ufal / ParCzech

ParCzech is a project on compiling Czech parliamentary data into annotated corpora.
https://ufal.mff.cuni.cz/parczech

Align audio - merge data from synced verticals into TEI documents #120

Closed matyaskopp closed 3 years ago

matyaskopp commented 3 years ago

Store the timeline in //body/timeline:

<timeline unit="s" origin="#audio.t0" corresp="#audio"> <!-- #audio points to //recording/media -->
  <when xml:id="audio.t0" absolute="2015-04-08T08:58:00"/>
  <when xml:id="s1.tb" interval="30.934" since="#audio.t0" />
  <when xml:id="s1.te" interval="35.934" since="#audio.t0" />
  ...
</timeline>

Use anchors within sentences:

<u>
  <seg>
    <s xml:id="s1">
      <anchor synch="#s1.tb" />
      <w xml:id="s1.w1">First</w>
      <w xml:id="s1.w2">sentence</w>
      <anchor synch="#s1.te" />
    </s>
    <s xml:id="s2">
      <anchor synch="#s2.tb" />
      <w xml:id="s2.w1">First</w>
      <w xml:id="s2.w2">longer</w>
      <w xml:id="s2.w3">sentence</w>
      <anchor synch="#s2.te" />
    </s>
  </seg>
</u>

matyaskopp commented 3 years ago

Place an <anchor .../> for every aligned word, not only at sentence boundaries.
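A minimal sketch of what word-level anchoring could look like, reusing the sentence-level pattern from the first comment. The id scheme (s1.w1.tb / s1.w1.te) is an assumption made for illustration, not something fixed in this issue, and the interval values between the sentence start (30.934) and end (35.934) are placeholders rather than real alignment output.

```xml
<!-- Hypothetical word-level <when> entries; ids and inner intervals are illustrative only -->
<timeline unit="s" origin="#audio.t0" corresp="#audio">
  <when xml:id="audio.t0" absolute="2015-04-08T08:58:00"/>
  <when xml:id="s1.w1.tb" interval="30.934" since="#audio.t0"/>
  <when xml:id="s1.w1.te" interval="33.000" since="#audio.t0"/>
  <when xml:id="s1.w2.tb" interval="33.000" since="#audio.t0"/>
  <when xml:id="s1.w2.te" interval="35.934" since="#audio.t0"/>
</timeline>

<!-- Each aligned word is bracketed by a begin anchor and an end anchor -->
<u>
  <seg>
    <s xml:id="s1">
      <anchor synch="#s1.w1.tb"/>
      <w xml:id="s1.w1">First</w>
      <anchor synch="#s1.w1.te"/>
      <anchor synch="#s1.w2.tb"/>
      <w xml:id="s1.w2">sentence</w>
      <anchor synch="#s1.w2.te"/>
    </s>
  </seg>
</u>
```

Words that the aligner could not match would simply carry no anchors in this sketch. Whether adjacent words should share a single <when> instead of separate end and begin points is left open here.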

matyaskopp commented 3 years ago

Add a certainty to the timeline with the cert attribute (values are defined by teidata.certainty): https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-teidata.certainty.html

<timeline unit="ms" origin="#audio.t0" corresp="#audio" cert="medium">
  ...
</timeline>