nzilbb / labbcat-server

Server components for LaBB-CAT
GNU Affero General Public License v3.0

What's wrong with my EAF/TextGrid files? #40

Closed: fishfree closed this issue 4 months ago

fishfree commented 4 months ago

When I upload the ELAN .eaf project file and the .mp4 video file, the following errors are shown:

No new parent available for [2b8]0#alphy(13.9-14.47) (word) but turn [null] missing)
No new parent available for [2bb]0#app(14.47-15.0) (word) but turn [null] missing)
No new parent available for [2be]0#subtitle(25.78-26.15) (word) but turn [null] missing)
No new parent available for [2bh]0#close(26.15-26.71) (word) but turn [null] missing)
No new parent available for [2bk]0#captions(26.82-27.05) (word) but turn [null] missing)
No new parent available for [2bn]0#unavailable(27.05-28.6) (word) but turn [null] missing)
No new parent available for [2bt]0#submit(31.91-32.34) (word) but turn [null] missing)
No new parent available for [2c2]0#右击(40.19-40.68) (word) but turn [null] missing)
No new parent available for [2c5]0#好了(50.32-50.72) (word) but turn [null] missing)
No transcripts were processed, because some have errors
l9eb4Qwz8o4.TextGrid: No parent for word:小伙伴 (0.54-0.81)
l9eb4Qwz8o4.TextGrid: No parent for word:ai (1.63-1.92)
l9eb4Qwz8o4.TextGrid: No parent for word:youtube (7.61-7.66)
l9eb4Qwz8o4.TextGrid: No parent for word:外嵌 (8.78-8.81)
l9eb4Qwz8o4.TextGrid: No parent for word:alphy (13.9-14.47)
l9eb4Qwz8o4.TextGrid: No parent for word:app (14.47-15.0)
l9eb4Qwz8o4.TextGrid: No parent for word:subtitle (25.78-26.15)
l9eb4Qwz8o4.TextGrid: No parent for word:close (26.15-26.71)
l9eb4Qwz8o4.TextGrid: No parent for word:captions (26.82-27.05)
l9eb4Qwz8o4.TextGrid: No parent for word:unavailable (27.05-28.6)
l9eb4Qwz8o4.TextGrid: No parent for word:丢到 (30.42-30.68)
l9eb4Qwz8o4.TextGrid: No parent for word:submit (31.91-32.34)
l9eb4Qwz8o4.TextGrid: No parent for word:5 (33.91-34.02)
l9eb4Qwz8o4.TextGrid: No parent for word:email (35.15-35.44)
l9eb4Qwz8o4.TextGrid: No parent for word:右击 (40.19-40.68)
l9eb4Qwz8o4.TextGrid: No parent for word:好了 (50.32-50.72)
l9eb4Qwz8o4.TextGrid: No parent for word:是不 (51.89-52.19)
l9eb4Qwz8o4.TextGrid: No parent for word:1万 (54.97-55.23)
l9eb4Qwz8o4.TextGrid: No parent for word:等着 (56.76-56.96)
l9eb4Qwz8o4.TextGrid: No parent for word:点进 (57.94-58.23)
main_participant - new: 1
participant - new: 1
turn - new: 6
utterance - new: 785
word - new: 225
Changed offsets: 2381

Is it due to the Chinese characters? My files are attached: l9eb4Qwz8o4.zip. You can unzip it and import the TextGrid file into ELAN as a tier.

robertfromont commented 4 months ago

For LaBB-CAT, transcripts must be divided into 'utterances' (i.e. groups of 5-15 words), each with its speaker identified. The TextGrid has words and phones, but not utterances. To fix that:

  1. Add a new interval tier called utterance.
  2. On this tier, add labelled intervals that divide the words into groups of 5-15 words, e.g. sentences or breath groups. Each label should be the name/ID of the speaker. Make sure that no labelled interval on the word tier falls outside the bounds of the labelled intervals on the utterance tier: any word whose utterance/speaker can't be identified will generate the kind of error you're seeing.
  3. When you upload the transcript, make sure the utterance TextGrid tier is mapped to the utterance LaBB-CAT layer, the word tier to the word layer, and the phones tier to the segment layer.
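The constraint in step 2 can be checked before uploading. Below is a minimal Python sketch, assuming the word and utterance intervals have already been read out of the TextGrid as (label, start, end) values. `Interval` and `find_orphan_words` are hypothetical names, and this only approximates the rule described above; it is not LaBB-CAT's own validation code. Any word it reports would likely produce a "No parent for word" error like the ones in the log.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """A TextGrid interval: its label and its start/end times in seconds."""
    label: str
    start: float
    end: float


def find_orphan_words(
    words: List[Interval],
    utterances: List[Interval],
    tolerance: float = 1e-6,
) -> List[Interval]:
    """Return labelled words that are not fully inside any labelled utterance.

    Unlabelled (empty) intervals are skipped on both tiers: an empty word
    interval is silence, and an empty utterance interval names no speaker.
    `tolerance` absorbs tiny floating-point differences at shared boundaries.
    """
    speaker_spans = [u for u in utterances if u.label.strip()]
    orphans = []
    for w in words:
        if not w.label.strip():
            continue  # silence on the word tier, nothing to attach
        inside = any(
            u.start - tolerance <= w.start and w.end <= u.end + tolerance
            for u in speaker_spans
        )
        if not inside:
            orphans.append(w)
    return orphans


if __name__ == "__main__":
    # Word intervals taken from the error log; this utterance tier is made up.
    words = [
        Interval("小伙伴", 0.54, 0.81),
        Interval("ai", 1.63, 1.92),
        Interval("alphy", 13.9, 14.47),
    ]
    utterances = [Interval("speaker1", 0.5, 2.0), Interval("", 2.0, 13.0)]
    for w in find_orphan_words(words, utterances):
        # Printed in the same shape as the messages in the log, for comparison.
        print(f"No parent for word:{w.label} ({w.start}-{w.end})")
```

An empty-labelled utterance interval is deliberately treated as covering nothing, because it identifies no speaker.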

There's more information about using TextGrids (and ELAN files) for transcripts here: https://nzilbb.github.io/labbcat-doc/howto/transcription/praat.html
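When there are many words, a first draft of the utterance intervals can be generated rather than drawn by hand. This Python sketch is an illustration under stated assumptions, not a LaBB-CAT feature. `group_into_utterances` is a hypothetical helper that closes an utterance after `max_words` words, or at a pause of at least `pause` seconds once `min_words` words have accumulated, and labels every interval with a single speaker name. The resulting intervals would still need to be added to the TextGrid as an interval tier named utterance (e.g. in Praat) and checked by ear, because pauses are only a rough proxy for sentence or breath-group boundaries.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """A TextGrid interval: its label and its start/end times in seconds."""
    label: str
    start: float
    end: float


def group_into_utterances(
    words: List[Interval],
    speaker: str,
    min_words: int = 5,
    max_words: int = 15,
    pause: float = 0.3,
) -> List[Interval]:
    """Group labelled words into utterance intervals labelled with `speaker`.

    An utterance is closed when it already holds `max_words` words, or when
    the silence before the next word is at least `pause` seconds and it holds
    at least `min_words` words. Each utterance spans from its first word's
    start to its last word's end, so (assuming the word intervals don't
    overlap) every labelled word falls inside exactly one utterance.
    """
    labelled = sorted((w for w in words if w.label.strip()), key=lambda w: w.start)
    utterances: List[Interval] = []
    current: List[Interval] = []
    for w in labelled:
        if current:
            gap = w.start - current[-1].end
            if len(current) >= max_words or (gap >= pause and len(current) >= min_words):
                utterances.append(Interval(speaker, current[0].start, current[-1].end))
                current = []
        current.append(w)
    if current:
        # Flush the last (possibly short) utterance.
        utterances.append(Interval(speaker, current[0].start, current[-1].end))
    return utterances


if __name__ == "__main__":
    # A hypothetical word tier: two runs of words separated by a long pause.
    starts = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 4.0, 4.5]
    words = [Interval(f"w{i}", s, s + 0.4) for i, s in enumerate(starts)]
    for u in group_into_utterances(words, speaker="speaker1"):
        print(u)
```

Tying utterance boundaries to word boundaries, rather than to fixed time windows, is what keeps every word inside a speaker-labelled interval.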

fishfree commented 4 months ago

@robertfromont Thank you, Robert! I changed the mapping and now it works.