timmahrt / praatIO

A python library for working with praat, textgrids, time aligned audio transcripts, and audio files. It is primarily used for extracting features from and making manipulations on audio files given hierarchical time-aligned transcriptions (utterance > word > syllable > phone, etc).
MIT License

empty textgrid #55

Closed lauredy closed 8 months ago

lauredy commented 1 year ago

Hello!

I'm trying to follow the PraatIO tutorial. My objective is to segment my sentences (one file.wav = one recorded FRENCH sentence) and to extract the onsets of words/syllables/phonemes.

The first example in the tutorial produced the word segmentation, but when I try it, I get an empty textgrid:

# I just want the labels from the entries
labelList = [entry.label for entry in wordTier.entries]
print(labelList)

# Get the duration of each interval
# (in this example, an interval is a word, so this outputs word duration)
durationList = []
for start, stop, _ in wordTier.entries:
    durationList.append(stop - start)

print(durationList)

output:
[]
[]

and actually, this is what I have in my file.textgrid:

File type = "ooTextFile"
Object class = "TextGrid"

0
5.098503401360544

1
"IntervalTier"
"words"
0
5.098503401360544
1
0
5.098503401360544
""

Did I miss something? Maybe I didn't understand the tutorial well? Your help would be very valuable to me!!

timmahrt commented 1 year ago

Hello lauredy, I'm sorry for the hassle.

What does the start of your script look like?

It should look something like:

tg = textgrid.openTextgrid(join(path, "bobby_phones.TextGrid"), False)
wordTier = tg.getTier("words")

Does that look the same as what you have?

What is the output of print(wordTier.entries)?

If it's not working, could you please send me your textgrid? Perhaps my script is having trouble processing it.

lauredy commented 1 year ago

thank you so much for your answer! :) (I'm also a beginner in python, it doesn't help!)

I used exactly the code you gave as example to create the textgrid and get the tiers:

inputPath = join("...", "file")
outputPath = join(inputPath, "generated_textgrids")

for fn in os.listdir(inputPath):
    name, ext = os.path.splitext(fn)  # split the file name from its extension
    if ext != ".wav":
        continue

    # the only wav file in the "file" folder is named "f_00"
    duration = audio.getDuration(join(inputPath, fn))
    print(duration)
    wordTier = textgrid.IntervalTier('words', [], 0, duration)

    tg = textgrid.Textgrid()
    tg.addTier(wordTier)
    tg.save(join(outputPath, name + ".TextGrid"), format="short_textgrid", includeBlankSpaces=True)

for fn in os.listdir(outputPath):
    ext = os.path.splitext(fn)[1]

output:
5.098503401360544 (the duration is correct)
f_00.TextGrid

and then:

inputFN = join("...", "file", "generated_textgrids", "f_00.TextGrid")
tg = textgrid.openTextgrid(inputFN, includeEmptyIntervals=False)

wordTier = tg.getTier('words')
print(wordTier)
print(wordTier.entries)

output:
<praatio.data_classes.interval_tier.IntervalTier object at 0x000001D6C204EB00>
()

I'm attaching the f_00.TextGrid I obtained: f_00.TextGrid.txt

timmahrt commented 1 year ago

Ah, sorry I missed something in your first message. I looked at the input textgrid again:

and actually, this is what I have in my file.textgrid:

File type = "ooTextFile"
Object class = "TextGrid"

0
5.098503401360544

1
"IntervalTier"
"words"
0
5.098503401360544
1
0
5.098503401360544
""

The textgrid is empty: its only tier contains a single interval with no label. With includeEmptyIntervals=False, as in your code, praatio skips any interval whose label is empty (e.g. ""), so nothing is left to load.

If you want intervals that are empty, please change:

tg = textgrid.openTextgrid(inputFN, includeEmptyIntervals=False)

to

tg = textgrid.openTextgrid(inputFN, includeEmptyIntervals=True)
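
To illustrate why the round trip above yields no entries, here is a minimal sketch in plain Python. It does not use praatio and is not praatio's implementation: `fill_blank_spaces` and `drop_empty` are hypothetical helpers that only mimic what saving with includeBlankSpaces=True and loading with includeEmptyIntervals=False are assumed to do.

```python
from typing import List, Tuple

Entry = Tuple[float, float, str]  # (start, end, label)


def fill_blank_spaces(entries: List[Entry], tier_start: float, tier_end: float) -> List[Entry]:
    """Hypothetical helper: pad every gap in a tier with an unlabeled interval."""
    filled: List[Entry] = []
    cursor = tier_start
    for start, end, label in sorted(entries):
        if start > cursor:
            filled.append((cursor, start, ""))  # gap before this entry
        filled.append((start, end, label))
        cursor = end
    if cursor < tier_end:
        filled.append((cursor, tier_end, ""))  # gap at the end of the tier
    return filled


def drop_empty(entries: List[Entry]) -> List[Entry]:
    """Hypothetical helper: keep only intervals that carry a label."""
    return [entry for entry in entries if entry[2].strip() != ""]


duration = 5.098503401360544  # duration reported earlier in this thread

# A tier created with no entries is written out as one unlabeled interval.
on_disk = fill_blank_spaces([], 0, duration)
print(on_disk)

# Skipping unlabeled intervals on load then leaves nothing.
loaded = drop_empty(on_disk)
print(loaded)
```

Under these assumptions, the single blank interval in the saved file and the empty result after loading are both expected behavior rather than a parsing failure.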

my objective is to segment my sentences (one file.wav = one recorded FRENCH sentence) and to extract the onsets of words/syllables/phonemes.

Maybe this is obvious to you, but to make sure we are on the same page:

Praatio cannot detect words or phones in an audio recording.

If all of your textgrids are empty, you will need to fill them with data first (utterance-level, word-level, or phonetic-level transcripts). You can create those transcripts manually or you can use a tool like SPPAS (https://sppas.org/) to automatically generate textgrids filled with data (SPPAS requires the utterance-level transcripts however).

Once you have textgrids filled with data, you can use praatio to transform your textgrids into spreadsheets that can be analyzed.
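
As a rough sketch of that last step, the snippet below turns a list of labeled intervals into CSV rows with onset, offset, and duration, using only the standard library. The `entries` values are made-up placeholders, and `entries_to_csv` is a hypothetical helper. In practice the (start, end, label) tuples would come from something like `wordTier.entries` once the textgrid contains real annotations.

```python
import csv
import io
from typing import List, Tuple

Entry = Tuple[float, float, str]  # (start, end, label)

# Placeholder word intervals, invented for illustration only.
entries: List[Entry] = [(0.0, 0.42, "le"), (0.42, 0.95, "chat"), (0.95, 1.6, "dort")]


def entries_to_csv(rows: List[Entry]) -> str:
    """Hypothetical helper: one CSV row per interval, with onset and duration."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["label", "onset", "offset", "duration"])
    for start, end, label in rows:
        # Round the duration to milliseconds to hide floating-point noise.
        writer.writerow([label, start, end, round(end - start, 3)])
    return buffer.getvalue()


csv_text = entries_to_csv(entries)
print(csv_text)
```

Writing `csv_text` to a file gives a table that can be opened in any spreadsheet program or loaded into a statistics package.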

lauredy commented 1 year ago

OK, your explanations are very helpful! Actually, I have the utterance transcripts. But maybe I was naive to think I would find a Python library to segment my sentences... I will have a look at SPPAS.

thank you again :)