timmahrt / praatIO

A Python library for working with Praat, TextGrids, time-aligned audio transcripts, and audio files. It is primarily used for extracting features from, and making manipulations on, audio files given hierarchical time-aligned transcriptions (utterance > word > syllable > phone, etc.).
MIT License

Wrong parameters when calling get_pitch_and_intensity.praat script from extractPI function #20

Closed: caluap closed this issue 3 years ago

caluap commented 3 years ago

I'm trying to use the pitch_and_intensity.extractPI function. It gives me a PraatExecutionFailed error.

When I paste the command that Python attempted to run, Praat complains that the optional argument “Unit” cannot have the value “True”. The output I'm getting is:

/Applications/Praat.app/Contents/MacOS/Praat --run /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/praatio/praatScripts/get_pitch_and_intensity.praat /Users/calua/google_drive/pessoal/estudo/mestrado/MSc_CaluaPataca/cc/extraction/praatio/piecewise_output/stand-up_002.wav /Users/calua/google_drive/pessoal/estudo/mestrado/MSc_CaluaPataca/cc/extraction/praatio/piecewise_output/stand-up_002.txt 0.01 123 450 0.03 True -1 -1 0 0

It doesn't seem to matter which unit I pass to the function. I have a feeling that in the pitch_and_intensity.py file, line 295, the pitchUnit parameter was left out!

timmahrt commented 3 years ago

Hello caluap, thanks for reporting this issue.

I was able to replicate the issue and then fix it. I've deployed the fix in version 4.1.1, so if you update praatio you should be good to go.

If your segments are very short (word length or shorter), you will still get errors from Praat even with the fix, because Praat needs a certain minimum window size to get good results. If your segments are phrase length or longer, you'll be okay.
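One way to act on this is to check segment durations before running a per-segment analysis. The sketch below is hypothetical: `split_by_length` and the `(start, end, label)` tuple format are not part of praatIO. It assumes the rule from Praat's autocorrelation pitch method, where the analysis window spans three periods of the pitch floor (40 ms at a 75 Hz floor). Treat that threshold as a lower bound; in practice Praat may still need somewhat longer segments.

```python
from typing import List, Tuple

# Hypothetical interval format: (start_seconds, end_seconds, label)
Interval = Tuple[float, float, str]


def min_window_seconds(min_pitch_hz: float, periods: float = 3.0) -> float:
    """Shortest analysis window for a given pitch floor.

    ASSUMPTION: follows the rule of Praat's autocorrelation pitch method,
    where the window is `periods` periods of the pitch floor long.
    """
    return periods / min_pitch_hz


def split_by_length(intervals: List[Interval], min_pitch_hz: float):
    """Separate intervals long enough for per-segment analysis from short ones."""
    threshold = min_window_seconds(min_pitch_hz)
    long_enough: List[Interval] = []
    too_short: List[Interval] = []
    for start, end, label in intervals:
        target = long_enough if end - start >= threshold else too_short
        target.append((start, end, label))
    return long_enough, too_short
```

For example, with a 75 Hz floor, a 0.5 s phrase passes while a 30 ms phone lands in `too_short`; short intervals like that are better served by the whole-file approach described next.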

If you have short intervals, you will get better results by running extractPI without specifying the TextGrid, then loading the TextGrid and filtering the pitch data, e.g.:

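from os.path import join
from praatio import pitch_and_intensity
from praatio import tgio
# Placeholder locations (hypothetical); adjust to your own setup
praatEXE = "/Applications/Praat.app/Contents/MacOS/Praat"
wavPath = pitchPath = tgPath = "."
# Extract pitch and intensity over the whole file, without a TextGrid
maryPitchData = pitch_and_intensity.extractPI(join(wavPath, "mary.wav"),
                                              join(pitchPath, "mary.txt"),
                                              praatEXE, 75, 450,
                                              forceRegenerate=False)
# Keep only the samples that fall inside each labeled phone interval
tg = tgio.openTextgrid(join(tgPath, "mary.TextGrid"))
tier = tg.tierDict['phones']
filteredData = tier.getValuesInIntervals(maryPitchData)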
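To illustrate what the filtering step does conceptually, here is a plain-Python sketch that groups timestamped samples by the interval containing them. It is not praatIO's implementation of `getValuesInIntervals`; the `values_in_intervals` name and the tuple formats below are assumptions for illustration only.

```python
from typing import List, Tuple

# Hypothetical formats (not praatIO's): pitch samples as
# (time, pitch, intensity) and intervals as (start, end, label).
Sample = Tuple[float, float, float]
Interval = Tuple[float, float, str]


def values_in_intervals(intervals: List[Interval], samples: List[Sample]):
    """Pair each interval with the samples whose time lies in [start, end]."""
    grouped = []
    for start, end, label in intervals:
        inside = [s for s in samples if start <= s[0] <= end]
        grouped.append(((start, end, label), inside))
    return grouped


samples = [(0.10, 120.0, 60.0), (0.25, 130.0, 62.0), (0.60, 140.0, 65.0)]
intervals = [(0.0, 0.3, "m"), (0.3, 0.5, "ae")]
for interval, values in values_in_intervals(intervals, samples):
    print(interval, len(values))
```

Because Praat analyzed the whole file, even a very short interval simply receives whatever samples fall inside it (possibly none) instead of triggering a Praat error. The nested loop is quadratic; for long recordings, sorting samples by time and using `bisect` would scale better.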

Alternatively, you can try an existing helper function, generatePIMeasures, which computes a set of predetermined pitch and intensity summary values (min, max, etc.) for each labeled interval:

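# Reuses the imports and placeholder paths from the previous snippet
maryPitchData = pitch_and_intensity.extractPI(join(wavPath, "mary.wav"),
                                              join(pitchPath, "mary.txt"),
                                              praatEXE, 75, 450,
                                              forceRegenerate=False)
# One set of summary values per labeled interval on the "word" tier;
# NOTE: the TextGrid should annotate the same recording as the pitch data
maryPitchSummaryData = pitch_and_intensity.generatePIMeasures(
    maryPitchData,
    join(tgPath, "bobby_words.TextGrid"),
    "word", doPitch=True,
    medianFilterWindowSize=9)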
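As a rough picture of what "median filter, then summarize per interval" means, here is a hypothetical plain-Python sketch. It is not generatePIMeasures' actual code, and the measures it returns (`min`, `max`, `mean`, `median`) are only illustrative. The real list is in the praatIO source linked later in this thread. It assumes unvoiced frames arrive as `None` and drops them before smoothing.

```python
import statistics
from typing import Dict, List, Optional


def median_filter(values: List[float], window: int) -> List[float]:
    """Replace each value with the median of a centered window (odd size).

    Windows are truncated at the list edges, so edge medians use fewer points.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    return [statistics.median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def summarize(values: List[Optional[float]], window: int = 9) -> Dict[str, float]:
    """Hypothetical per-interval summary: drop unvoiced (None) frames, smooth, measure."""
    voiced = [v for v in values if v is not None]
    if not voiced:
        return {}
    smoothed = median_filter(voiced, window)
    return {"min": min(smoothed), "max": max(smoothed),
            "mean": statistics.mean(smoothed),
            "median": statistics.median(smoothed)}
```

A median filter is a common choice for pitch tracks because it removes isolated octave jumps (a single 100 among values near 3 and 4, say) without blurring genuine rises and falls as much as a moving average would.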
caluap commented 3 years ago

Wow! Above and beyond, huh?

Indeed, I was getting the error you mentioned. After finding the issue I reported, I had hardcoded “Hertz” in the Praat script as a quick fix. That seemed to fix the problem, only to reveal that some of my segments were too short, as you suspected.

I tried your two ideas and both worked. Thank you!!

caluap commented 3 years ago

Hey,

Your “min, max, etc.” left me wondering what the “etc.” included, so I'll record here what I eventually found. For pitch, the returned values are listed in the source:

https://github.com/timmahrt/praatIO/blob/6312acb8648e948451cf78f9b683ecc4b9b6939a/praatio/pitch_and_intensity.py#L471