timmahrt / praatIO

A python library for working with praat, textgrids, time aligned audio transcripts, and audio files. It is primarily used for extracting features from and making manipulations on audio files given hierarchical time-aligned transcriptions (utterance > word > syllable > phone, etc).
MIT License

Tier entries that have blank labels are not read #29

Closed stevebeet closed 2 years ago

stevebeet commented 3 years ago

I have many textgrid files in which some or all of the labels in specific tiers are deliberately left blank. Praatio skips over these entries entirely, so their time information (which is what I need to retrieve) is lost. I have tried both a prebuilt version from "pip install" and the latest version from github, installed using setup.py.

The attached file has three entries in its only tier, and two of them are empty. Only one is retrieved by praatio.

EmptyLabelBug.Txt

timmahrt commented 3 years ago

Hi, The functionality you described is there by (poor) design. What you want to use is this:

openTextgrid(fn, readRaw=True)

http://timmahrt.github.io/praatIO/praatio/tgio.html#openTextgrid

You're not the first one to be confused by the default behavior and I'm working on a new version of PraatIO that will hopefully amend the situation (https://github.com/timmahrt/praatIO/pull/28).

Sorry for the headaches! Tim

timmahrt commented 3 years ago

Hi, I'm just checking in. Did my suggestion fix the problem for you?

I've been busy recently, so the fix for PraatIO might still be a few weeks away. Tim

SolomidHero commented 3 years ago

Hi! I faced the same problem and readRaw=True fixed it, so entries with empty labels ("") are now visible.

timmahrt commented 3 years ago

I'm glad readRaw=True fixed your problem. More user-friendly behavior is still planned for Praatio 5.0. I'm gearing up to release it soon, but I haven't recently had time to make the final changes it needs.

I still hope to have it out soon though.

timmahrt commented 3 years ago

Praatio 5.0 has now been released and includes a quality-of-life improvement for this issue.

openTextgrid now has a required argument, includeEmptyIntervals, so the old default behavior of silently skipping blank entries should no longer catch anyone by surprise:

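from praatio import textgrid

tg = textgrid.openTextgrid(
  name,  # path to the TextGrid file, passed positionally
  includeEmptyIntervals=False  # set to True to keep entries with blank labels
)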
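To make the effect of the flag concrete, here is a small library-free sketch of the kind of filtering being discussed. It does not use praatio and is not praatio's implementation: the `Interval` type, the `load_entries` helper, and the interval times are all made up for illustration, and it assumes a "blank" label means exactly the empty string.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """One tier entry: a time span plus its (possibly blank) label."""
    start: float
    end: float
    label: str


def load_entries(entries: List[Interval], include_empty_intervals: bool) -> List[Interval]:
    """Hypothetical stand-in for the filtering step a TextGrid reader might apply.

    With include_empty_intervals=False, entries whose label is "" are dropped,
    and their start/end times are dropped along with them.
    """
    if include_empty_intervals:
        return list(entries)
    return [entry for entry in entries if entry.label != ""]


# Made-up tier mirroring the report: three entries, two of them blank.
tier = [
    Interval(0.0, 0.5, ""),
    Interval(0.5, 1.2, "hello"),
    Interval(1.2, 2.0, ""),
]

kept = load_entries(tier, include_empty_intervals=False)
everything = load_entries(tier, include_empty_intervals=True)

for entry in everything:
    print(entry.start, entry.end, repr(entry.label))
```

With the flag off only the labeled entry survives, which matches the "only one is retrieved" symptom in the original report. With it on, the blank entries and their timestamps are preserved.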

I'm sorry this took so long to release.

If you are using Praatio 4.x and want to upgrade, changes are needed in your code. Please check the readme: https://github.com/timmahrt/praatIO#version-4-to-5-migration

And let me know if you have any questions. Thank you!

timmahrt commented 2 years ago

I'm going to close this issue now, since it has been fixed as far as I'm aware. Please feel free to reopen it if you have related questions or problems.