timmahrt / praatIO

A python library for working with praat, textgrids, time aligned audio transcripts, and audio files. It is primarily used for extracting features from and making manipulations on audio files given hierarchical time-aligned transcriptions (utterance > word > syllable > phone, etc).
MIT License
299 stars 32 forks source link

Some API suggestions for next major release #49

Open mmcauliffe opened 1 year ago

mmcauliffe commented 1 year ago

As I'm going through to make MFA compatible with v6.0, one thing that sticks out to me is that several aspects of PraatIO could be more pythonic and make it easier for integrating with other packages.

It also might be helpful to have includeBlankSpaces default to true for exporting to textgrids (and it might also be worth breaking up the TextGrid.save method into different methods for the output format), since it's a bit of a gotcha if you try to open up TextGrids in Praat later on. Not having blank intervals was the root cause of some issues for MFA in https://twitter.com/betsysneller/status/1621607813632892929, though I don't think they were using PraatIO to generate the textgrids.

timmahrt commented 1 year ago

Thanks for the feedback! I will read through it carefully later but skimming it, it sounds good. πŸ‘

timmahrt commented 1 year ago

As I'm going through to make MFA compatible with v6.0

I'm not sure if you saw, but I created a document with upgrading tips, if that helps! https://github.com/timmahrt/praatIO/blob/main/UPGRADING.md

timmahrt commented 7 months ago

Today I released version 6.2.

This adds the __iter__ and __len__ method as requested.

timmahrt commented 7 months ago

I tried to add __getitem__ and __setitem__.

Although I'm not active in using praatio anymore, when I did use praatio, I usually referenced my tiers by name, rather than position. If I only add __getitem__ to get by index then I can get a tier with either tg[0] or tg.getTier("phones"). It feels a little strange that the two methods work differently.

Maybe I could make them both accept either an index or a tier name? Or add a new method tg.getTierByIndex(0).

With __setitem__ isn't more of alias for replaceTier rather than addTier? At least for me, replaceTier isn't a method that I used too much.

What do you think?

timmahrt commented 7 months ago

it would also be nice to unify TextGrid.minTimestamp, TextGrid.maxTimestamp with IntervalTier.minT, IntervalTier.maxT

I plan on doing this in the next major release!

It might be good to split IntervalTier.insertEntry into three functions rather than have a string switch on its behavior, so something like IntervalTier.insertEntry, IntervalTier.mergeEntry, IntervalTier.replaceEntry

I will also plan on doing this in the next major release!

Use PEP8 snake case rather than camel case (e.g. Textgrid.tier_names instead of TextGrid.tierNames)

It's a big change that will break everyone's scripts (not that there are many users). It would be nice, but I'm not sure.

Use dataclasses for Intervals/Points rather than namedtuples so that they're mutable.

Someone complained that the data structures were mutable, so I changed everything to be immutable. πŸ˜… Some compromise like what you suggested might be doable. I'm not sure.

It also might be helpful to have includeBlankSpaces default to true for exporting to textgrids (and it might also be worth breaking up the TextGrid.save method into different methods for the output format), since it's a bit of a gotcha if you try to open up TextGrids in Praat later on.

includeBlankSpaces used to default to True but someone complained and so I made it required. I would agree that "True" should almost certainly be the default. The compromise was that the doc string says:

If you are unsure, True is recommended as Praat needs blanks to render textgrids properly.

^ Those aren't excuses, just the reason for the current behavior. πŸ™‡ I think everything is open to negotiation.