timmahrt / praatIO

A Python library for working with Praat, textgrids, time-aligned audio transcripts, and audio files. It is primarily used for extracting features from and making manipulations on audio files given hierarchical time-aligned transcriptions (utterance > word > syllable > phone, etc.).
MIT License

Add support for Elan exported textgrids and generally be more forgiving of whitespace #31

Closed mmcauliffe closed 3 years ago

mmcauliffe commented 3 years ago

Resolves #30

I added the fixes needed to get parsing working for the Elan textgrids, and added a version of bobby_phones to test it. I also did some slight refactoring of the loading code to lean on regular expressions; it should be simpler and faster.
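As a rough illustration of the regex-based approach (a sketch only, not the code in this PR), whitespace-tolerant matching of interval entries in a long-format textgrid could look like the following. The pattern name `INTERVAL_RE` and the function `parse_intervals` are hypothetical and are not part of praatio.

```python
import re

# Hypothetical pattern: matches one "xmin / xmax / text" interval entry.
# \s* between tokens tolerates any amount of spaces, tabs, or newlines.
INTERVAL_RE = re.compile(
    r'xmin\s*=\s*(?P<start>[\d.]+)\s*'
    r'xmax\s*=\s*(?P<end>[\d.]+)\s*'
    r'text\s*=\s*"(?P<label>(?:[^"]|"")*)"'
)


def parse_intervals(tier_text):
    """Return (start, end, label) tuples for every interval entry found."""
    return [
        (
            float(m.group("start")),
            float(m.group("end")),
            # Praat escapes a double quote inside a label by doubling it.
            m.group("label").replace('""', '"'),
        )
        for m in INTERVAL_RE.finditer(tier_text)
    ]
```

Because the pattern requires `text` to follow `xmax`, the tier-level `xmin`/`xmax` header lines are skipped rather than mistaken for an interval.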

One additional thing I noticed is that the hasData bool isn't used, and there aren't any validation checks for whether the file was parsed correctly. You might already be planning that for 5.0, but it would help with debugging if tgio threw an error when it doesn't find any parse-able tiers. It's not a huge thing, though.
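To make the suggestion concrete, here is a minimal sketch of that kind of check. Both `TextgridParseError` and `require_tiers` are hypothetical names used for illustration; they are not existing praatio APIs, and the real fix may look quite different.

```python
class TextgridParseError(ValueError):
    """Hypothetical error type for a file that yields no tiers."""


def require_tiers(tiers, source_name):
    """Return `tiers` unchanged, or raise if the parser found none.

    `tiers` is assumed to be whatever list the loader produced;
    `source_name` is only used to make the error message useful.
    """
    if not tiers:
        raise TextgridParseError(
            f"No parse-able tiers found in {source_name!r}"
        )
    return tiers
```

Calling something like this at the end of loading would turn a silently empty textgrid into an immediate, descriptive failure.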

coveralls commented 3 years ago

Coverage Status

Coverage decreased (-0.04%) to 79.584% when pulling 5e3a2f818935ae1159f9a4c8d605f80d466ae283 on mmcauliffe:master into 056525d1f5a2c6e6337b64ea15134e571a231a4c on timmahrt:master.

timmahrt commented 3 years ago

Thanks for the PR! I will review this over the weekend.

There have been a lot of changes in Praatio 5--I don't think I've added a lot of validation to textgrid reading, so I can take a look into adding that as well.

Thanks!