timmahrt / praatIO

A python library for working with praat, textgrids, time aligned audio transcripts, and audio files. It is primarily used for extracting features from and making manipulations on audio files given hierarchical time-aligned transcriptions (utterance > word > syllable > phone, etc).
MIT License

`intersection`: issue on consecutive duplicate words #45

Closed: scottmk closed this issue 1 year ago

scottmk commented 1 year ago

Today I encountered an issue with the behavior of intersection.

Say I have a WORD tier that looks like this:

UPON | A | A | TIME

And I have a PHONE tier that looks like this:

AH0 | P | AA1 | N | AH0 | EY1 | T | AY1 | M

Assuming these are time-aligned correctly, when I call intersection, I get a list that looks something like this:

['UPON-AH0', 'UPON-P', 'UPON-AA1', 'UPON-N', 'A-AH0', 'A-EY1', 'TIME-T', 'TIME-AY1', 'TIME-M']

Because I have two intervals in the WORD tier which have the same label, from this intersection I can't really tell if I have two distinct words "A" that have the respective transcriptions "AH0" and "EY1", or if I have one distinct word "A" transcribed as "AH0 EY1".
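To make the ambiguity concrete, here is a minimal plain-Python sketch. It does not call praatIO; `pairwise_intersection` is a hypothetical helper, intervals are assumed to be `(start, end, label)` tuples, and the times are invented for illustration.

```python
# Hypothetical data: (start, end, label) tuples with made-up times.
words = [(0.0, 0.4, "UPON"), (0.4, 0.5, "A"), (0.5, 0.6, "A"), (0.6, 1.0, "TIME")]
phones = [
    (0.0, 0.1, "AH0"), (0.1, 0.2, "P"), (0.2, 0.3, "AA1"), (0.3, 0.4, "N"),
    (0.4, 0.5, "AH0"), (0.5, 0.6, "EY1"),
    (0.6, 0.7, "T"), (0.7, 0.9, "AY1"), (0.9, 1.0, "M"),
]


def pairwise_intersection(tier_a, tier_b):
    """Emit one interval for every overlapping pair of entries (a sketch only)."""
    out = []
    for a_start, a_end, a_label in tier_a:
        for b_start, b_end, b_label in tier_b:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # keep only pairs that truly overlap
                out.append((start, end, f"{a_label}-{b_label}"))
    return out


labels = [label for _, _, label in pairwise_intersection(words, phones)]
print(labels)
```

The labels 'A-AH0' and 'A-EY1' come out adjacent, and nothing in them says whether they belong to one word entry or two.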

There is probably no single right way to solve this, but since we know that the word entries are distinct, I would suggest that the label instead be the WORD label plus a tuple of all the PHONE labels that coincide with it. Something like this:

['UPON-(AH0, P, AA1, N)', 'A-(AH0)', 'A-(EY1)', 'TIME-(T, AY1, M)']

This would also mean that the interval boundaries would be the boundaries of the tier on the left-hand side of the call. So my example would be for

word_tier.intersection(phone_tier)

If you instead did

phone_tier.intersection(word_tier)

you would get

['AH0-UPON', 'P-UPON', 'AA1-UPON', 'N-UPON', 'AH0-A', 'EY1-A', 'T-TIME', 'AY1-TIME', 'M-TIME']
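The grouping proposed for word_tier.intersection(phone_tier) could be sketched roughly as follows. This is plain Python, not praatIO code; `group_labels` is a hypothetical name, the `(start, end, label)` tuples and their times are assumptions, and the output keeps the boundaries of the first tier.

```python
# Hypothetical data: (start, end, label) tuples with made-up times.
words = [(0.0, 0.4, "UPON"), (0.4, 0.5, "A"), (0.5, 0.6, "A"), (0.6, 1.0, "TIME")]
phones = [
    (0.0, 0.1, "AH0"), (0.1, 0.2, "P"), (0.2, 0.3, "AA1"), (0.3, 0.4, "N"),
    (0.4, 0.5, "AH0"), (0.5, 0.6, "EY1"),
    (0.6, 0.7, "T"), (0.7, 0.9, "AY1"), (0.9, 1.0, "M"),
]


def group_labels(base_tier, other_tier):
    """One output interval per base entry, labelled with every overlapping label."""
    out = []
    for start, end, label in base_tier:
        overlapping = [
            o_label
            for o_start, o_end, o_label in other_tier
            if max(start, o_start) < min(end, o_end)  # true overlap only
        ]
        out.append((start, end, f"{label}-({', '.join(overlapping)})"))
    return out


grouped = group_labels(words, phones)
print([label for _, _, label in grouped])
```

Because each base entry produces exactly one output entry, the two "A" words stay separate even though their labels match.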

What do you think?

scottmk commented 1 year ago

Alternatively, it could behave as it currently does, and instead you could add the entryList entry index from each tier to the intersection entries.
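A rough sketch of that alternative, assuming `(start, end, label)` tuples with invented times and a hypothetical helper `indexed_intersection` (not part of praatIO): each output entry carries the index of the entry it came from in each tier.

```python
# Hypothetical data: two consecutive words with the same label.
words = [(0.4, 0.5, "A"), (0.5, 0.6, "A")]
phones = [(0.4, 0.5, "AH0"), (0.5, 0.6, "EY1")]


def indexed_intersection(tier_a, tier_b):
    """Pairwise intersection that also records the source entry indices."""
    out = []
    for i, (a_start, a_end, a_label) in enumerate(tier_a):
        for j, (b_start, b_end, b_label) in enumerate(tier_b):
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:
                out.append((start, end, f"{a_label}-{b_label}", i, j))
    return out


entries = indexed_intersection(words, phones)
print(entries)
```

The differing word indices (0 and 1) show that the two "A" labels come from distinct entries.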

timmahrt commented 1 year ago

I think your first solution sounds reasonable. I'll take a look and see what I can do.

timmahrt commented 1 year ago

When I went to implement the changes, I realized that my original intention with intersection() wouldn't be compatible with the changes you suggested.

For example, what would happen with a phone list where only some of the phones are listed for each word, e.g. [(0, 1, "hello")] and [(0.1, 0.2, 'e'), (0.7, 1.0, 'o')]? What would the expected output be? Under the existing intersection method, two intervals would be output: [(0.1, 0.2, "hello(e)"), (0.7, 1.0, "hello(o)")]. But I think one could argue that in some cases only one interval is wanted, (0, 1, "hello(e,o)"), which is more in line with your use case.
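The two candidate outputs for this example can be compared with a small plain-Python sketch. Both helpers are hypothetical illustrations over `(start, end, label)` tuples, not the praatIO implementation; the label format follows the one used in this comment.

```python
word_entries = [(0.0, 1.0, "hello")]
phone_entries = [(0.1, 0.2, "e"), (0.7, 1.0, "o")]


def per_overlap(base_tier, other_tier):
    """One interval per overlapping pair, clipped to the overlap."""
    out = []
    for start, end, label in base_tier:
        for o_start, o_end, o_label in other_tier:
            lo, hi = max(start, o_start), min(end, o_end)
            if lo < hi:
                out.append((lo, hi, f"{label}({o_label})"))
    return out


def merged(base_tier, other_tier):
    """One interval per base entry, keeping the base boundaries."""
    out = []
    for start, end, label in base_tier:
        inside = [
            o_label
            for o_start, o_end, o_label in other_tier
            if max(start, o_start) < min(end, o_end)
        ]
        out.append((start, end, f"{label}({','.join(inside)})"))
    return out


split_result = per_overlap(word_entries, phone_entries)
merged_result = merged(word_entries, phone_entries)
print(split_result)
print(merged_result)
```

The first result leaves the uncovered stretch from 0.2 to 0.7 without an entry, while the second spans the whole word.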

I wondered how I could accommodate these two scenarios. Parameterize intersection()?

I decided that a simpler solution was to create a different method, mergeLabels(). I implemented that in https://github.com/timmahrt/praatIO/pull/47. I also added some documentation to the existing intersection().

What do you think? Does mergeLabels() work for your use case?

timmahrt commented 1 year ago

Here is the method signature: https://github.com/timmahrt/praatIO/pull/47/files#diff-35a03755d23b8e11ea1a0d22db05fa23181cc9dfc8a6675bb72e8781ca4b269eR572

Here is an example usage from the tests, using the example you provided: https://github.com/timmahrt/praatIO/pull/47/files#diff-821de34f450931440c2ec4dcdea75ca2127eea10060b8674de5f20eeae4a303dR1225

timmahrt commented 1 year ago

I merged my PR and built a release. I've been sitting on a lot of code since November, which I really shouldn't have done.

Reviews on the merged PR are still welcome--I can make a follow-up PR. :bow:

scottmk commented 1 year ago

Thanks for this! I think creating the new method is a great compromise and this helps my use case a lot.

I'll take a look at the PR and see if I have any comments to make.

Thanks for the quick response!