Closed scottmk closed 1 year ago
Alternatively, it could behave as it currently does, and instead you could add the `entryList` entry index from each tier to the `intersection` entries.
I think your first solution sounds reasonable. I'll take a look and see what I can do.
When I went to implement the changes, I realized that my original intention with `intersection()` wouldn't be compatible with the changes you suggested.
For example, what would happen for a phone list where only some of the phones are listed for each word, e.g. `[(0, 1, "hello")]` and `[(0.1, 0.2, 'e'), (0.7, 1.0, 'o')]`?
What would the expected output be? Under the existing intersection method, there would be two output intervals, `[(0.1, 0.2, "hello(e)"), (0.7, 1.0, "hello(o)")]`, but I think one could argue that in some cases only one interval is wanted, `(0, 1, "hello(e,o)")`, which is more in line with your use case.
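To make the two scenarios concrete, here is a rough sketch in plain Python. It does not use praatio at all; the helper names `intersect_per_overlap` and `merge_per_word` are made up for illustration and intervals are just `(start, end, label)` tuples.

```python
from typing import List, Tuple

Interval = Tuple[float, float, str]


def intersect_per_overlap(words: List[Interval], phones: List[Interval]) -> List[Interval]:
    """Emit one interval for every overlapping (word, phone) pair."""
    out = []
    for w_start, w_end, w_label in words:
        for p_start, p_end, p_label in phones:
            start, end = max(w_start, p_start), min(w_end, p_end)
            if start < end:  # the two intervals share some time
                out.append((start, end, f"{w_label}({p_label})"))
    return out


def merge_per_word(words: List[Interval], phones: List[Interval]) -> List[Interval]:
    """Emit one interval per word, listing every phone that overlaps it."""
    out = []
    for w_start, w_end, w_label in words:
        inside = [
            p_label
            for p_start, p_end, p_label in phones
            if max(w_start, p_start) < min(w_end, p_end)
        ]
        out.append((w_start, w_end, f"{w_label}({','.join(inside)})"))
    return out


words = [(0.0, 1.0, "hello")]
phones = [(0.1, 0.2, "e"), (0.7, 1.0, "o")]

print(intersect_per_overlap(words, phones))
# [(0.1, 0.2, 'hello(e)'), (0.7, 1.0, 'hello(o)')]
print(merge_per_word(words, phones))
# [(0.0, 1.0, 'hello(e,o)')]
```

The first helper keeps the phone boundaries; the second keeps the word boundaries, which is the difference between the two expected outputs.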
I wondered how I could accommodate these two scenarios--parameterize `intersection()`?
I decided that a simpler solution was to create a different method, `mergeLabels()`. I implemented that in https://github.com/timmahrt/praatIO/pull/47 and also added some documentation to the existing `intersection()`.
What do you think? Does `mergeLabels()` work for your use case?
Here is the method signature: https://github.com/timmahrt/praatIO/pull/47/files#diff-35a03755d23b8e11ea1a0d22db05fa23181cc9dfc8a6675bb72e8781ca4b269eR572
Here is an example usage from the tests, using the example you provided: https://github.com/timmahrt/praatIO/pull/47/files#diff-821de34f450931440c2ec4dcdea75ca2127eea10060b8674de5f20eeae4a303dR1225
I merged my PR and built a release. I've been sitting on a lot of code since November which I really shouldn't have done.
Reviews on the merged PR are still welcome--I can make a follow-up PR. :bow:
Thanks for this! I think creating the new method is a great compromise and this helps my use case a lot.
I'll take a look at the PR and see if I have any comments to make.
Thanks for the quick response!
Today I encountered an issue with the behavior of `intersection`.
Say I have a `WORD` tier that looks like this:
And I have a `PHONE` tier that looks like this:
Assuming these are time-aligned correctly, when I call `intersection`, I get a list that looks something like this:
Because I have two intervals in the `WORD` tier which have the same label, from this intersection I can't really tell if I have two distinct words `"A"` that have the respective transcriptions `"AH0"` and `"EY1"`, or if I have one distinct word `"A"` transcribed as `"AH0 EY1"`.
Obviously, there is no right way to solve this, but I would suggest that, since we do know that the word entries are distinct, perhaps the label should instead be the `WORD` label plus a tuple of all the `PHONE` labels that coincide with it. Something like this:
This would also mean that the interval boundaries would be the boundaries of the left-hand side tier. So my example would be for
If you instead did
you would get
What do you think?
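For reference, the suggestion (word label plus a tuple of the coinciding phone labels, with boundaries taken from the left-hand tier) could be sketched like this in plain Python. This is only an illustration, not praatio code: `group_phones_by_word` is a hypothetical helper, and the interval times are made up.

```python
from typing import List, Tuple

Interval = Tuple[float, float, str]
Grouped = Tuple[float, float, Tuple[str, Tuple[str, ...]]]


def group_phones_by_word(words: List[Interval], phones: List[Interval]) -> List[Grouped]:
    """For each word, keep its boundaries and collect the phones that overlap it."""
    grouped = []
    for w_start, w_end, w_label in words:
        labels = tuple(
            p_label
            for p_start, p_end, p_label in phones
            if p_start < w_end and p_end > w_start  # strict overlap in time
        )
        grouped.append((w_start, w_end, (w_label, labels)))
    return grouped


# Hypothetical times: the same phones under two different word tiers.
phones = [(0.0, 0.5, "AH0"), (0.5, 1.0, "EY1")]
two_words = [(0.0, 0.5, "A"), (0.5, 1.0, "A")]
one_word = [(0.0, 1.0, "A")]

print(group_phones_by_word(two_words, phones))
# [(0.0, 0.5, ('A', ('AH0',))), (0.5, 1.0, ('A', ('EY1',)))]
print(group_phones_by_word(one_word, phones))
# [(0.0, 1.0, ('A', ('AH0', 'EY1')))]
```

With the phones grouped per word entry, the two cases (two distinct words `"A"` versus one word `"A"` with two phones) are no longer ambiguous.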