timmahrt / praatIO

A Python library for working with Praat, textgrids, time-aligned audio transcripts, and audio files. It is primarily used for extracting features from and making manipulations on audio files given hierarchical time-aligned transcriptions (utterance > word > syllable > phone, etc).
MIT License

minimumIntervalLength in tgio.save not working as expected #12

Closed timsainb closed 5 years ago

timsainb commented 5 years ago

I am saving textgrids of syllables of birdsong and noticed the tg.save function does not work as expected.

The save function

def save(self, fn, minimumIntervalLength=MIN_INTERVAL_LENGTH):

    for tier in self.tierDict.values():
        tier.sort()

    # Fill in the blank spaces for interval tiers
    for name in self.tierNameList:
        tier = self.tierDict[name]
        if isinstance(tier, IntervalTier):

            tier = _fillInBlanks(tier,
                                 "",
                                 self.minTimestamp,
                                 self.maxTimestamp)
            if minimumIntervalLength is not None:
                tier = _removeUltrashortIntervals(tier,
                                                  minimumIntervalLength)
            self.tierDict[name] = tier

    for tier in self.tierDict.values():
        tier.sort()

    # Header
    outputTxt = ""
    outputTxt += 'File type = "ooTextFile short"\n'
    outputTxt += 'Object class = "TextGrid"\n\n'
    outputTxt += "%s\n%s\n" % (repr(self.minTimestamp),
                               repr(self.maxTimestamp))
    outputTxt += "<exists>\n%d\n" % len(self.tierNameList)

    for tierName in self.tierNameList:
        outputTxt += self.tierDict[tierName].getAsText()

    with io.open(fn, "w", encoding="utf-8") as fd:
        fd.write(outputTxt)

takes a hardcoded constant, MIN_INTERVAL_LENGTH = 0.00000001, as the default value of minimumIntervalLength.

When segments are longer than that value, I would expect them to be left as is. However, with my song, when I leave minimumIntervalLength at its default, the first syllable in each of my textgrids is changed from Interval(start=0.339, end=0.387, label='syll') to Interval(start=0.0, end=0.387, label='syll'), even though the segment is longer than MIN_INTERVAL_LENGTH. Setting minimumIntervalLength to None fixes the problem in my case, but it looks like something is not working as intended in this function.

Thanks for an excellent toolset! Tim

timmahrt commented 5 years ago

Hello Tim, I'm sorry for the bug. I have fixed the problem and added tests. If you update to 3.7.1, _removeUltrashortIntervals should work correctly with your data.

Of course, if short intervals are not a problem for you, you can also set minimumIntervalLength to None and bypass the check.

I found that the function _removeUltrashortIntervals ignored the minTimestamp of a Textgrid. If the Textgrid's minTimestamp was 0, the code should have worked fine, but if the Textgrid's minTimestamp was non-zero, the behaviour would be as you described.
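For anyone reading along, here is a minimal sketch of that kind of failure. It is not praatIO's actual code: the function remove_ultrashort, its tier_start parameter, and the namedtuple are hypothetical names made up for illustration, and the merging logic is a simplified assumption. It only shows how assuming a tier starts at 0 can move the first interval's start when the Textgrid's minTimestamp is non-zero.

```python
from collections import namedtuple

# Hypothetical stand-in for an interval entry; not praatIO's own class.
Interval = namedtuple("Interval", ["start", "end", "label"])

MIN_INTERVAL_LENGTH = 0.00000001


def remove_ultrashort(intervals, min_length, tier_start):
    """Drop intervals shorter than min_length and keep the tier gap-free.

    Simplified illustration only. tier_start is where the first kept
    interval is expected to begin.
    """
    kept = []
    for start, end, label in intervals:
        if end - start < min_length:
            # Too short: absorb it into the previous kept interval, if any.
            if kept:
                kept[-1] = kept[-1]._replace(end=end)
            continue
        # Snap the start to the previous interval's end, or to tier_start
        # for the first interval, so that no gaps remain.
        expected_start = kept[-1].end if kept else tier_start
        kept.append(Interval(expected_start, end, label))
    return kept


# A tier whose first interval begins at a non-zero minTimestamp.
min_timestamp = 0.339
tier = [Interval(0.339, 0.387, "syll"), Interval(0.387, 0.5, "")]

# Assuming the tier starts at 0 reproduces the reported symptom.
buggy = remove_ultrashort(tier, MIN_INTERVAL_LENGTH, tier_start=0.0)
print(buggy[0])  # Interval(start=0.0, end=0.387, label='syll')

# Using the Textgrid's own minTimestamp leaves the interval untouched.
fixed = remove_ultrashort(tier, MIN_INTERVAL_LENGTH, tier_start=min_timestamp)
print(fixed[0])  # Interval(start=0.339, end=0.387, label='syll')
```

In this sketch the only difference between the two calls is the assumed start of the tier, which matches the symptom above: the interval is long enough to keep, but its start is rewritten to 0.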

If you find any other bugs or would like to see any other features, please let me know. Thanks! Tim

timsainb commented 5 years ago

Thanks!