I have found that, a very small percentage of the time, the time reported for a bounding box is wrong even though its coordinates are correct. I think it's a quantization problem, and there is also an issue with stereo files.
On line 1017 of audioTagger.py, LabelStartTime_Seconds is currently saved as:
x1 * self.specNStepMod
It should instead be:
x1 * (int(sr * self.specNStepMod) / float(sr))
where sr is the file's sampling rate. We will have to store it when reading the audio file.
Furthermore, for stereo files this also needs to be divided by a factor of 2. To fix this properly, we need to decide how best to handle stereo files.
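
To make the size of the error concrete, here is a minimal sketch of the proposed conversion. The function name `column_to_seconds` and the parameters `step_seconds` (standing in for `self.specNStepMod`) and `n_channels` are hypothetical and not taken from audioTagger.py. It assumes the spectrogram step is truncated to a whole number of samples. The division by the channel count only mirrors the factor of 2 mentioned above and is not a settled way to handle stereo.

```python
def column_to_seconds(x1, step_seconds, sr, n_channels=1):
    """Convert a spectrogram column index to seconds.

    Assumes the step between columns is truncated to whole samples,
    so the effective step is int(sr * step_seconds) / sr seconds.
    """
    hop_samples = int(sr * step_seconds)  # step actually used, in samples
    seconds = x1 * hop_samples / float(sr)
    # Assumption: stereo needs a further division by the channel count,
    # following the factor of 2 described in this issue.
    return seconds / n_channels


sr = 22050           # example sampling rate, chosen so sr * step is not an integer
step_seconds = 0.01  # nominal step; 22050 * 0.01 = 220.5 samples, truncated to 220
x1 = 1000            # example column index

naive = x1 * step_seconds                           # current calculation
corrected = column_to_seconds(x1, step_seconds, sr)
print(naive, corrected, naive - corrected)
```

With these example values the two results differ by roughly 0.02 seconds at column 1000, and the gap grows linearly with x1. When sr * step_seconds is already a whole number, both calculations agree, which would explain why only a small fraction of labels are affected.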