mbsantiago / whombat

Audio Annotation Tool for ML development
https://mbsantiago.github.io/whombat/
GNU General Public License v3.0

Inconsistent spectrogram representation #10

Open · vogelbam opened this issue 5 months ago

vogelbam commented 5 months ago

Description

Switching the selected best-covering segment seems to change how calls are displayed in the spectrogram. I'm not sure whether there's a satisfying solution to this issue; however, the inconsistency might confuse labelers.

What I Did

I opened a file for annotation and zoomed in. While scrolling horizontally, I noticed a call changing its appearance, which made it look as if its bounding box wasn't fitted properly. This appears to happen when a different covering segment is selected. Here's an example, including the selected segment and the segments in use:

[Screenshot from 2024-03-26 10-45-07]

Selected segment: 4
Start time: 1.2, End time: 1.7
Start time: 0.8999999999999999, End time: 1.4
Start time: 1.5, End time: 2

[Screenshot from 2024-03-26 10-44-06]

Selected segment: 5
Start time: 1.5, End time: 2
Start time: 1.2, End time: 1.7
Start time: 1.7999999999999998, End time: 2.3
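The long decimals in these logs (0.8999999999999999, 1.7999999999999998) look like ordinary IEEE 754 double-precision rounding. The sketch below assumes, as the logged values suggest but the issue does not confirm, that segment i starts at i × 0.3 s and lasts 0.5 s. It reproduces those values and contrasts them with boundaries kept on an exact integer sample grid; the 48 kHz sample rate and the function names are hypothetical. The error is on the order of 1e-16 s, far below one sample period, so on its own it probably does not explain the visible change.

```python
# Assumed segmentation scheme (inferred from the logged times, not taken from
# Whombat's source): segment i starts at i * SEGMENT_HOP and lasts SEGMENT_DURATION.
SEGMENT_HOP = 0.3       # seconds between consecutive segment starts
SEGMENT_DURATION = 0.5  # seconds per segment
SAMPLERATE = 48_000     # hypothetical sample rate, for illustration only


def bounds_in_seconds(index: int) -> tuple[float, float]:
    """Compute segment bounds directly in floating-point seconds."""
    start = index * SEGMENT_HOP
    return start, start + SEGMENT_DURATION


def bounds_in_samples(index: int) -> tuple[int, int]:
    """Compute segment bounds on an exact integer sample grid."""
    hop = round(SEGMENT_HOP * SAMPLERATE)            # 14_400 samples
    duration = round(SEGMENT_DURATION * SAMPLERATE)  # 24_000 samples
    start = index * hop
    return start, start + duration


for index in (3, 4, 5, 6):
    start, end = bounds_in_samples(index)
    # Float bounds show rounding noise; sample bounds convert back cleanly.
    print(index, bounds_in_seconds(index), (start / SAMPLERATE, end / SAMPLERATE))
```

Keeping boundaries as integer sample indices and converting to seconds only for display avoids any ambiguity when times are later truncated or rounded to sample positions.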

mbsantiago commented 5 months ago

Hi @vogelbam,

Thanks a lot for this issue!

I fully agree that this discrepancy can be confusing for the labeler. I think there are three potential sources for the problem:

  1. There could be a bug causing a misalignment between the placement of spectrogram images and the drawing of annotations in the frontend canvas.
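  2. The STFT (Short-Time Fourier Transform) windows of the two audio segments may not align. If the offset between the segments' start times is not a whole number of STFT hops, their frames cover different time intervals, so the same call falls into different time bins and can look different. We may need to improve how the start and end times of the segments are selected.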
  3. When applying a denoising algorithm or normalizing amplitude values, the output can vary depending on the whole audio content of the segment. As a result, even overlapping audio segments may exhibit significantly different appearances.
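To make point 2 concrete: if each segment's STFT starts at the segment's first sample (an assumption; Whombat may pad or center frames differently), the frames of two overlapping segments fall on the same time grid only when the difference between their start samples is a multiple of the STFT hop. The sketch below checks that condition and snaps a start time down to the nearest grid point. The sample rate, window, hop, and function names are hypothetical, not Whombat's actual settings or API.

```python
# Hypothetical parameters chosen for illustration only.
SAMPLERATE = 48_000  # samples per second
STFT_WINDOW = 1_024  # window length in samples
STFT_HOP = 512       # hop length in samples


def frame_starts(segment_start: int, num_samples: int) -> list[int]:
    """Absolute sample index where each STFT frame of a segment begins.

    Assumes frames start at the segment's first sample, without padding.
    """
    num_frames = (num_samples - STFT_WINDOW) // STFT_HOP + 1
    return [segment_start + k * STFT_HOP for k in range(num_frames)]


def grids_aligned(start_a: int, start_b: int) -> bool:
    """True if the frames of two segments fall on one shared time grid."""
    return (start_b - start_a) % STFT_HOP == 0


def snap_to_grid(start_seconds: float) -> int:
    """Move a segment start down to the nearest multiple of STFT_HOP samples."""
    start_sample = round(start_seconds * SAMPLERATE)
    return (start_sample // STFT_HOP) * STFT_HOP


# Segments 4 and 5 from the logs start at 1.2 s and 1.5 s.
start_4, start_5 = round(1.2 * SAMPLERATE), round(1.5 * SAMPLERATE)
print(grids_aligned(start_4, start_5))  # False: 14_400 % 512 == 64

snapped_4, snapped_5 = snap_to_grid(1.2), snap_to_grid(1.5)
print(grids_aligned(snapped_4, snapped_5))  # True
```

With these assumed values, the two segments begin 14,400 samples apart, so none of their frame positions coincide. Snapping shifts each start by less than one hop (at most 511 samples, about 10.6 ms at 48 kHz), after which overlapping segments share frame positions exactly.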

Unfortunately, this issue is an unintended consequence of computing the spectrogram in chunks. However, chunk processing is necessary to prevent Whombat from crashing with large recordings or becoming too slow to navigate.
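The effect described in point 3 can be reproduced with a toy example. Below, min-max scaling stands in for whatever normalization or denoising is actually applied; it is computed once per chunk and once from statistics gathered over the whole recording. The data and function names are hypothetical. The same frame gets different values under per-chunk scaling but identical values under global scaling, and the global statistics can still be collected one chunk at a time, so memory use stays bounded.

```python
import math

# Hypothetical per-frame energies (dB) of a single frequency bin.
recording = [-80.0, -75.0, -40.0, -20.0, -35.0, -70.0, -78.0, -10.0]

# Two overlapping display chunks; frames 2-4 belong to both.
chunk_a = recording[0:5]
chunk_b = recording[2:8]


def minmax(values: list[float], lo: float, hi: float) -> list[float]:
    """Scale values linearly so that lo maps to 0 and hi maps to 1."""
    return [(v - lo) / (hi - lo) for v in values]


def global_min_max(chunks) -> tuple[float, float]:
    """Min and max over a recording, reading one chunk at a time."""
    lo, hi = math.inf, -math.inf
    for chunk in chunks:
        lo = min(lo, min(chunk))
        hi = max(hi, max(chunk))
    return lo, hi


# Per-chunk scaling: each chunk uses only its own statistics.
local_a = minmax(chunk_a, min(chunk_a), max(chunk_a))
local_b = minmax(chunk_b, min(chunk_b), max(chunk_b))

# Global scaling: statistics from one streaming pass over non-overlapping chunks.
lo, hi = global_min_max([recording[0:4], recording[4:8]])
global_a = minmax(chunk_a, lo, hi)
global_b = minmax(chunk_b, lo, hi)

# Frame 3 (-20 dB) is index 3 in chunk_a and index 1 in chunk_b.
print(local_a[3], local_b[1])    # 1.0 vs. about 0.853: same frame, different shade
print(global_a[3], global_b[1])  # about 0.857 in both chunks
```

The trade-off is one extra pass over the recording (or cached per-recording statistics) in exchange for every chunk being scaled identically, so a call would render the same way regardless of which segment is selected.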

I'll dive deeper into this issue to explore potential solutions, but I'm open to any ideas or suggestions you may have.

Thanks for the issue!

Best regards, Santiago