I listened to the audio files in TrainDataLatest_PodCastAllRounds_123567910.tar.gz (wav subfolder), visualized the annotations, and realized that they do not match. Take for instance 60012.wav and its first annotations:
podcast_round1 60012.wav 34.126 2.918 Dabob Bay, Seattle, Washington 1960-10-28 60012
podcast_round1 60012.wav 36.816 2.588 Dabob Bay, Seattle, Washington 1960-10-28 60012
podcast_round1 60012.wav 42.55 2.055 Dabob Bay, Seattle, Washington 1960-10-28 60012
podcast_round1 60012.wav 44.606 2.41 Dabob Bay, Seattle, Washington 1960-10-28 60012
podcast_round1 60012.wav 46.636 3.425 Dabob Bay, Seattle, Washington 1960-10-28 60012
podcast_round1 60012.wav 51.381 3.248 Dabob Bay, Seattle, Washington 1960-10-28 60012
You will see that the onsets and offsets do not line up exactly with the vocalizations, and there are also vocalizations outside these time intervals. It looks a bit random, tbh.
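For anyone who wants to check this, here is a minimal sketch of how I would turn the rows above into onset/offset intervals. It assumes the third and fourth columns are the start time and the duration in seconds (that is my reading of the rows, not something I have confirmed), and the names `Annotation`, `parse_rows` and `is_annotated` are made up for this example.

```python
from typing import List, NamedTuple

# The six annotation rows quoted above, copied verbatim.
ROWS = """\
podcast_round1 60012.wav 34.126 2.918 Dabob Bay, Seattle, Washington 1960-10-28 60012
podcast_round1 60012.wav 36.816 2.588 Dabob Bay, Seattle, Washington 1960-10-28 60012
podcast_round1 60012.wav 42.55 2.055 Dabob Bay, Seattle, Washington 1960-10-28 60012
podcast_round1 60012.wav 44.606 2.41 Dabob Bay, Seattle, Washington 1960-10-28 60012
podcast_round1 60012.wav 46.636 3.425 Dabob Bay, Seattle, Washington 1960-10-28 60012
podcast_round1 60012.wav 51.381 3.248 Dabob Bay, Seattle, Washington 1960-10-28 60012
"""


class Annotation(NamedTuple):
    wav: str
    onset_s: float   # assumed: column 3 is the start time in seconds
    offset_s: float  # assumed: column 3 + column 4 (duration in seconds)


def parse_rows(text: str) -> List[Annotation]:
    """Parse whitespace-separated rows; the location/date tail is ignored."""
    annotations = []
    for line in text.strip().splitlines():
        # Only the first four fields are needed; the rest contains spaces.
        _dataset, wav, start, duration, _rest = line.split(maxsplit=4)
        onset = float(start)
        annotations.append(Annotation(wav, onset, onset + float(duration)))
    return annotations


def is_annotated(t: float, annotations: List[Annotation]) -> bool:
    """Return True if time t (seconds) falls inside any annotated interval."""
    return any(a.onset_s <= t <= a.offset_s for a in annotations)


annotations = parse_rows(ROWS)
for a in annotations:
    print(f"{a.wav}: {a.onset_s:.3f} s to {a.offset_s:.3f} s")
```

The printed intervals can then be compared against a spectrogram of 60012.wav, and `is_annotated(t, annotations)` can be used to check whether a vocalization heard at time `t` falls inside any of the listed intervals.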