Remove negative data from samples from PodCast rounds 1-10

From Zoe via Orcasound, leading up to the 2022 Microsoft hackathon:

I found a few conflicting data points in the training annotation data over the weekend. The file is annotations.tsv from s3://acoustic-sandbox/labeled-data/detection/train/TrainDataLatest_PodCastAllRounds_123567910.tar.gz. There are 12 data files that are labeled as both positive samples (starting_time=0, duration_s>0) and negative samples (starting_time=0, duration_s=0). The negative entries should probably be removed. See the screenshots below. I don't think this would have a big impact on the training results (it's only 12 samples out of thousands), but it would be nice to clean it up ahead of the upcoming hackathon. Alternatively, people who are aware of the issue can remove the entries manually while loading the data. If someone is working on generating or updating the labeled data this year, I can also forward the list of affected files to them.

[Screenshot: training_overlap_neg]

[Screenshot: trainning_overlap_pos]
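
For anyone who prefers to drop the conflicting rows at load time, the cleanup described above could be sketched roughly as follows with pandas. This is only a sketch under assumptions: the column names `starting_time` and `duration_s` are taken from the description above, while the file-name column `wav_filename` and the helper `drop_conflicting_negatives` are hypothetical, so adjust them to the actual header of annotations.tsv. It also assumes that any row with `duration_s > 0` counts as a positive annotation.

```python
import pandas as pd


def drop_conflicting_negatives(df, file_col="wav_filename",
                               start_col="starting_time", dur_col="duration_s"):
    """Drop negative rows for files that also carry a positive annotation.

    Column names are assumptions; pass the real ones from annotations.tsv.
    Returns the cleaned frame and the sorted list of affected file names.
    """
    # Negative sample: start 0 and zero duration (as described in the issue).
    is_negative = (df[start_col] == 0) & (df[dur_col] == 0)
    # Positive sample: assumed here to be any row with a non-zero duration.
    is_positive = df[dur_col] > 0

    files_with_positive = set(df.loc[is_positive, file_col])
    # A negative row conflicts when the same file also has a positive row.
    is_conflict = is_negative & df[file_col].isin(files_with_positive)

    conflicting_files = sorted(set(df.loc[is_conflict, file_col]))
    cleaned = df.loc[~is_conflict].reset_index(drop=True)
    return cleaned, conflicting_files


# Tiny made-up table, for illustration only (not real annotation data).
example = pd.DataFrame({
    "wav_filename": ["a.wav", "a.wav", "b.wav", "c.wav"],
    "starting_time": [0.0, 0.0, 0.0, 1.5],
    "duration_s": [2.4, 0.0, 0.0, 2.0],
})
cleaned, conflicting_files = drop_conflicting_negatives(example)
print(conflicting_files)
print(cleaned)
```

With the real data, the frame would presumably be loaded with something like `pd.read_csv("annotations.tsv", sep="\t")` after extracting the archive. Files that only have a negative entry (like `b.wav` in the made-up table) are left untouched; only the negative rows that contradict a positive label on the same file are dropped.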

