guillemcortes / baf-dataset

Reproducibility kit for "BAF: An Audio Fingerprinting Dataset for Broadcast Monitoring" by Guillem Cortès, Álex Ciurana, Emilio Molina, Marius Miron, Owen Meyers, Joren Six and Xavier Serra.
Apache License 2.0

unique-recall #1

Closed yaanolja closed 2 years ago

yaanolja commented 2 years ago

Thanks for sharing a good dataset.

The unique-recall calculation (compute_statistics.py) is as follows: recall = len(tps_rs_unique) / (len(tps_rs_unique) + len(fns_gt_unique))

I think tps_rs_unique should be replaced with tps_gt_unique.

guillemcortes commented 2 years ago

Hi, thank you for your comment.

tps_rs_unique contains the number of True Positive segments in an algorithm's results. tps_gt_unique contains the number of annotated True Positive segments that collide with tps_rs_unique. The purpose of this number is to compute TP_event_ratio. Let me know if you need more clarification on this.
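To illustrate why the two counts can differ, here is a minimal sketch. It assumes segments are plain (start, end) tuples in seconds and that "collide" means the intervals overlap. The helper names `overlaps` and `split_true_positives` are hypothetical and are not taken from compute_statistics.py, which may match segments differently (for example per query and reference).

```python
def overlaps(a, b):
    """Return True if intervals a=(start, end) and b=(start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]


def split_true_positives(results, ground_truth):
    """Count true positives from both sides (assumed overlap-based matching).

    tps_rs: result segments that hit at least one annotated segment.
    tps_gt: annotated segments that are hit by at least one result segment.
    """
    tps_rs = {r for r in results if any(overlaps(r, g) for g in ground_truth)}
    tps_gt = {g for g in ground_truth if any(overlaps(g, r) for r in results)}
    return tps_rs, tps_gt


# Toy data: two annotated segments, three segments reported by an algorithm.
ground_truth = [(0.0, 10.0), (20.0, 30.0)]
results = [(1.0, 4.0), (5.0, 9.0), (40.0, 45.0)]

tps_rs_unique, tps_gt_unique = split_true_positives(results, ground_truth)
print(len(tps_rs_unique), len(tps_gt_unique))  # 2 1
```

In this toy case two result segments fall inside the same annotation, so the result-side count (2) exceeds the ground-truth-side count (1).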

yaanolja commented 2 years ago

Thanks for your explanation.

Hmm... I think the denominator of recall should be the number of ground-truth annotations (the 'unanimity' tag in cross_annotations.csv): recall = tps_gt_unique / groundtruth_num

guillemcortes commented 2 years ago

Hi again, you're right. In terms of number of matches, recall is len(tps_gt_unique) / (len(tps_gt_unique) + len(fns_gt_unique)). The F1 score (in terms of number of matches) then does not make a lot of sense, so I'll just remove it. I'll update the code soon. Thanks again for raising this issue.
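As a rough sketch of the corrected formula, assuming tps_gt_unique and fns_gt_unique are collections of annotated segments (the function name `unique_recall` and the toy identifiers are hypothetical; the actual change is in the commit linked later in this thread):

```python
def unique_recall(tps_gt_unique, fns_gt_unique):
    """Recall over annotated segments: detected / (detected + missed)."""
    detected = len(tps_gt_unique)
    total = detected + len(fns_gt_unique)
    # Guard against an empty ground truth (an assumption, not from the repo).
    return detected / total if total else 0.0


# Toy example: 3 annotated segments were detected, 1 was missed.
tps_gt_unique = {"gt_1", "gt_2", "gt_3"}
fns_gt_unique = {"gt_4"}

recall = unique_recall(tps_gt_unique, fns_gt_unique)
print(recall)  # 0.75
```

Both numerator and denominator are counted on the ground-truth side, so the result cannot be inflated when several result segments match the same annotation.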

guillemcortes commented 2 years ago

Hi, marking this as completed. Find the new code version here: https://github.com/guillemcortes/baf-dataset/commit/b406cf7dafac479a5ea2806e2910dd88761cf6cb