Closed lostanlen closed 2 years ago
Not sure what you mean by "how you come up with decisions as to which event to keep or prune?". This function diagnoses a detection, and one thing it does is count the occurrences of split/merged positives. That said, you can use filter_detection.R to remove those ambiguous detections based on a criterion defined by the user. Split positives can make diagnose_detection.R count more TPs than there actually are.
My question is: how can I get the largest set of overlapping reference-detection pairs such that each reference is matched at most once and each detection is matched at most once?
OK, so the goal of this function is to diagnose a detection. Detections can come from ohun's own functions or from other packages/software. For optimizing detections in ohun, diagnosing is already incorporated into two functions: optimize_energy_detector and optimize_template_detector. You will see that they just iterate the corresponding detection functions (energy_detector and template_detector) over different combinations of tuning parameters and then call diagnose_detection on each iteration. For diagnosing external software detections the user has to run diagnose_detection for each detection (although this can be done independently over different detections using the argument 'by'). Decisions on how to optimize a detection are left to the users, as this depends on the goals of the detection (e.g. in some cases some FPs are OK but not in others). ohun only provides a set of diagnostics that can inform that decision.
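The shape of that optimization loop (iterate a detector over a grid of tuning parameters, diagnose each run, keep the best setting) can be sketched as below. The detect/diagnose functions here are toy stand-ins, not ohun's API:

```python
from itertools import product

# Toy stand-ins for a detector and its diagnostic (NOT ohun's API),
# just to show the shape of the optimization loop.
def detect(signal, threshold, offset):
    # "detect" every sample above the (adjusted) threshold
    return [x for x in signal if x > threshold + offset]

def diagnose(detection, reference):
    tp = len(set(detection) & set(reference))
    return {"true.positives": tp, "false.positives": len(detection) - tp}

signal = [1, 3, 5, 7]
reference = [5, 7]

# iterate the detector over all combinations of tuning parameters,
# calling the diagnostic on each iteration
results = []
for threshold, offset in product([2, 4, 6], [0.0, 0.1]):
    diag = diagnose(detect(signal, threshold, offset), reference)
    results.append({"threshold": threshold, "offset": offset, **diag})

# pick the setting with the best trade-off (the criterion is up to the user)
best = max(results, key=lambda r: r["true.positives"] - r["false.positives"])
print(best["threshold"])  # → 4
```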
About this issue:
https://github.com/maRce10/ohun/blob/master/R/diagnose_detection.R#L180
I realized forcing recall <= 1 is no longer necessary as it was dealt with earlier in the code here: https://github.com/maRce10/ohun/blob/master/R/diagnose_detection.R#L171
I am not asking about how to perform detection but about evaluating (diagnosing) it.
Let me give you an example. I have an event detector (blue segments) and a reference (black segments). Depending on how I match predictions to references, I might end up with 4 TP (top) or 3 TP (center, bottom).
If I were to pass the black and blue segments to ohun, could you guarantee that diagnose_detection will return the optimal match (4 TP) and not some suboptimal variant (3 TP or less)?
If not, it could be a problem, because it could underestimate both the recall and the precision of a detector.
OK, I see. It will tell you that you have 4 TP because 4 reference sounds are overlapped by detections:
library(ohun)

# reference
ref <- data.frame(
  sound.files = "1.wav",
  selec = 1:5,
  start = c(1, 2, 3, 4, 5),
  end = c(1.5, 2.5, 3.5, 4.5, 5.5)
)

# detection
det <- data.frame(
  sound.files = "1.wav",
  selec = 1:4,
  start = c(0.75, 1.4, 3.2, 4.25),
  end = c(1.25, 3.1, 4.1, 4.8)
)

# diagnose
diagnose_detection(reference = ref, detection = det)
But it will also tell you that you have some split and merged positives.
But this "4 reference sounds are overlapped by detections" is only an upper bound on TP, right?
What would happen in this other case? Is there a way I can get a TP = 4 in the first case and a TP = 1 in the latter?
In the latter you get TP = 4 as well, but only one detection for that sound file and 4 merged positives. But I see your point. However, I am not sure that just calling that TP = 1 is informative enough for the user. That's why I added these other metrics. Anyways, I am open to suggestions.
The first example should unambiguously be TP=4, FP=1, FN=0. And the second example should be TP=1, FP=0, FN=3. The number of true positives should never be higher than the number of positives.
My suggestion would be to frame this as a combinatorial optimization problem: specifically, bipartite graph matching.
This problem can be solved in two steps. (1) First, build the bipartite graph by listing all matching pairs. The naïve way to do this is to consider all pairs and check for overlap. If the number of events is very large, this procedure can become slow. A faster way to do this is sorted bisection search. This is what I did for the DCASE "few-shot bioacoustic event detection" task: https://github.com/c4dm/dcase-few-shot-bioacoustic/blob/main/evaluation_metrics/metrics.py#L6
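The sorted-bisection idea can be sketched as follows (a minimal Python sketch, assuming the reference events are non-overlapping and sorted by onset, so their offsets are sorted too and each detection's candidates form a contiguous run; an R port would be analogous):

```python
from bisect import bisect_left, bisect_right

def candidate_pairs(refs, dets):
    """List all (ref_index, det_index) pairs with nonzero temporal overlap.

    Assumes refs are non-overlapping and sorted by start time, so both
    their starts and their ends are sorted and bisection can replace an
    all-pairs scan.
    """
    starts = [s for s, _ in refs]
    ends = [e for _, e in refs]
    pairs = []
    for j, (ds, de) in enumerate(dets):
        lo = bisect_right(ends, ds)   # first ref ending after the detection starts
        hi = bisect_left(starts, de)  # first ref starting after the detection ends
        pairs.extend((i, j) for i in range(lo, hi))
    return pairs

# same intervals as the ref/det example earlier in the thread
refs = [(1, 1.5), (2, 2.5), (3, 3.5), (4, 4.5), (5, 5.5)]
dets = [(0.75, 1.25), (1.4, 3.1), (3.2, 4.1), (4.25, 4.8)]
print(candidate_pairs(refs, dets))
# → [(0, 0), (0, 1), (1, 1), (2, 1), (2, 2), (3, 2), (3, 3)]
```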
(2) Then, run the Hopcroft-Karp algorithm to solve bipartite graph matching. mir_eval has an implementation of this in Python (_bipartite_match). There might be one already available in R, although I'm not familiar enough with R to comment.
Thanks. That seems to be useful for assigning positives to TPs, right?
First, list all candidate pairs between prediction and reference. In your case, the criterion for being a candidate pair is to have nonzero overlap. One might come up with a stricter criterion, such as having at least a 50% Intersection-over-Union (IoU) ratio (what we did at DCASE FSD), or putting an upper bound on the lag between predicted onset and reference onset, or between predicted offset and reference offset. All these choices are comprehensively covered by Annamaria Mesaros in her sed_eval package.
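Both criteria are one-liners over interval endpoints; a sketch (the 0.5 threshold mirrors the DCASE FSD choice):

```python
def overlap(a, b):
    """Nonzero temporal overlap between intervals a = (start, end) and b."""
    return a[0] < b[1] and b[0] < a[1]

def iou(a, b):
    """Intersection-over-Union ratio of two intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

# a pair can overlap yet fail a 50% IoU criterion
a, b = (1.0, 2.0), (1.8, 2.8)
print(overlap(a, b))     # → True
print(iou(a, b) >= 0.5)  # → False (IoU = 0.2 / 1.8 ≈ 0.11)
```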
Then, among those candidates, run Hopcroft-Karp to find the maximum number of matching pairs. That number is TP.
FP = number of predicted events - TP
FN = number of reference events - TP
Precision, recall, F-measure, etc. follow accordingly.
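In code, this bookkeeping is just (a sketch; the counts below are the five-reference/four-detection example earlier in the thread, where the maximum matching has size 4):

```python
def detection_metrics(tp, n_detected, n_reference):
    """Precision/recall/F-measure from a maximum-matching TP count.

    FP and FN follow from the counts, so TP can never exceed either
    the number of detections or the number of references.
    """
    fp = n_detected - tp
    fn = n_reference - tp
    precision = tp / n_detected if n_detected else 0.0
    recall = tp / n_reference if n_reference else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return {"fp": fp, "fn": fn, "precision": precision,
            "recall": recall, "f.measure": f}

# five references, four detections, maximum matching of size 4
print(detection_metrics(tp=4, n_detected=4, n_reference=5))
# → fp 0, fn 1, precision 1.0, recall 0.8, F ≈ 0.889
```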
OK, will take a look at that. Thanks!
Hello @maRce10
I am reading through label_detection.R and diagnose_detection.R and see you have defined "split positives" and "merged positives". I'm curious how you come up with decisions as to which event to keep or prune. Is your method optimal? I'm also curious as to why you need to do this:
https://github.com/maRce10/ohun/blob/master/R/diagnose_detection.R#L180
How come you have a recall above 1? Are you ever matching multiple predictions to the same reference?