When evaluating models before release, we should add measurements for the "expected nearby" label: Recall (is it being applied when it should be?), Precision (is it being applied only when it should be?), and F1 (is there a good balance of the two?). These could be measured as:
Recall: If the ground truth is in the set of Expected Nearby suggestions, record recall for that observation as 1; otherwise record it as 0.
Precision: If the ground truth is not in the set of Expected Nearby suggestions, record precision as 0; otherwise record it as 1 divided by the total number of Expected Nearby suggestions returned.
F1: If precision and recall are both 0, record F1 as 0 (this avoids dividing by zero). Otherwise calculate F1 as the harmonic mean of precision and recall: F1 = (2 × precision × recall) / (precision + recall)
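The per-observation rules above can be sketched as a small helper. This is a minimal illustration, not existing code; the function and argument names are hypothetical.

```python
def nearby_metrics(ground_truth, suggestions):
    """Per-observation recall, precision, and F1 for the "expected nearby" label.

    ground_truth: the single correct item for this observation.
    suggestions: the list of Expected Nearby suggestions returned.
    (Names here are illustrative, not from any existing API.)
    """
    hit = ground_truth in suggestions
    # Recall is 1 if the ground truth appears among the suggestions, else 0.
    recall = 1.0 if hit else 0.0
    # Precision is 1 / (number of suggestions) on a hit, else 0.
    precision = 1.0 / len(suggestions) if hit else 0.0
    # Harmonic mean, guarding the divide-by-zero case where both are 0.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = (2 * precision * recall) / (precision + recall)
    return recall, precision, f1
```

Dataset-level numbers would then be the averages of these per-observation values across the evaluation set.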