mozilla / fathom

A framework for extracting meaning from web pages
http://mozilla.github.io/fathom/
Mozilla Public License 2.0

Optimize confidence cutoffs #178

Closed · erikrose closed this 3 years ago

erikrose commented 4 years ago

In addition to optimizing coefficients, have fathom-train pick an optimal value for the confidence cutoff. For example, if requiring >70% gets more training samples correct than requiring >80%, pick 70%, emit it to the user, and judge our accuracies based on it. (Then use that same threshold for the validation set, of course.) Currently, the trainer assumes a hard-coded 50% cutoff. Our binary cross-entropy loss function largely pushes the samples to the far end of the confidence space, so we haven't bothered being hypervigilant about this in the past, but we might be able to eke out another percent or two by doing this one simple thing.
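Here's a minimal sketch of the idea (not Fathom's actual trainer code), assuming confidences are sigmoid outputs in [0, 1] and labels are boolean ground truth. The confidences and labels below are made-up toy values, just to show how a cutoff sweep would look:

```python
import numpy as np

# Hypothetical confidences (sigmoid outputs in [0, 1]) and boolean ground-truth labels.
confidences = np.array([0.97, 0.85, 0.72, 0.55, 0.30, 0.08])
labels = np.array([True, True, True, False, False, False])

def correct_at_cutoff(cutoff):
    """Count samples judged correctly when "positive" means confidence > cutoff."""
    return int(np.sum((confidences > cutoff) == labels))

for cutoff in (0.5, 0.7, 0.8):
    print(f"cutoff {cutoff:.2f}: {correct_at_cutoff(cutoff)}/{len(labels)} correct")
```

On this toy data, 0.7 beats both the hard-coded 0.5 and 0.8, which is the kind of gap the trainer could detect and report automatically.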

erikrose commented 4 years ago

http://mlwiki.org/index.php/Precision_and_Recall might help; we might consider choosing the point on the precision-recall curve that is Euclidean-nearest to (1, 1).
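A sketch of that selection rule, with hypothetical function names and the same assumption of boolean labels and [0, 1] confidences as above: compute (precision, recall) at each candidate cutoff and keep the cutoff whose point lies closest to the perfect corner (1, 1).

```python
import numpy as np

def precision_recall(confidences, labels, cutoff):
    """Precision and recall when a sample is called positive if its confidence > cutoff."""
    predicted = confidences > cutoff
    tp = np.sum(predicted & labels)
    fp = np.sum(predicted & ~labels)
    fn = np.sum(~predicted & labels)
    precision = tp / (tp + fp) if tp + fp else 1.0  # no positive predictions
    recall = tp / (tp + fn) if tp + fn else 0.0     # no positive samples
    return precision, recall

def cutoff_nearest_perfect(confidences, labels, candidates=np.arange(0.05, 1.0, 0.05)):
    """Return the candidate cutoff whose (precision, recall) is Euclidean-nearest to (1, 1)."""
    def distance_to_corner(cutoff):
        precision, recall = precision_recall(confidences, labels, cutoff)
        return np.hypot(1.0 - precision, 1.0 - recall)
    return min(candidates, key=distance_to_corner)
```

Minimizing the distance to (1, 1) weights precision and recall equally; a weighted distance would let a ruleset favor one over the other.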

gleonard-m commented 3 years ago

I've completed two experiments. I used the relay training data and also tried the fathom-form-autofill data. I observed that changing the cutoff really had no impact unless it was set to >= 90%.

The attached files show cutoff, precision, recall, and the precision/recall Euclidean distance to (1, 1).

For the relay training data, the accuracy dropped at cutoff = 0.95. For the autofill training data, the accuracy dropped at cutoff = 0.90.

The issue describes a case where setting the cutoff to 70% would give better accuracy than using 80%; however, what I observed was not that sensitive.

I'm wondering if this feature is actually worth adding?

BTW this is an awesome issue to get exposed to more of the details of fathom.

fathom-autofill-samples-cutoff.txt fathom-relay-samples-cutoff.txt

erikrose commented 3 years ago

It would be interesting to see what kind of pr_dist you get out of new-password (which we did ship with a cutoff of .75). I doubt I did anything as rigorous as a Euclidean distance minimization to come up with that number. If it makes no difference there either, perhaps it's not worth doing this after all. In any case, it seems like .50 is right for most rulesets, so, if we do go forward with this implementation, let's print something only if the optimum isn't 50%.

erikrose commented 3 years ago

Thanks for your thorough work on this! :-D

erikrose commented 3 years ago

On re-reading http://mlwiki.org/index.php/Precision_and_Recall, I see there are two sections: one for an information retrieval system and another for classification. Fathom does the latter, so I think we should actually optimize for accuracy rather than PR-curve Euclidean distance. This jibes with the rest of the training algorithm, which optimizes for accuracy as well. (It would be weird to have the two optimizations working at cross purposes; I'm not sure the two working in concert would approach an optimum.)

I apologize for sending you down the wrong path with my comments above. Next time, let's spend more time sketching out a solution together, or else reviewing a "mockup" write-up, before coding anything. The silver lining is that an accuracy optimization should be simpler, with no thresholds to choose or convergences to detect.
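For concreteness, an accuracy-based selection could be as small as this sketch (hypothetical names, not a proposed API; same assumption of boolean labels and [0, 1] confidences as the snippets above):

```python
import numpy as np

def cutoff_maximizing_accuracy(confidences, labels, candidates=np.arange(0.05, 1.0, 0.05)):
    """Return the candidate cutoff that classifies the most samples correctly."""
    def accuracy(cutoff):
        return np.mean((confidences > cutoff) == labels)
    return max(candidates, key=accuracy)
```

Over a fixed grid of candidate cutoffs this is just an argmax: evaluate every candidate and keep the best, with nothing to converge.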

What do you think? Am I making sense?

Independently of the optimization method, I think an automatic confidence optimization can make the -t flag on fathom train obsolete. We'll still need it on fathom test, of course.

gleonard-m commented 3 years ago

Definitely makes sense to use a single optimization. It is possible that the two optimizations could compete with each other.

I'll make the required changes, including removing the -t flag from fathom train.

erikrose commented 3 years ago

Thanks, Glenda. Sorry again for sending you down a blind alley!