kjappelbaum / gpt3forchem


use test-time augmentation for uncertainty estimates #16

Open kjappelbaum opened 2 years ago

kjappelbaum commented 2 years ago

It's probably also cheaper than the ensembles I've been trying to build.
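
A minimal sketch of what this could look like, assuming a hypothetical `predict` callable that wraps the fine-tuned model and using RDKit's randomized SMILES writer for the augmentation:

```python
import numpy as np
from rdkit import Chem


def randomize_smiles(smiles: str, n: int) -> list[str]:
    """Generate n randomized (non-canonical) SMILES for one molecule."""
    mol = Chem.MolFromSmiles(smiles)
    return [Chem.MolToSmiles(mol, canonical=False, doRandom=True) for _ in range(n)]


def tta_predict(smiles: str, predict, n_rounds: int = 10) -> tuple[float, float]:
    """Mean and spread of the model's predictions over augmented inputs.

    `predict` is a hypothetical wrapper around the fine-tuned model; the
    std across the randomized SMILES serves as the uncertainty estimate.
    """
    preds = np.array([predict(s) for s in randomize_smiles(smiles, n_rounds)])
    return float(preds.mean()), float(preds.std())
```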

pschwllr commented 2 years ago

It's quite an unexplored technique, but it worked well for yield prediction.

You should just make sure to also train with data augmentation.
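
For example, a sketch (not the repo's actual pipeline) that expands `(smiles, label)` training records with randomized copies sharing the original label:

```python
from rdkit import Chem


def augment_records(records, n_aug: int = 5):
    """Expand (smiles, label) pairs with n_aug randomized SMILES per molecule."""
    augmented = []
    for smiles, label in records:
        mol = Chem.MolFromSmiles(smiles)
        for _ in range(n_aug):
            augmented.append(
                (Chem.MolToSmiles(mol, canonical=False, doRandom=True), label)
            )
    return augmented
```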

kjappelbaum commented 2 years ago

Yep, doing that, and I also have some ensembles.

kjappelbaum commented 2 years ago

But there is also almost no literature on GPT-3 uncertainty or on ensembles of fine-tuned models and their uncertainty.

kjappelbaum commented 2 years ago

@pschwllr did you investigate it as a function of the number of augmentation rounds? That is, how many rounds did you need for converged uncertainty estimates?

pschwllr commented 2 years ago

[image]

We had tried up to 10.

https://chemrxiv.org/engage/chemrxiv/article-details/60c75258702a9b726c18c101

kjappelbaum commented 2 years ago

Anecdotal evidence, but it also seems to do something reasonable here.

[screenshot]

I'll quantify this when I'm back in the hotel.

kjappelbaum commented 2 years ago

That's a calibration curve using "1 - standard deviation" as the probability for the "is_correct" classification task.

[screenshot: calibration curve]
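
A sketch of that check, assuming arrays of labels, predictions, and per-sample TTA standard deviations have already been collected:

```python
import numpy as np
from sklearn.calibration import calibration_curve


def tta_calibration(y_true, y_pred, stds, n_bins: int = 10):
    """Bin (1 - std) confidence scores against the empirical accuracy."""
    is_correct = (np.asarray(y_pred) == np.asarray(y_true)).astype(int)
    confidence = 1.0 - np.clip(stds, 0.0, 1.0)  # map spread to a score in [0, 1]
    frac_correct, mean_conf = calibration_curve(is_correct, confidence, n_bins=n_bins)
    return mean_conf, frac_correct  # a calibrated model lies near the diagonal
```
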
kjappelbaum commented 1 year ago

OK, on regression they do not seem very reliable. Even when using 50 rounds, the Prediction Interval Coverage Probability (PICP) is below 30%.
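
For context, PICP is the fraction of true values that fall inside the predicted intervals; a sketch using mean +/- z * std intervals built from the TTA spread:

```python
import numpy as np


def picp(y_true, y_mean, y_std, z: float = 1.96) -> float:
    """Fraction of true values inside the mean +/- z*std interval (z=1.96 ~ 95%)."""
    y_true, y_mean, y_std = map(np.asarray, (y_true, y_mean, y_std))
    inside = (y_true >= y_mean - z * y_std) & (y_true <= y_mean + z * y_std)
    return float(inside.mean())
```

A nominally 95% interval with a PICP below 30% undercovers badly, i.e. the TTA spread underestimates the true error.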

kjappelbaum commented 1 year ago

[screenshot]

and no meaningful change in predictive performance.

kjappelbaum commented 1 year ago

I'll run some systematic tests to put into the SI, but I do not think it will be very promising.

pschwllr commented 1 year ago

I'm surprised that there is basically no difference. Usually, data augmentation helps at least a bit.

What data augmentation approach did you go for in the end?

pschwllr commented 1 year ago

I had a quick look at the data augmentation. If I'm not mistaken, the canonical version of the molecule is not kept in the data when the augmentation is done. This might make the task harder than it should be.

Ideally, the model would have access to the canonical_smiles and X augmented copies.

Same at test time: I would always include the canonical_smiles.
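
A sketch of that variant, keeping the canonical SMILES and adding the randomized copies on top (usable at both train and test time):

```python
from rdkit import Chem


def canonical_plus_augmented(smiles: str, n_aug: int) -> list[str]:
    """Canonical SMILES first, followed by n_aug randomized variants."""
    mol = Chem.MolFromSmiles(smiles)
    variants = [Chem.MolToSmiles(mol)]  # the canonical form is always included
    variants += [
        Chem.MolToSmiles(mol, canonical=False, doRandom=True) for _ in range(n_aug)
    ]
    return variants
```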