jerinphilip / dirichlet-prior-networks


Uncertainty estimation for synthetic data (3 class Gaussian distr) #1

Open debo1992 opened 4 years ago

debo1992 commented 4 years ago

Hi,

I'm trying to replicate the synthetic data experiment from the paper Predictive Uncertainty Estimation via Prior Networks (https://papers.nips.cc/paper/7936-predictive-uncertainty-estimation-via-prior-networks.pdf) using your code with a few modifications. However, the precision (\alpha_0) of the output does not differentiate between in-distribution and out-of-distribution data: the values are always low, i.e. < 2. Could you please give me the list of parameters and initial conditions you used? Here is the list of parameters I used. Is there something I'm doing wrong? I'm also attaching the code I'm using right now. DPNmodifiedFiles.zip

```python
parser.add_argument('--alpha', type=float, default=1e2)
parser.add_argument('--epochs', type=int, default=30)
parser.add_argument('--log_interval', type=int, default=20)
parser.add_argument('--device', type=str, default='cpu')
parser.add_argument('--batch_size', type=int, default=64)
parser.add_argument('--lr', type=float, default=1e-2)
parser.add_argument('--weight_decay', type=float, default=1)
# Raw string, so the backslashes in the Windows path aren't treated as escapes.
parser.add_argument('--work_dir', type=str, default=r"C:\Users\dirichlet-prior-networks\dpn")
parser.add_argument('--model', type=str, default='mlp')
parser.add_argument('--dataset', type=str, default='synthetic')
parser.add_argument('--radius', type=float, default=4.0)
parser.add_argument('--sigma', type=float, default=1.0)
parser.add_argument('--shuffle', action='store_false')
parser.add_argument('--num_train_samples', type=int, default=500)
parser.add_argument('--num_test_samples', type=int, default=1)
parser.add_argument('--log', action='store_false')
parser.add_argument('--ind-loss', type=str, default="{'dirichlet_kldiv': 1.0}")
parser.add_argument('--ood-loss', type=str, default='{"dirichlet_kldiv": 1}')
parser.add_argument('--ind-fraction', type=float, default=0.5)
parser.add_argument('--rejection-threshold', type=float, default=1e-4)
```
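The dataset-generation code itself isn't shown in the thread. For context, here is a minimal sketch of how a 3-class Gaussian dataset using the `--radius` and `--sigma` arguments above might be generated; the circular layout of the class centres is an assumption based on the paper's description, not code taken from the repo:

```python
import numpy as np

def sample_three_gaussians(num_samples, radius=4.0, sigma=1.0, seed=0):
    """Sample a 3-class 2-D dataset: isotropic Gaussians centred at three
    points equally spaced on a circle of the given radius (assumed layout)."""
    rng = np.random.default_rng(seed)
    angles = 2 * np.pi * np.arange(3) / 3
    centres = radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    labels = rng.integers(0, 3, size=num_samples)          # class per point
    points = centres[labels] + sigma * rng.standard_normal((num_samples, 2))
    return points.astype(np.float32), labels

X, y = sample_three_gaussians(500, radius=4.0, sigma=1.0)
```

Points far from all three centres would then serve as the out-of-distribution region discussed below.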

jerinphilip commented 4 years ago

Please fork and show the modified commit diff, or send me a diff (assuming you've built on top of this code); I'm not going to check a zip of your code. I never managed to reproduce figure (c) vs figure (f), although I think I could have eventually with some tuning (or by discovering hidden bugs).

[image: figures (c) and (f) referenced above]

I'm not currently actively working on this, but I hope you have some luck with it.

debo1992 commented 4 years ago

Thanks a lot. I have uploaded my files and added a description. I have not yet added the differential entropy code, as I'm still trying to figure out why the precision output by the DPN (< 10) doesn't match the precision of the target distribution (100).

jerinphilip commented 4 years ago

How are you computing the "precision of the target distribution"?

debo1992 commented 4 years ago

I've defined the precision as 100. Using your smoothing function, assuming a datapoint belongs to class 2, the alphas would be (1, 98, 1). I'm attaching args.py, wherein I have assigned the default alpha as 100.
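The (1, 98, 1) target described above, together with the `dirichlet_kldiv` loss named in the args, can be sketched as follows. This is a plain reconstruction from the thread, not the repo's actual smoothing function, and the KL direction used in training may differ:

```python
import numpy as np
from scipy.special import gammaln, digamma

def smoothed_target_alphas(label, num_classes=3, alpha_0=100.0, eps=1.0):
    # Off-target classes get eps; the target class absorbs the remainder,
    # so the concentrations sum to alpha_0 (the precision).
    alphas = np.full(num_classes, eps)
    alphas[label] = alpha_0 - eps * (num_classes - 1)
    return alphas

def dirichlet_kl(alpha, beta):
    """KL( Dir(alpha) || Dir(beta) ), the standard closed form."""
    a0, b0 = alpha.sum(), beta.sum()
    return (gammaln(a0) - gammaln(alpha).sum()
            - gammaln(b0) + gammaln(beta).sum()
            + ((alpha - beta) * (digamma(alpha) - digamma(a0))).sum())

target = smoothed_target_alphas(1)   # class 2 (index 1) → [1., 98., 1.]
flat = np.ones(3)                    # an uninformative prediction
dirichlet_kl(target, target)         # → 0.0 when prediction matches target
```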

jerinphilip commented 4 years ago

I'm probably not the expert here; I come from a programming background rather than stats. I coded this up to integrate with a PyTorch repo I had.

  1. The network didn't train well at predicting precisions other than 100; those would be the OOD data (50% of samples at any update step), where all the categoricals drawn from the Dirichlet are equiprobable. Drawing parallels with how this works for classification with softmax probabilities, I think you will need to set a threshold on the precision to create a decision rule for in-domain vs out-of-domain. It's similar to binary classification: even if you train with 0-1 labels, you get a probability like 0.7, which a decision rule with a cutoff of 0.5 classifies as a 1. Reiterating, I didn't manage to reproduce figures (c) and (f), which make the in-domain vs out-of-domain distinction (this relates to precision again), but my hunch is that it's more a matter of hyperparameter tuning now than a bug in this code.

  2. Label smoothing is there to smooth the optimization surface; at inference the predictions are expected to be the actual values. Label smoothing doesn't come in unless you use the loss function, which is not used during inference. (I had a hard time getting the network to converge without smoothing.)
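The thresholding decision rule suggested in point 1 could be sketched like this; both the threshold value and the assumption that the network predicts concentrations directly are illustrative, not taken from the repo:

```python
import numpy as np

def precision_decision(alphas, threshold=20.0):
    """Flag a point as in-domain when the Dirichlet precision
    alpha_0 = sum(alphas) exceeds a tunable cutoff (a hyperparameter
    chosen on held-out data, analogous to the 0.5 cutoff in binary
    classification)."""
    alpha_0 = np.asarray(alphas, dtype=float).sum(axis=-1)
    return alpha_0 > threshold

precision_decision([[1.0, 98.0, 1.0],   # sharp Dirichlet → in-domain
                    [1.0, 1.0, 1.0]])   # flat Dirichlet → out-of-domain
# → array([ True, False])
```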

debo1992 commented 4 years ago

Thanks for the suggestions. Let me try again, tuning the hyperparameters. Perhaps the network really can't work well if the precision is defined as 100, but it didn't look that way based on what the original authors of the paper claim. If I get it to work, or gain some insight into this problem, I'll let you know if you like. May I know what you defined your precision to be, in case you have it recorded somewhere?

jerinphilip commented 4 years ago

http://github.com/KaosEngineer/PriorNetworks-OLD/blob/181f74c556a39a1d7aff163b49380612fb34855d/prior_networks/dirichlet/run/step_train_synth.py#L34-L35

I was going with the values I found in the repo, so try the above default. Please do let me know how the experiments turn out and if you need any help with my part of the code.

debo1992 commented 4 years ago

Thanks a lot!
