Closed doobidoob closed 4 years ago
Hello @doobidoob !
Can you explain K.log(1. + K.exp(linear_predictor_f))?
-> Under the assumption that this problem is binary classification, I used yf(x) - log(1 + exp(f(x))), which is actually equivalent to Kendall's one.
And you used log_variance and an exponential function when estimating the variance. Is there any reason for that? -> Can you give me more specifics? I'd be happy to clarify once I know what you mean.
In addition, is there any inference code for Kendall's work? -> Unfortunately, I do not maintain this code now and I do not have it. But I believe you can simply write it by passing inputs through the network multiple times. Also, after this work, I found that Kendall's approach can work well if you properly regularize the weights in your neural network.
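As a rough sketch of what "passing inputs multiple times" could look like (this is not from the repo; `stochastic_forward` is a hypothetical stand-in for a Keras model called with dropout kept active at test time):

```python
import numpy as np

def mc_predict(stochastic_forward, x, T=50):
    """Monte Carlo inference sketch: run T stochastic forward passes
    over the same input and summarize them.

    Returns the per-element predictive mean and variance; the variance
    serves as a simple uncertainty estimate."""
    samples = np.stack([stochastic_forward(x) for _ in range(T)])  # shape (T, ...)
    return samples.mean(axis=0), samples.var(axis=0)

# Toy usage with a fake stochastic model (noise imitates dropout randomness).
rng = np.random.default_rng(0)
fake_forward = lambda x: x + rng.normal(0.0, 0.1, size=x.shape)
mean, var = mc_predict(fake_forward, np.zeros((4, 3)), T=100)
```

With a real Keras model you would replace `fake_forward` with a call that keeps the stochastic layers on (e.g. invoking the model in training mode).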
Hope it helps.
@ykwon0407
Thank you for your reply!
Do you mean yf(x) - log(1 + exp(f(x))) is the same as binary cross-entropy, like below?
Honestly, I don't understand where that expression comes from... Sorry to trouble you, but I would be very grateful if you could explain or provide a reference link. :)
In addition, is there any way to use this loss (yf(x) - log(1 + exp(f(x)))) for multi-class classification using softmax (for example, with 5 classes)?
I mean that you used the log of the variance, like below, not the variance itself: https://github.com/ykwon0407/UQ_BNN/blob/4148245e297c3001148d8fffc3133c58174a46aa/ischemic/models.py#L55 But I understood your intention; it is probably for the numerical stability of training.
Thank you!
@doobidoob
Yes, I used the one you found. Suppose y = t_1 = 1 and s_1 = exp(f(x)) / (1 + exp(f(x))). Then you get CE = -log(s_1) = -f(x) + log(1 + exp(f(x))) = -(yf(x) - log(1 + exp(f(x)))). Similarly, if y = t_1 = 0, then CE = -log(1 - s_1) = -0 + log(1 + exp(f(x))) = -(yf(x) - log(1 + exp(f(x)))).
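The algebra above can be checked numerically: with s = sigmoid(f), the negated logit-form expression should match the usual binary cross-entropy for both labels (a quick self-contained check, not code from the repo):

```python
import math

def nll_from_logit(y, f):
    """Negated logit-form loss: -(y*f - log(1 + exp(f)))."""
    return -(y * f - math.log(1.0 + math.exp(f)))

def bce(y, s):
    """Standard binary cross-entropy on a probability s."""
    return -(y * math.log(s) + (1 - y) * math.log(1.0 - s))

# The two agree for both labels across a few logit values.
for f in (-2.0, 0.3, 1.7):
    s = 1.0 / (1.0 + math.exp(-f))  # sigmoid(f)
    for y in (0, 1):
        assert abs(nll_from_logit(y, f) - bce(y, s)) < 1e-9
```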
In the case of multi-class classification, I believe you had better use the original form suggested in Kendall's paper.
Ah! That one is to make sure the standard deviation (equivalently, the variance) is positive.
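To make the positivity point concrete, here is a minimal sketch (my own illustration, not repo code): the network head outputs an unconstrained real number log_var, and exponentiating it always yields a strictly positive variance, which is also friendlier to optimize than predicting the variance directly.

```python
import math

def variance_from_head(log_var):
    """Map an unconstrained head output to a valid (positive) variance."""
    return math.exp(log_var)

# Any real-valued head output gives a positive variance.
for raw in (-10.0, 0.0, 3.0):
    assert variance_from_head(raw) > 0.0
```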
@ykwon0407
Thanks for your explanation and kindness! I understand it now. Thank you very much 👍
Hi @ykwon0407 , Thank you for your interesting work!
I am studying your paper with Kendall's paper as a reference. I was wondering how your suggestion and Kendall's suggestion were implemented, so I looked at the code. I have several questions about the implementation of Kendall's work.
In Kendall's paper, they proposed a loss function for estimating uncertainty in classification tasks, like below:
I found the logit vector x being created in SampleNormal, but I'm curious about the implementation of the loss function, like below: https://github.com/ykwon0407/UQ_BNN/blob/4148245e297c3001148d8fffc3133c58174a46aa/ischemic/models.py#L101
Can you explain K.log(1. + K.exp(linear_predictor_f))? And you used log_variance and an exponential function when estimating the variance. Is there any reason for that? In addition, is there any inference code for Kendall's work? Thank you in advance!