ykwon0407 / UQ_BNN

Uncertainty quantification using Bayesian neural networks in classification (MIDL 2018, CSDA)

about uncertainties #7

Closed ShellingFord221 closed 5 years ago

ShellingFord221 commented 5 years ago

Hi, can I rewrite your equations to get the aleatoric and epistemic uncertainty of each CLASS of the model, rather than of a single sample? I think it may show that the model is not good at some class, or that for some class the model doesn't get enough good data.

ykwon0407 commented 5 years ago

Hello! Thank you for the interesting question.

When there are $K$ classes, the aleatoric and epistemic uncertainties are $K \times K$ matrices, say $A$ and $M$, respectively. Then the $j$-th diagonal element of $A$ (or $M$) is the aleatoric (or epistemic) uncertainty for class $j$.

In this regard, the current uncertainty quantification method already includes the uncertainty of each class. Hope this is informative!
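For concreteness, here is a minimal NumPy sketch of this decomposition as described in the paper; `p_hat` is a hypothetical array holding the predicted class probabilities from $T$ stochastic forward passes for a single input (not the repo's actual API):

```python
import numpy as np

def uncertainty_matrices(p_hat):
    """Per-sample uncertainty matrices from MC-dropout outputs.

    p_hat : (T, K) array of predicted class probabilities from
            T stochastic forward passes for a single input.
    Returns the K x K aleatoric matrix A and epistemic matrix M;
    the j-th diagonal element is the uncertainty for class j.
    """
    p_bar = p_hat.mean(axis=0)
    # aleatoric: average of diag(p_t) - p_t p_t^T over the T passes
    A = np.mean([np.diag(p) - np.outer(p, p) for p in p_hat], axis=0)
    # epistemic: average of (p_t - p_bar)(p_t - p_bar)^T over the T passes
    M = np.mean([np.outer(p - p_bar, p - p_bar) for p in p_hat], axis=0)
    return A, M
```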

ShellingFord221 commented 5 years ago

But your aleatoric and epistemic uncertainties are for a single sample (yes, the uncertainty of each class is included). I think the uncertainty of the whole test set, or of a part of the training data, is useful too. For example, print the uncertainties of each class over each batch (all the training data in the batch, not just one single sample); a higher uncertainty for some class means this class needs more data to be trained on, so in the next batch I manually add more samples of this class to get a better learner. Is this possible?

ShellingFord221 commented 5 years ago

Is the uncertainty of the whole test set (i.e. the model's aleatoric and epistemic uncertainties) just the sum of each sample's uncertainty? And is the uncertainty of each class of the model the sum of each sample's uncertainty for that class? Besides, this idea is somewhat like a cost-sensitive loss in a neural network, but I don't know whether there is a supporting theory...

ykwon0407 commented 5 years ago

Usually, uncertainty is defined for a single sample, not for a whole dataset. If you want to generalize it, then the sum or the mean over the dataset can be used. (Short answer: yes, but it needs caution.)
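As one possible illustration of the "mean over the dataset" option, one could average the diagonals of the per-sample matrices from the sketch above (the function name and the `(N, T, K)` layout are assumptions, not the repo's API):

```python
import numpy as np

def classwise_dataset_uncertainty(p_hats):
    """Mean per-class uncertainty over a dataset.

    p_hats : (N, T, K) array -- T MC-dropout probability samples
             for each of N inputs.
    Returns two length-K vectors: the mean aleatoric and mean
    epistemic uncertainty of each class, averaged over the N inputs.
    """
    alea, epis = [], []
    for p_hat in p_hats:
        A, M = uncertainty_matrices(p_hat)  # sketch defined earlier
        alea.append(np.diag(A))             # per-class aleatoric terms
        epis.append(np.diag(M))             # per-class epistemic terms
    return np.mean(alea, axis=0), np.mean(epis, axis=0)
```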

I deleted the previous answer to avoid confusion.

ShellingFord221 commented 5 years ago

Hi, I read this part of your paper again (i.e. the last paragraph of Section 3.1). Aleatoric uncertainty captures the inherent randomness of an output $y^*$, so we can infer that the mean of all test samples' aleatoric uncertainty refers to the mean randomness of the output, which I think is not a very good principle for judging whether to gather more data or not. But epistemic uncertainty comes from the variability of the weights given the data, so the mean of all test samples' aleatoric uncertainty refers to the mean variability of the weights given these data, which can further be interpreted as the model's inference ability on these data. I think this is somewhat reasonable. Am I correct?

ykwon0407 commented 5 years ago

Aleatoric uncertainty captures the inherent randomness of an output $y^*$, so we can infer that the mean of all test samples' aleatoric uncertainty refers to the mean randomness of the output, which I think is not a very good principle for judging whether to gather more data or not.

-> I agree!

But epistemic uncertainty comes from the variability of the weights given the data, so the mean of all test samples' aleatoric uncertainty refers to the mean variability of the weights given these data, which can further be interpreted as the model's inference ability on these data.

-> It seems to be a typo: the second "aleatoric" should be epistemic.

Overall, I agree with you!

ShellingFord221 commented 5 years ago

Oh, you are right about the typo! My mistake! But then I think another question comes up... During training, I can get not only the uncertainty of each class, but also the loss of each class and the error rate of each class. Do they actually represent the same thing? Or will there be any difference if I choose one of them as the principle for slightly changing the training data?

ykwon0407 commented 5 years ago

I think it depends. Uncertainty can be different from loss. (Of course, I may be wrong.)

I guess this topic is a little bit out of the scope of this GitHub repo, so could we discuss it further by email if needed?

ShellingFord221 commented 5 years ago

Sure!

ShellingFord221 commented 4 years ago

Hello again! These days I'm re-reading these papers about uncertainty. I found that your calculation of uncertainties is from the perspective of variance, the same as "What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?". But your method gets the result for each sample as a matrix, while Kendall and Gal get a numerical value. I know that the trace of your matrix is the uncertainty over all classes for one example (like we discussed before); I wonder whether the trace of your AL and EP matrices has the same meaning as Kendall and Gal's decomposition of uncertainty? Why or why not? (In my experiments, they are quite different.) Thanks!

ykwon0407 commented 4 years ago

Hello @ShellingFord221 !

I wonder whether the trace of your AL and EP matrices has the same meaning as Kendall and Gal's decomposition of uncertainty? -> Not really. While our method considers the 'variance' of an outcome $y^*$ given a new input $x^*$ and a training dataset, Kendall and Gal consider the 'variance' of intermediate outputs. So our method will generate matrices whose elements are in $[0, 1]$, but that is not true in Kendall and Gal's case (so the values can be different).
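A toy illustration of why the scales can differ (simulated logits, not either paper's actual code): the matrices above are computed in probability space, so their entries stay in $[0, 1]$, whereas a variance taken over pre-softmax outputs is unbounded.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K = 200, 3
# simulate T MC-dropout draws of the pre-softmax outputs (logits)
logits = rng.normal(loc=[2.0, 0.0, -1.0], scale=1.5, size=(T, K))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

A, M = uncertainty_matrices(probs)  # probability-space sketch from above
print(np.trace(A) + np.trace(M))    # bounded: entries lie in [0, 1]
print(logits.var(axis=0).sum())     # logit-space variance: unbounded
```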