Closed ShellingFord221 closed 5 years ago
Hello! Thank you for the interesting question.
When there are $K$ classes, the aleatoric and epistemic uncertainties are $K$ by $K$ matrices, say $A$ and $M$, respectively. Then the $j$-th diagonal element of $A$ (or $M$) gives the aleatoric (or epistemic) uncertainty for class $j$.
In this regard, the current uncertainty quantification method already includes the uncertainty of each class. Hope this is informative!
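For concreteness, here is a minimal sketch (not the repo's actual code) of how such $K \times K$ matrices could be formed from Monte Carlo softmax samples, with the per-class uncertainties on the diagonal. The function name, shapes, and the MC-dropout setup are my own assumptions:

```python
import numpy as np

def uncertainty_matrices(p):
    """p: (T, K) array of T Monte Carlo softmax samples for ONE input."""
    p_bar = p.mean(axis=0)  # (K,) average predicted probabilities
    # Aleatoric matrix A: average of diag(p_t) - p_t p_t^T over the T samples.
    A = np.mean([np.diag(pt) - np.outer(pt, pt) for pt in p], axis=0)
    # Epistemic matrix M: sample covariance of the MC softmax outputs.
    M = np.mean([np.outer(pt - p_bar, pt - p_bar) for pt in p], axis=0)
    return A, M

# A[j, j] and M[j, j] are then the aleatoric and epistemic
# uncertainties for class j, as described above.
```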
But your aleatoric and epistemic uncertainties are for a single sample (yes, the uncertainty of each class is included). I think the uncertainty of the whole test set, or of a part of the training data, would be useful too. For example, if I print the uncertainties of each class in each batch (over all the training data in that batch, not just a single sample), a higher uncertainty for some class means that class needs more training data; then in the next batch I could manually add more samples of that class to get a better learner. Is this possible?
Is the uncertainty of the whole test set (i.e. the model's aleatoric and epistemic uncertainties) just the sum of each sample's uncertainty? And is the model's uncertainty for each class the sum of each sample's uncertainty for that class? Besides, this idea is somewhat like a cost-sensitive loss in neural networks, but I don't know whether there is a supporting theory...
Usually, uncertainty is defined for a single sample, not for a whole dataset. If you want to generalize it, then the sum or the mean over the dataset can be used. (Short answer: yes, but it needs caution.)
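As a hypothetical sketch of that generalization: average the per-sample diagonal uncertainties over a batch to get one aleatoric and one epistemic value per class. The input shape `(N, T, K)` (N samples, T Monte Carlo draws, K classes) and the function name are assumptions, not the repo's API; the diagonals are computed directly from the softmax samples:

```python
import numpy as np

def per_class_batch_uncertainty(mc_probs):
    """mc_probs: (N, T, K) array of MC softmax samples for N inputs."""
    p_bar = mc_probs.mean(axis=1, keepdims=True)        # (N, 1, K)
    # Diagonal of the aleatoric matrix: mean of p_t * (1 - p_t) per class.
    al = (mc_probs * (1.0 - mc_probs)).mean(axis=1)     # (N, K)
    # Diagonal of the epistemic matrix: per-class variance of the MC draws.
    ep = ((mc_probs - p_bar) ** 2).mean(axis=1)         # (N, K)
    # Average over the batch: one aleatoric/epistemic value per class.
    return al.mean(axis=0), ep.mean(axis=0)
```

The caution mentioned above applies: the mean hides per-sample variation, so a few very uncertain samples and many confident ones can average out to the same number as uniformly moderate uncertainty.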
I deleted the previous answer to avoid confusion.
Hi, I read this part of your paper again (i.e. the last paragraph of Section 3.1). Aleatoric uncertainty captures the inherent randomness of an output y∗, therefore we can infer that the mean of all test data's aleatoric uncertainty refers to the mean randomness of the output, which I think is not a very good principle for judging whether to gather more data. But the epistemic uncertainty comes from the variability of weights given data, therefore the mean of all test data's aleatoric uncertainty refers to the mean variability of weights given these data, which can further be interpreted as the model's inference ability on these data. I think this is somewhat reasonable. Am I correct?
Aleatoric uncertainty captures the inherent randomness of an output y∗, therefore we can infer that the mean of all test data's aleatoric uncertainty refers to the mean randomness of the output, which I think is not a very good principle for judging whether to gather more data.
-> I agree!
But the epistemic uncertainty comes from the variability of weights given data, therefore the mean of all test data's **aleatoric** uncertainty refers to the mean variability of weights given these data, which can further be interpreted as the model's inference ability on these data. -> That seems to be a typo. The bolded text should be epistemic, not aleatoric.
Overall, I agree with you!
Oh, you are right about the typo! My mistake! I think another question then arises... During training, I can get not only the uncertainty for each class, but also the loss for each class and the error ratio for each class. Do they actually represent the same thing? Or would there be any difference if I chose one of them as the principle for slightly adjusting the training data?
I think it depends. Uncertainty can be different from loss. (Of course, I may be wrong.)
I guess this topic is a little bit out of this GitHub repo, so could we discuss this more by email if needed?
Sure!
Hello again! These days I'm re-reading these papers about uncertainty. I found that your calculation of uncertainties is from the perspective of variance, the same as *What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?*. But your method gets the result for each sample as a matrix, while Kendall and Gal get a single numerical value. I know that the trace of your matrix is the uncertainty over all classes for one example (as we discussed before); I wonder whether the trace of your AL and EP matrices has the same meaning as Kendall and Gal's decomposition of uncertainty? Why or why not? (In my experiment, they are quite different.) Thanks!
Hello @ShellingFord221 !
I wonder whether the trace of your AL and EP matrices has the same meaning as Kendall and Gal's decomposition of uncertainty? -> Not really. While our method considers the 'variance' of an outcome (Y^) given a new input (X^) and a training dataset, Kendall and Gal consider the 'variance' of an intermediate output. So our method generates matrices whose elements are in [0,1], but that is not the case for Kendall and Gal (so the values can be different).
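A small numerical sketch of why the scales differ (my own illustration, not code from either paper): for softmax probabilities, the per-class aleatoric plus epistemic diagonal collapses to p_bar * (1 - p_bar), which is at most 1/4 per class, whereas a variance taken over pre-softmax logits, as in Kendall and Gal's setting, is unbounded:

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(scale=5.0, size=(100, 4))   # T=100 MC draws, K=4 classes
p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

p_bar = p.mean(axis=0)
al_diag = (p * (1 - p)).mean(axis=0)            # diagonal of the aleatoric matrix
ep_diag = ((p - p_bar) ** 2).mean(axis=0)       # diagonal of the epistemic matrix

# The diagonals sum exactly to p_bar * (1 - p_bar), hence <= 1/4 per class...
assert np.allclose(al_diag + ep_diag, p_bar * (1 - p_bar))
# ...while the logit-space variance has no such bound and easily exceeds 1.
logit_var = logits.var(axis=0)
```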
Hi, can I rewrite your equations to get the aleatoric and epistemic uncertainty of each CLASS of the model, rather than of a single sample? I think it may show that the model is not good at some class, or that for some class the model doesn't have enough good data.
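One hypothetical way to get that per-CLASS view without changing the equations: group the test samples by their true label and average the matching diagonal entry within each group, so a consistently high value flags a class the model may need more data for. The function name, shapes, and grouping scheme below are my own assumptions:

```python
import numpy as np

def uncertainty_by_true_class(mc_probs, labels, num_classes):
    """mc_probs: (N, T, K) MC softmax samples; labels: (N,) true class indices."""
    p_bar = mc_probs.mean(axis=1, keepdims=True)
    al = (mc_probs * (1 - mc_probs)).mean(axis=1)   # (N, K) aleatoric diagonals
    ep = ((mc_probs - p_bar) ** 2).mean(axis=1)     # (N, K) epistemic diagonals
    al_c = np.zeros(num_classes)
    ep_c = np.zeros(num_classes)
    for c in range(num_classes):
        mask = labels == c
        # Average the class-c diagonal entry over samples whose true label is c.
        al_c[c] = al[mask, c].mean()
        ep_c[c] = ep[mask, c].mean()
    return al_c, ep_c
```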