ShellingFord221 closed this issue 5 years ago
Sorry for the late reply.
1. y^* should be treated as a multinomial random variable with size n=1 and mean P(y^* | x^*, \omega). A multinomial random variable with n=1 can be represented as a one-hot encoded value.
In your example, [0.1, 0.6, 0.3] plays the role of the mean P(y^* | x^*, \omega), which means:
- y^*=1 with probability 0.1
- y^*=2 with probability 0.6
- y^*=3 with probability 0.3
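2. After T random draws, we calculate p(y^* | x^*, \hat{\omega}_t) for each draw t. This is a probability vector, not a one-hot encoded value, so it is the probabilities that are averaged over the T draws.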
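To make the two points concrete, here is a minimal sketch in plain Python. It is not the code from the paper or this repository: `stochastic_forward` is a hypothetical stand-in for one forward pass with sampled weights \hat{\omega}_t, and the logits and noise scale are made-up values chosen only for illustration.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def stochastic_forward(x, rng):
    """Hypothetical stand-in for p(y* | x*, omega_t) with one weight draw.

    The base logits and Gaussian noise are assumptions for illustration;
    a real model would use e.g. a dropout mask here. x is ignored.
    """
    base_logits = [0.0, 1.8, 1.1]
    return softmax([z + rng.gauss(0.0, 0.5) for z in base_logits])

def mc_predict(x, T, rng):
    """Point 2: average the T probability vectors (not one-hot labels)."""
    draws = [stochastic_forward(x, rng) for _ in range(T)]
    num_classes = len(draws[0])
    return [sum(p[k] for p in draws) / T for k in range(num_classes)]

def draw_one_hot(probs, rng):
    """Point 1: one multinomial draw with n=1, returned as a one-hot list."""
    k = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return [1 if i == k else 0 for i in range(len(probs))]

rng = random.Random(0)
p_hat = mc_predict(x=None, T=100, rng=rng)   # averaged class probabilities
y_star = draw_one_hot([0.1, 0.6, 0.3], rng)  # a single one-hot outcome
```

Here `p_hat` is a vector of class probabilities that sums to 1, while `y_star` is a single realization of the random variable whose mean is [0.1, 0.6, 0.3].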
Reference
Hi, in your appendix, why is y^* one-hot encoded? Shouldn't y^* be the probabilities of all classes? (E.g., if there are 3 classes, then y^* may be [0.1, 0.6, 0.3].)
If it is a label, how do you get it by averaging over T samples?
Thanks