Open: xnejed07 opened this issue 6 years ago
Hi, thanks for reading this work.
Yes, we apply a softmax to each row to normalize the learned noise matrix.
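A minimal sketch of this row-wise normalization, assuming the learned noise matrix is available as a plain NumPy array (the helper name `row_softmax` is illustrative, not from the paper's code):

```python
import numpy as np

def row_softmax(W):
    """Apply a softmax independently to each row of W, so every row
    becomes a probability distribution with entries in (0, 1) summing to 1."""
    # Subtract the row-wise max before exponentiating for numerical stability.
    z = W - W.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# An unconstrained matrix: negative entries are fine, the softmax
# still maps each row into (0, 1).
W = np.array([[ 2.0, -1.0, 0.5],
              [-0.3,  1.2, 0.0],
              [ 0.0,  0.0, 3.0]])
T = row_softmax(W)
print(T)              # all entries in (0, 1)
print(T.sum(axis=1))  # each row sums to 1
```

This is the same normalization the softmax output layer performs, just applied to the rows of the noise matrix for visualization.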
With correct labels (0% noise), this model learns a very pessimistic noise model (effectively an aggressive dropout). Therefore, we find that this model does not perform well on correctly labeled datasets. We have developed a new model that tackles all kinds of label noise; it is currently under review.
Hope this helps.
In the paper you don't say that you apply the softmax to every row. You say quite the opposite, quote: "the matrix W is unconstrained during optimization. Because the softmax layer implicitly normalizes the resulting conditional probabilities, there is no need to normalize W or force its entries to be nonnegative. This simplifies the optimization process by eliminating the normalization step described above."
You don't even say anything about the initialization of the weight matrix W. If you initialize it randomly, good luck with the convergence.
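One common remedy in the noise-adaptation literature, offered here as an assumption rather than as the paper's actual procedure, is to initialize W so that its row-wise softmax starts near the identity matrix, i.e. the noise layer initially assumes labels are mostly clean (`off_diag` is a hypothetical smoothing constant):

```python
import numpy as np

def init_noise_matrix(num_classes, off_diag=1e-2):
    """Initialize the unconstrained matrix W so that a row-wise softmax of W
    is close to the identity: most probability mass on the diagonal, a small
    amount (off_diag) spread over the off-diagonal entries of each row."""
    # Build the target row-stochastic matrix first...
    p = np.full((num_classes, num_classes), off_diag / (num_classes - 1))
    np.fill_diagonal(p, 1.0 - off_diag)
    # ...then store its log as the unconstrained logits; since each row of p
    # already sums to 1, a row-wise softmax of log(p) recovers p exactly.
    return np.log(p)

W0 = init_noise_matrix(3)
```

Starting from such a near-identity matrix rather than a random one keeps the noise layer close to a pass-through early in training, which typically helps convergence.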
Hi, I have read your paper with the method description. I found it really interesting and have several theoretical questions. First, since the noise matrix is unconstrained (in our case it usually has negative values), how do you extract the normalized (0, 1) values shown in the figures? Do you apply a softmax to each row? Secondly, how does your model behave when applied to correct labels without noise?