Open Toooodd opened 5 years ago
I guess the author did that on purpose. When evaluating the accuracy of a classification problem, we just take the argmax of the output mean as the network's answer and ignore the output variance, so the author skipped the weight sampling and made the network output only the mean value.
You are right, Yang. But I still think that at prediction time we should keep the disturbance term of W, compute the result multiple times, and then average. That is more in line with the original intent of the article.
Yeah, but in my opinion the mean of the network output is determined by the μ parameter of that network, and the variance of the output is determined by the σ parameter, where σ = log(1 + exp(ρ)). We can get the average prediction directly by disabling weight sampling and predicting once with only μ.
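The parameterization above can be sketched in a few lines. This is a minimal NumPy illustration, not the thread's actual MXNet code; the helper names `softplus` and `sample_weight` are my own:

```python
import numpy as np

def softplus(rho):
    """sigma = log(1 + exp(rho)); keeps the standard deviation
    positive for any real-valued rho."""
    return np.log1p(np.exp(rho))

def sample_weight(mu, rho, rng):
    """Reparameterized draw: w = mu + softplus(rho) * eps, eps ~ N(0, 1)."""
    eps = rng.standard_normal(mu.shape)
    return mu + softplus(rho) * eps

rng = np.random.default_rng(0)
mu = np.zeros((2, 3))
rho = np.full((2, 3), -3.0)      # softplus(-3) ~ 0.049, so sigma is small
w = sample_weight(mu, rho, rng)  # samples stay close to mu
```

Disabling sampling then just means using `mu` directly instead of calling `sample_weight`.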
If you take μ + σ·ε, predict multiple times, and then average, I think the average will eventually converge to μ, so it may be pointless to do it that way, I think...
Yeah, you are absolutely right, and I see your point. What I wanted to say is that sampling may be more in line with the original intent of the article, and it shows the advantage of this method when predicting on unseen data and plotting the results. Haha, nice to meet u, Yang! You are so active :)
Nice to meet u too :) And I think the problem is: if you want to exploit the σ·ε term in a classification problem, you need an evaluation method that takes the variance into account. For example, if the network outputs its best answer with large variance and its second-best answer with small variance, take the second-best as the final answer.
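One common way to get such variance information is to run several stochastic forward passes and look at the mean and variance of the class probabilities. This is a hypothetical sketch (the callable `stochastic_forward` and the toy model are my own inventions, not the repo's API):

```python
import numpy as np

def predict_with_uncertainty(stochastic_forward, data, n_samples=20):
    """Run several stochastic forward passes (weights resampled on each
    call) and return the mean and variance of the class probabilities."""
    probs = np.stack([stochastic_forward(data) for _ in range(n_samples)])
    return probs.mean(axis=0), probs.var(axis=0)

# Toy stand-in: logits perturbed by noise, mimicking weight sampling.
rng = np.random.default_rng(0)
def toy_forward(data):
    logits = data + rng.normal(scale=0.1, size=data.shape)
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

mean_p, var_p = predict_with_uncertainty(toy_forward, np.array([[2.0, 0.5, 0.1]]))
prediction = mean_p.argmax(axis=-1)  # argmax of the averaged probabilities
```

A decision rule like the one described above could then compare `var_p` across the top-ranked classes before committing to `prediction`.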
That's a great solution, and I suddenly realized you are right from both a practical and an academic perspective.
```python
from mxnet import nd  # `ctx` and `net` are defined elsewhere in the notebook

def evaluate_accuracy(data_iterator, net, layer_params):
    numerator = 0.
    denominator = 0.
    for i, (data, label) in enumerate(data_iterator):
        data = data.as_in_context(ctx).reshape((-1, 784))
        label = label.as_in_context(ctx)
        output = net(data, layer_params)
        predictions = nd.argmax(output, axis=1)
        numerator += nd.sum(predictions == label)
        denominator += data.shape[0]
    return (numerator / denominator).asscalar()
```
I think layer_params should not be a fixed value when you evaluate the model; it should be resampled on every forward pass.
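Resampling could look like the sketch below. This is a NumPy illustration under my own assumptions, not the repo's code; `mus` and `rhos` are hypothetical lists of per-layer variational parameters:

```python
import numpy as np

def sample_layer_params(mus, rhos, rng):
    """Return one fresh draw of every layer's weights,
    w = mu + softplus(rho) * eps with eps ~ N(0, 1), so each forward
    pass sees a different weight sample."""
    return [mu + np.log1p(np.exp(rho)) * rng.standard_normal(mu.shape)
            for mu, rho in zip(mus, rhos)]

rng = np.random.default_rng(0)
mus = [np.zeros((4, 4)), np.zeros((4, 2))]
rhos = [np.full((4, 4), -2.0), np.full((4, 2), -2.0)]

draw_a = sample_layer_params(mus, rhos, rng)
draw_b = sample_layer_params(mus, rhos, rng)
# draw_a and draw_b differ, so two evaluations of the network would use
# different weights, as a Bayes-by-Backprop predictive pass should.
```

Calling such a sampler inside the evaluation loop (instead of passing one fixed `layer_params`) would make each batch, or each repeated pass, use a fresh weight draw.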