pjreddie / darknet

Convolutional Neural Networks
http://pjreddie.com/darknet/

Issues in batchnorm_layer.c #1723

Open jony5017 opened 5 years ago

jony5017 commented 5 years ago

I was reading the Darknet source code and found some issues in `batchnorm_layer.c`:

1. In `backward_batchnorm_layer()`, according to the formulas in the batch norm paper, `variance_delta_cpu()` should be called before `mean_delta_cpu()`, because dl/d_mean depends on dl/d_var.
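For reference, the relevant gradients from the batch-normalization paper (Ioffe & Szegedy, 2015), for a mini-batch of size m, are:

```latex
\frac{\partial \ell}{\partial \sigma_B^2}
  = \sum_{i=1}^{m} \frac{\partial \ell}{\partial \hat{x}_i}
    \,(x_i - \mu_B)\cdot\left(-\tfrac{1}{2}\right)(\sigma_B^2 + \epsilon)^{-3/2}

\frac{\partial \ell}{\partial \mu_B}
  = \left(\sum_{i=1}^{m} \frac{\partial \ell}{\partial \hat{x}_i}
    \cdot \frac{-1}{\sqrt{\sigma_B^2 + \epsilon}}\right)
    + \frac{\partial \ell}{\partial \sigma_B^2}
    \cdot \frac{\sum_{i=1}^{m} -2(x_i - \mu_B)}{m}
```

The second term of dl/d_mean uses dl/d_var, so dl/d_var has to be available first.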

2. In `mean_delta_cpu()`, which computes dl/d_mean, I modified it like this (note: GitHub's markdown ate the `*` characters in the original comment; they are restored here):

```c
void mean_delta_cpu(float *delta, float *variance, int batch, int filters, int spatial,
                    float *mean_delta, float *variance_delta, float *x, float *mean)
{
    int i,j,k;
    for(i = 0; i < filters; ++i){
        mean_delta[i] = 0;
        float sum = 0;
        for (j = 0; j < batch; ++j) {
            for (k = 0; k < spatial; ++k) {
                int index = j*filters*spatial + i*spatial + k;
                mean_delta[i] += delta[index];
                sum += x[index] - mean[i];
            }
        }
        sum *= (-2.) / (spatial*batch);
        mean_delta[i] *= (-1./sqrt(variance[i] + .00001f));
        mean_delta[i] += variance_delta[i] * sum;
    }
}
```

Because the mean delta also depends on the variance, I corrected it based on my understanding and tested it on CIFAR with my own model; however, the results are almost the same, with no improvement. Then again, epsilon is small too, yet it is still added, so I think the `variance_delta[i] * sum` term should not be omitted either.

By the way, in `blas.c`, `normalize_cpu()`: according to the formula in the batch norm paper, `(sqrt(variance[f]) + .000001f)` should be `(sqrt(variance[f] + .000001f))`, i.e. epsilon belongs inside the square root.

ujsyehao commented 5 years ago

@jony5017 I found this problem too:

`(sqrt(variance[f]) + .000001f)` -> `(sqrt(variance[f] + .000001f))`

1274085042 commented 2 years ago

@jony5017 `sum = (-2.) / (spatial*batch);` -> `sum *= (-2.) / (spatial*batch);`