yiwenguo / Dynamic-Network-Surgery

Caffe implementation for dynamic network surgery.
186 stars, 70 forks

question in Backward code #11

Closed flymark2010 closed 7 years ago

flymark2010 commented 7 years ago

Hi, thanks for your great work. I have a question about the Backward code:

1. template <typename Dtype>
2. void CConvolutionLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
3.     const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
4.   const Dtype* weightTmp = this->weight_tmp_.cpu_data();
5.   const Dtype* weightMask = this->blobs_[2]->cpu_data();
6.   Dtype* weight_diff = this->blobs_[0]->mutable_cpu_diff();
7.   for (int i = 0; i < top.size(); ++i) {
8.     const Dtype* top_diff = top[i]->cpu_diff();
9.     // Bias gradient, if necessary.
10.     if (this->bias_term_ && this->param_propagate_down_[1]) {
11.       const Dtype* biasMask = this->blobs_[3]->cpu_data();
12.       Dtype* bias_diff = this->blobs_[1]->mutable_cpu_diff();
13.       for (unsigned int k = 0; k < this->blobs_[1]->count(); ++k) {
14.         bias_diff[k] = bias_diff[k] * biasMask[k];
15.       }
16.       for (int n = 0; n < this->num_; ++n) {
17.         this->backward_cpu_bias(bias_diff, top_diff + top[i]->offset(n));
18.       }
19.     }
20.     if (this->param_propagate_down_[0] || propagate_down[i]) {
21.       const Dtype* bottom_data = bottom[i]->cpu_data();
22.       Dtype* bottom_diff = bottom[i]->mutable_cpu_diff();
23.       for (unsigned int k = 0; k < this->blobs_[0]->count(); ++k) {
24.         weight_diff[k] = weight_diff[k] * weightMask[k];
25.       }
26.       for (int n = 0; n < this->num_; ++n) {
27.         // gradient w.r.t. weight. Note that we will accumulate diffs.
28.         if (this->param_propagate_down_[0]) {
29.           this->weight_cpu_gemm(bottom_data + bottom[i]->offset(n),
30.               top_diff + top[i]->offset(n), weight_diff);
31.         }
32.         // gradient w.r.t. bottom data, if necessary.
33.         if (propagate_down[i]) {
34.           this->backward_cpu_gemm(top_diff + top[i]->offset(n), weightTmp,
35.               bottom_diff + bottom[i]->offset(n));
36.         }
37.       }
38.     }
39.   }
40. }

As I understand Caffe, the diff of the weight blob is always set to 0 before each iteration. That is, weight_diff[k] and bias_diff[k] are always 0 before backward_cpu_bias and weight_cpu_gemm are called, so multiplying them by the mask on lines 14 and 24 has no effect and those operations are redundant. What did you intend to do here? Should it be weightTmp instead of weight_diff on line 24?

Thanks very much!

yiwenguo commented 7 years ago

You are right, @flymark2010. That is probably some testing code that I forgot to comment out. It is better to comment out those lines for higher efficiency.
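
The redundancy discussed above can be sketched in isolation. This is a minimal standalone illustration, not code from the repository: apply_mask, accumulate_gradient and backward_as_posted are hypothetical stand-ins for the masking loops (lines 13-15 and 23-25) and for the accumulating calls backward_cpu_bias and weight_cpu_gemm, and it assumes, as stated in the question, that the diff buffer is zero when Backward_cpu is entered.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the masking loops on lines 13-15 and 23-25:
// multiplies each diff entry by its mask entry (0 = pruned, 1 = kept).
void apply_mask(std::vector<float>& diff, const std::vector<float>& mask) {
  assert(diff.size() == mask.size());
  for (std::size_t k = 0; k < diff.size(); ++k) {
    diff[k] *= mask[k];
  }
}

// Hypothetical stand-in for weight_cpu_gemm / backward_cpu_bias:
// accumulates a freshly computed gradient into the diff buffer.
void accumulate_gradient(std::vector<float>& diff,
                         const std::vector<float>& grad) {
  assert(diff.size() == grad.size());
  for (std::size_t k = 0; k < diff.size(); ++k) {
    diff[k] += grad[k];
  }
}

// Runs "mask, then accumulate" in the order used by the posted code,
// starting from a zeroed diff buffer (the assumption made in the question).
std::vector<float> backward_as_posted(const std::vector<float>& mask,
                                      const std::vector<float>& grad) {
  std::vector<float> diff(grad.size(), 0.0f);  // assumed zeroed beforehand
  apply_mask(diff, mask);                      // 0 * mask == 0: no effect
  accumulate_gradient(diff, grad);             // diff now equals grad
  return diff;
}
```

Under that assumption the returned diff equals the unmasked gradient for every entry, including the pruned ones, so the masking loop only costs time. If the diff buffer were not zeroed beforehand, the loop would instead mask whatever had been accumulated earlier.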

flymark2010 commented 7 years ago

@yiwenguo OK, thanks!

kai-xie commented 7 years ago

@yiwenguo I also have a question about this part. According to your paper, weight_diff[k] and bias_diff[k] are supposed to be updated according to weightMask[k] and biasMask[k]. So is it right to move lines 13-15 after line 18, and lines 23-25 after line 31? Or should lines 13-15 and 23-25 simply be removed? Thank you very much!

dongxiao92 commented 6 years ago

@kai-xie I think that if we mask the computed diffs (by moving the code as you suggest), the masked weights and biases will never come alive again. So the errors are passed through to update the masked parameters as well, to see whether they can come back to life, even though this is not mathematically correct.
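
The trade-off described in this comment can be sketched with a toy example. This is only an illustration under stated assumptions: sgd_step and update_mask are hypothetical helpers, the update is plain SGD, and the mask is refreshed with a single magnitude threshold, which may differ from the rule the repository actually uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One plain SGD step. If mask_gradient is true, the gradient of pruned
// entries (mask == 0) is zeroed first, as in the reordering asked about.
void sgd_step(std::vector<float>& weight, const std::vector<float>& grad,
              const std::vector<float>& mask, float lr, bool mask_gradient) {
  for (std::size_t k = 0; k < weight.size(); ++k) {
    const float g = mask_gradient ? grad[k] * mask[k] : grad[k];
    weight[k] -= lr * g;
  }
}

// Hypothetical magnitude rule for refreshing the mask: an entry is kept
// when |w| exceeds the threshold. The repository's rule may differ.
void update_mask(const std::vector<float>& weight, std::vector<float>& mask,
                 float threshold) {
  for (std::size_t k = 0; k < weight.size(); ++k) {
    mask[k] = std::fabs(weight[k]) > threshold ? 1.0f : 0.0f;
  }
}
```

With a pruned weight of 0.05, a gradient of -1, a learning rate of 0.1 and a threshold of 0.1, the masked update leaves the weight at 0.05, so it stays pruned. The unmasked update moves it to about 0.15, which exceeds the threshold, so the mask entry flips back to 1.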