mathmanu / caffe-jacinto

This repository has moved. The new link can be obtained from https://github.com/TexasInstruments/jacinto-ai-devkit

Softmax with weighted loss #10

Closed umeannthtome closed 7 years ago

umeannthtome commented 7 years ago

Hi,

I notice that there is a softmax_weighted_loss_layer in caffe-jacinto-0.15 but not in 0.16. Has it been merged into softmax_loss_layer, or has it been removed entirely?

If it is merged, how can I specify the weight in caffe-jacinto-0.16? If it is removed, how can I use the weighted loss in caffe-jacinto-0.15?

Thanks.

William

mathmanu commented 7 years ago

I tried to implement a form of Online Hard Example Mining (OHEM) by giving higher weight to the difficult samples.

However, I couldn't get a significant accuracy improvement with that, so I had to abandon my efforts in using weighted loss. I have read in papers that OHEM is useful - I'm not sure why it didn't work for me - but maybe I didn't pay careful attention and spend enough time implementing it.

For this reason, I didn't make the effort to port it from caffe-0.15 to caffe-0.16.

However, if you port it (or implement any other form of OHEM) and do get an accuracy improvement, please file a PR and I'll be happy to try it out and merge it.

The additional parameters I added are listed in the LossParameter definition below. You can specify these parameters in your loss layer. Choose one of the following - all of them together may not work. (A sample layer definition is sketched after this list.)

  1. For example, setting bootstrap_samples_fraction: 0.05 will cause the training to use only the hardest (most difficult) 5% of the samples for back propagation. The difficulty of a sample is assessed by the softmax probability corresponding to the ground-truth label. (Note: if the softmax probability corresponding to the ground-truth label is greater than 0.5, the sample is correctly classified.)

  2. Alternatively, bootstrap_prob_threshold can be set to skip back propagation for samples whose softmax probability for the ground-truth label is greater than this threshold - i.e. it ignores the easy samples during back propagation. (Note: if the softmax probability corresponding to the ground-truth label is greater than 0.5, the sample is correctly classified.) For example: bootstrap_prob_threshold: 0.6

  3. You can use assign_label_weights and num_label_weights to give more weight to the rare samples in the dataset, for example: assign_label_weights: true and num_label_weights: 5. num_label_weights represents the number of labels in your dataset; for example, I think the PASCAL dataset has 21 labels.
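For illustration, here is a minimal sketch of what such a loss layer definition could look like in prototxt, picking option 2. The layer name, blob names, and the exact layer type string are placeholders; in the caffe-0.15 branch the implementation lives in the weighted-loss layer, so the type may need to be the weighted variant rather than plain SoftmaxWithLoss:

layer {
  name: "loss"
  type: "SoftmaxWithLoss"           # placeholder type; see note above
  bottom: "score"
  bottom: "label"
  top: "loss"
  loss_param {
    normalization: VALID
    bootstrap_prob_threshold: 0.6   # option 2: skip easy samples in back propagation
  }
}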

I can help you further in your experimentation, but please do read some papers that use OHEM and check whether my implementation is correct.

The relevant fields in LossParameter are shown below (the new OHEM fields are at the end):

// Message that stores parameters shared by loss layers
message LossParameter {
  // If specified, ignore instances with the given label.
  optional int32 ignore_label = 1;
  // How to normalize the loss for loss layers that aggregate across batches,
  // spatial dimensions, or other dimensions. Currently only implemented in
  // SoftmaxWithLoss layer.
  enum NormalizationMode {
    // Divide by the number of examples in the batch times spatial dimensions.
    // Outputs that receive the ignore label will NOT be ignored in computing
    // the normalization factor.
    FULL = 0;
    // Divide by the total number of output locations that do not take the
    // ignore_label. If ignore_label is not set, this behaves like FULL.
    VALID = 1;
    // Divide by the batch size.
    BATCH_SIZE = 2;
    // Do not normalize the loss.
    NONE = 3;
  }
  optional NormalizationMode normalization = 3 [default = VALID];
  // Deprecated. Ignored if normalization is specified. If normalization
  // is not specified, then setting this to false will be equivalent to
  // normalization = BATCH_SIZE to be consistent with previous behavior.
  optional bool normalize = 2;

  // Bootstrap samples fraction for OHEM.
  // Only so many fraction of output labels will be backpropagated.
  optional float bootstrap_samples_fraction = 4 [default = 0];
  optional float bootstrap_prob_threshold = 5 [default = 0];
  optional bool assign_label_weights = 6 [default = false];
  optional int32 num_label_weights = 7 [default = 0];
}

umeannthtome commented 7 years ago

In the case of option 3, how do I assign specific weights to different labels (specifically for semantic segmentation)?

mathmanu commented 7 years ago

Sorry, I made a mistake. Item 3 as I described it above is not implemented. What is actually implemented is described below.

You can use assign_label_weights and num_label_weights to give more weight to those classes that are misclassified most often according to the accuracy measure, for example: assign_label_weights: true and num_label_weights: 5. num_label_weights represents the number of labels in your dataset; for example, I think the PASCAL dataset has 21 labels. In this method, the weighting gives more weight to those classes that have lower IoU accuracy: I collect the IoU accuracy for each class, keep a running average, and give a higher weight to the classes that have lower accuracy. (A sample layer definition is sketched below.)
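For illustration, here is a minimal prototxt sketch of how this could be enabled in the caffe-0.15 branch. The layer type string and blob names are my assumption; check the layer registration in softmax_weighted_loss_layer.cpp for the exact type name:

layer {
  name: "loss"
  type: "SoftmaxWithWeightedLoss"   # assumed type name; verify in the caffe-0.15 branch
  bottom: "score"
  bottom: "label"
  top: "loss"
  loss_param {
    assign_label_weights: true
    num_label_weights: 21           # number of labels in the dataset, e.g. 21 for PASCAL
  }
}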

Please refer to softmax_weighted_loss_layer.cpp and softmax_weighted_loss_layer.cu in the caffe-0.15 branch. A snippet is copied below:

template <typename Dtype>
void SoftmaxWithWeightedLossLayer<Dtype>::AssignLabelWeights_cpu(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {

  int num_labels = bottom[0]->shape(softmax_axis_);
  if (this->layer_param_.loss_param().has_num_label_weights()) {
    num_labels = std::min(num_labels,
        this->layer_param_.loss_param().num_label_weights());
  }
  //LOG(INFO) << "num_labels = " << num_labels;

  // Compute the current weight for each class from its running IoU:
  // classes with IoU below the mean get a weight above 1, clamped to [0.1, 10].
  vector<Dtype> label_weights_cur(num_labels, Dtype(1.0));
  const Dtype* iou_class_data = iou_class_.cpu_data();
  Dtype iou_mean = *iou_mean_.cpu_data();
  for (int i = 0; i < num_labels; ++i) {
    if (iou_class_data[i] > 0) {
      Dtype weight = (iou_class_data[i] + 1.0 - iou_mean);
      weight = pow(weight, 4);  //2
      label_weights_cur[i] = std::max(std::min(1.0 / weight, 10.0), 0.1);
    } else {
      label_weights_cur[i] = 1.0;
    }
  }

  // Maintain an exponential running average of the per-class weights.
  Dtype* label_weights_data_cpu = label_weights_blob_.mutable_cpu_data();
  for (int i = 0; i < num_labels; ++i) {
    label_weights_data_cpu[i] =
        label_weights_data_cpu[i] * 0.99 + label_weights_cur[i] * 0.01;
    if ((iter_count_ % 1000) == 0) {
      LOG(INFO) << " label_weights [" << i << "] = " << label_weights_cur[i]
          << ", " << label_weights_data_cpu[i];
    }
  }

  // Scale the gradient at each output location by the weight of its
  // ground-truth label (zero for the ignore label and out-of-range labels).
  Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
  const Dtype* label_data = bottom[1]->cpu_data();
  int dim = prob_.count() / outer_num_;
  for (int i = 0; i < outer_num_; ++i) {
    for (int j = 0; j < inner_num_; j++) {
      Dtype selected_weight;
      const int label_value = static_cast<int>(label_data[i * inner_num_ + j]);
      if (has_ignore_label_ && label_value == ignore_label_) {
        selected_weight = 0;
      } else if (label_value < num_labels) {
        selected_weight = label_weights_data_cpu[label_value];
      } else {
        selected_weight = 0.0;
      }
      for (int l = 0; l < num_labels; l++) {
        bottom_diff[i * dim + l * inner_num_ + j] *= selected_weight;
      }
    }
  }
}
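To make the weighting concrete, here is a small worked example using the formula in the snippet above (the IoU numbers are made up for illustration): if a class has a running IoU of 0.4 while the mean IoU is 0.7, then weight = (0.4 + 1.0 - 0.7)^4 = 0.7^4 ≈ 0.24, and the class weight becomes clamp(1 / 0.24, 0.1, 10) ≈ 4.2. So the gradients for that class are scaled up roughly 4x, before the 0.99/0.01 running average smooths the change in over many iterations.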

umeannthtome commented 7 years ago

Manu,

OK, thank you for the enlightenment.

William