I tried to implement a form of Online Hard Example Mining (OHEM) by giving higher weight to the difficult samples.
However, I couldn't get a significant accuracy improvement with it, so I had to abandon my efforts with the weighted loss. I have read in papers that OHEM is useful; I am not sure why it didn't work for me, but maybe I didn't put enough care and time into implementing it.
For that reason, I didn't take the effort to port it from caffe-0.15 to caffe-0.16.
However, if you port it (or implement any other form of OHEM) and do get an accuracy improvement - then please do file a PR and I'll be happy to try it out and merge it.
See the additional parameters added below; you can specify these parameters in your loss layer. Choose only one of the following - all of them together may not work.
For example, setting bootstrap_samples_fraction: 0.05 will cause the training to back propagate only the hardest (most difficult) 5% of the samples. The difficulty of a sample is assessed by the softmax probability corresponding to the ground truth label. (Note: if the softmax probability corresponding to the ground truth label is greater than 0.5, then the sample is correctly classified.)
Alternatively, bootstrap_prob_threshold can be set to skip back propagation for samples whose softmax probability corresponding to the ground truth label is greater than this threshold - i.e. it ignores the easy samples during back propagation. (Note: if the softmax probability corresponding to the ground truth label is greater than 0.5, then the sample is correctly classified.) For example: bootstrap_prob_threshold: 0.6
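These parameters go inside loss_param of the loss layer in the training prototxt. A minimal sketch, assuming a standard SoftmaxWithLoss layer with placeholder layer/blob names (the exact loss layer type in your caffe-jacinto branch may differ); enable only one of the two bootstrap options:

layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "score"
  bottom: "label"
  top: "loss"
  loss_param {
    # Back propagate only the hardest 5% of the output locations.
    bootstrap_samples_fraction: 0.05
    # Alternative (do not combine with the above): skip locations whose
    # ground-truth softmax probability is above 0.6.
    # bootstrap_prob_threshold: 0.6
  }
}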
You can use assign_label_weights and num_label_weights to give more weight to the rare samples in the dataset: assign_label_weights: true, num_label_weights: 5. Here num_label_weights represents the number of labels in your dataset; for example, I think the PASCAL dataset has 21 labels.
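Similarly, a sketch for the label-weighting option (same placeholder names as above; num_label_weights should match the number of labels in your dataset, e.g. 21 for PASCAL as mentioned above):

layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "score"
  bottom: "label"
  top: "loss"
  loss_param {
    # Automatically give more weight to labels that occur rarely in the dataset.
    assign_label_weights: true
    # Number of labels in the dataset.
    num_label_weights: 21
  }
}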
I can help you further in your experimentation - but please do read some papers that use OHEM and check whether my implementation is correct.
The relevant fields of LossParameter are shown below:
// Message that stores parameters shared by loss layers
message LossParameter {
  // If specified, ignore instances with the given label.
  optional int32 ignore_label = 1;
  // How to normalize the loss for loss layers that aggregate across batches,
  // spatial dimensions, or other dimensions. Currently only implemented in
  // SoftmaxWithLoss layer.
  enum NormalizationMode {
    // Divide by the number of examples in the batch times spatial dimensions.
    // Outputs that receive the ignore label will NOT be ignored in computing
    // the normalization factor.
    FULL = 0;
    // Divide by the total number of output locations that do not take the
    // ignore_label. If ignore_label is not set, this behaves like FULL.
    VALID = 1;
    // Divide by the batch size.
    BATCH_SIZE = 2;
    // Do not normalize the loss.
    NONE = 3;
  }
  optional NormalizationMode normalization = 3 [default = VALID];
  // Deprecated. Ignored if normalization is specified. If normalization
  // is not specified, then setting this to false will be equivalent to
  // normalization = BATCH_SIZE to be consistent with previous behavior.
  optional bool normalize = 2;

  // Bootstrap samples fraction for OHEM.
  // Only this fraction of the output labels (the hardest ones) will be backpropagated.
  optional float bootstrap_samples_fraction = 4 [default = 0];
  optional float bootstrap_prob_threshold = 5 [default = 0];
  optional bool assign_label_weights = 6 [default = false];
  optional int32 num_label_weights = 7 [default = 0];
}
In the case of 3, how do I assign specific weights to the different labels (specifically for semantic segmentation)?
Sorry, I made a mistake. Item 3 that I described above is not implemented. What is implemented is item 4 (below).
Please refer to softmax_weighted_loss_layer.cpp and softmax_weighted_loss_layer.cu in the branch caffe-0.15. A snippet is copied below:
template <typename Dtype>
// ...

// Number of labels: size of the softmax axis, optionally capped by num_label_weights.
int num_labels = bottom[0]->shape(softmax_axis_);
if (this->layer_param_.loss_param().has_num_label_weights()) {
  num_labels = std::min(num_labels, this->layer_param_.loss_param().num_label_weights());
}
//LOG(INFO) << "num_labels = " << num_labels;

vector<Dtype> label_weights_cur(num_labels);  // current per-label weights (their computation is omitted here)

// Update the stored per-label weights as a running (exponential moving) average.
Dtype* label_weights_data_cpu = label_weights_blob_.mutable_cpu_data();
for (int i = 0; i < num_labels; ++i) {
  label_weights_data_cpu[i] = label_weights_data_cpu[i] * 0.99 + label_weights_cur[i] * 0.01;
  if ((iter_count_ % 1000) == 0) {
    LOG(INFO) << " label_weights [" << i << "] = " << label_weights_cur[i] << ", " << label_weights_data_cpu[i];
  }
}

// Backward pass: pick the weight for each output location based on its ground-truth label.
Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
const Dtype* label_data = bottom[1]->cpu_data();
int dim = prob_.count() / outer_num_;
for (int i = 0; i < outer_num_; ++i) {
  for (int j = 0; j < inner_num_; j++) {
    Dtype selected_weight;
    const int label_value = static_cast<int>(label_data[i * inner_num_ + j]);
    // ... (snippet truncated here)
Manu,
OK, thank you for the enlightenment.
William
Hi,
I notice that there is softmax_weighted_loss_layer in caffe-jacinto-0.15 but not in 0.16. Has it been merged into softmax_loss_layer, or has it been removed entirely?
If it has been merged, how can I specify the weight in caffe-jacinto-0.16? If it has been removed, how can I use the weighted loss in caffe-jacinto-0.15?
Thanks.
William