sacmehta / EdgeNets

This repository contains the source code of our work on designing efficient CNNs for computer vision

Loss calculation for multi-label classification #33

Closed · aiwithshekhar closed 4 years ago

aiwithshekhar commented 4 years ago

1) For multi-label classification, while calculating the loss on the COCO dataset, why is it multiplied by the number of classes (80.0)? Is it a weighting parameter for class imbalance? `loss = criteria(output, target.float()) * 80.0`

2) For calculating precision and recall, should we use cumulative TP & FP, as described here: https://github.com/rafaelpadilla/Object-Detection-Metrics

sacmehta commented 4 years ago
  1. 80 is used to scale the loss; otherwise, the loss value is too small (see the sketch below).
  2. We are using cumulative values.
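
As a rough illustration of point 1, here is a minimal sketch, assuming `criteria` is `nn.BCEWithLogitsLoss` with the default 'mean' reduction (the repo's actual loss setup may differ). The 'mean' reduction averages over all batch × class elements, so multiplying by the number of classes (80 for COCO) recovers the per-sample mean of the loss summed over classes:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

num_classes = 80  # COCO multi-label classification
criteria = nn.BCEWithLogitsLoss()  # default reduction='mean'

output = torch.randn(16, num_classes)            # logits for a batch of 16
target = torch.randint(0, 2, (16, num_classes))  # multi-hot labels

# 'mean' averages over all 16 * 80 elements, so the raw value is small;
# scaling by num_classes yields the per-sample sum over classes, averaged
# over the batch.
loss = criteria(output, target.float()) * num_classes

per_element = nn.BCEWithLogitsLoss(reduction='none')(output, target.float())
assert torch.allclose(loss, per_element.sum(dim=1).mean())
```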
aiwithshekhar commented 4 years ago

Thanks for clarifying, Sachin. One more thing: the loss we get is the 'mean' over a batch, so multiplying it (losses.update) by the batch size to get the batch loss makes sense. But precision and recall are already computed over a batch, so why are they also multiplied (prec.update, rec.update) by the batch size, as in the code below?

```python
losses.update(float(loss), input.size(0))
prec.update(float(this_prec), input.size(0))
rec.update(float(this_rec), input.size(0))
```

sacmehta commented 4 years ago

There is nothing fancy here. If you don't weight by the batch size, the statistics computed for the entire dataset from batch-wise statistics may differ from the sample-wise statistics (though not by a huge margin).

The mean of sample-wise statistics is not necessarily equal to the mean of batch-wise statistics.

For example, say you have the values [1, 2, 3, 4, 5, 6] and a batch size of 4, so the two batches are [1, 2, 3, 4] and [5, 6] (the second one is truncated). The sample-wise mean is 3.5, while the batch-wise mean is 4.0: mean(mean([1, 2, 3, 4]), mean([5, 6])) = mean(2.5, 5.5) = 4.0.
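
To make this concrete, here is a minimal sketch of a batch-size-weighted running mean (an AverageMeter-style helper; the class name and interface are assumptions for illustration, not necessarily the repo's exact code):

```python
class AverageMeter:
    """Running mean weighted by the number of samples in each update."""

    def __init__(self):
        self.sum = 0.0
        self.count = 0

    def update(self, val, n=1):
        # val is a per-batch mean; weighting by n recovers the per-sample sum
        self.sum += val * n
        self.count += n

    @property
    def avg(self):
        return self.sum / max(self.count, 1)


meter = AverageMeter()
meter.update(sum([1, 2, 3, 4]) / 4, n=4)  # batch mean 2.5 over 4 samples
meter.update(sum([5, 6]) / 2, n=2)        # batch mean 5.5 over 2 samples
print(meter.avg)  # 3.5 -> the true sample-wise mean
# An unweighted average of batch means would give (2.5 + 5.5) / 2 = 4.0
```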

Hope this helps

aiwithshekhar commented 4 years ago

Thanks for such a detailed reply. I didn't want to ask about object detection in this thread, but I also didn't want to open a new issue.

1) The priors per location for SSD300 should be [4, 6, 6, 6, 4, 4], which gives 8732 boxes in total, but in your implementation it is [6, 6, 6, 6, 6, 6], which gives 11640.

Is this done intentionally, and if so, what are the benefits?
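
For reference, the arithmetic behind both counts, assuming the standard SSD300 feature-map grid sizes of 38, 19, 10, 5, 3, and 1:

```python
# Assumption: the usual SSD300 feature-map grid sizes.
feature_maps = [38, 19, 10, 5, 3, 1]

def total_priors(priors_per_location):
    # each feature map contributes (grid * grid * priors-per-location) boxes
    return sum(f * f * p for f, p in zip(feature_maps, priors_per_location))

print(total_priors([4, 6, 6, 6, 4, 4]))  # 8732  (original SSD paper)
print(total_priors([6, 6, 6, 6, 6, 6]))  # 11640 (this implementation)
```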

sacmehta commented 4 years ago

I was too lazy to tune these priors, so I used the same number across all scales.

If you tune these, then your object detection would be much faster.

sacmehta commented 4 years ago

Also, feel free to create a pull request to merge your changes.

aiwithshekhar commented 4 years ago

Thanks for replying; sure thing!

aiwithshekhar commented 4 years ago

I have raised a pull request; please check whether the code is correct. https://github.com/sacmehta/EdgeNets/pull/34#issue-416950573