As per our research, we understood that the evaluation is based on two criteria: 1) the labelling format type, like 'JSON' or 'Pascal VOC', and 2) the evaluation metric used in the Python script.
So for the 'JSON' labelling format, we can use the 'COCO evaluation metric', which gives us precision and recall values for the tested images in terms of the area of the bounding boxes, i.e. the pixel area within the bounding boxes.
On the other hand, with the 'Pascal VOC' labelling format we use the 'Pascal evaluation metric' and get the mAP value of the tested images per class.
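For the 'JSON' route, a minimal sketch of how such an evaluation is typically driven with pycocotools (the file names `annotations.json` and `detections.json` are placeholders, not our actual files):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholder file names: COCO-style JSON ground truth and detections.
coco_gt = COCO("annotations.json")
coco_dt = coco_gt.loadRes("detections.json")

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP/AR at IoU 0.50:0.95 and per area range
```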
We would like to research a little bit more whether we can get precision and recall values per class.
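One possible route, sketched under the assumption that the COCOeval run above has finished: pycocotools stores its results in the internal `eval["precision"]` array, so per-class numbers could be read out of it like this (not something our script does yet):

```python
import numpy as np

# eval["precision"] has shape [IoU thresholds, recall levels, classes,
# area ranges, max-detections settings]; entries are -1 where undefined.
# Per-class recall lives in eval["recall"] and can be read out similarly.
precision = evaluator.eval["precision"]
for k, cat_id in enumerate(coco_gt.getCatIds()):
    p = precision[:, :, k, 0, -1]          # area "all", maxDets = 100
    valid = p[p > -1]
    ap = valid.mean() if valid.size else float("nan")
    name = coco_gt.loadCats(cat_id)[0]["name"]
    print(f"{name}: AP = {ap:.3f}")
```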
There are two types of evaluation that can be performed: one based on mean Average Precision (mAP), and one based on precision and recall values computed from IoU (Intersection over Union) and the area of the bounding boxes.
IoU (Intersection over Union) = (area of overlap) / (area of union)
For more information on IoU, see the link below: https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/
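As a concrete illustration, a minimal IoU helper, assuming boxes are given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```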
mAP: if the IoU >= 0.5 (this threshold is decided by the user) for a class we are evaluating, that predicted bounding box is counted as a detection of that class and stored. This is done for all classes; we then divide the number of predictions matched by IoU by the actual number of bounding box labels present in the annotation file, which gives the average precision per class. Once the average precision is obtained for each class, we add all these values together and divide by the total number of classes, which gives us the mean average precision (mAP).
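A minimal sketch of the procedure just described, reusing the `iou` helper above (the dictionary-of-boxes input format is our own assumption):

```python
def simplified_map(preds, gts, classes, iou_thr=0.5):
    """Per-class average precision as described above: predictions whose
    best IoU against a same-class ground-truth box clears the threshold,
    divided by the number of ground-truth labels for that class.
    preds / gts: {class_name: [(x1, y1, x2, y2), ...]}."""
    per_class = {}
    for cls in classes:
        gt_boxes = gts.get(cls, [])
        if not gt_boxes:
            continue  # class absent from the annotation file
        matched = sum(
            1
            for p in preds.get(cls, [])
            if max((iou(p, g) for g in gt_boxes), default=0.0) >= iou_thr
        )
        per_class[cls] = matched / len(gt_boxes)
    mean_ap = sum(per_class.values()) / len(per_class) if per_class else 0.0
    return per_class, mean_ap
```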
Precision and Recall: this approach uses the average precision method described above in a slightly extended way. Here the IoU threshold is varied from 0.5 to 0.95 in steps of 0.05 and all the predicted class data is stored. This data is then summed and divided by 10 (the number of IoU levels, i.e. (0.95 - 0.5)/0.05 + 1), which gives the average precision of a class over all IoU levels. This per-class average is then divided against the actual labelled annotation file to obtain the precision value, and the same approach is used to obtain the recall value.
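A sketch of the averaging over IoU levels, building on `simplified_map` from the previous sketch:

```python
import numpy as np

# 10 IoU levels: 0.5, 0.55, ..., 0.95 -> (0.95 - 0.5) / 0.05 + 1 = 10.
IOU_LEVELS = np.arange(0.5, 1.0, 0.05)

def ap_over_iou_levels(preds, gts, classes):
    """Average the per-class value from simplified_map over all IoU levels."""
    totals = {}
    for thr in IOU_LEVELS:
        per_class, _ = simplified_map(preds, gts, classes, iou_thr=thr)
        for cls, ap in per_class.items():
            totals[cls] = totals.get(cls, 0.0) + ap
    # Divide by the number of IoU levels to average over them.
    return {cls: total / len(IOU_LEVELS) for cls, total in totals.items()}
```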
Furthermore, there is another precision and recall method based on the area of the bounding boxes. Here we set a range on the bounding box area to bucket objects, e.g. small objects with 10^2 < area < 35^2 pixels. This lets us group classes by the size of the object and is a complementary way to evaluate how good the network is.
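A sketch of the area-based bucketing; the "small" bounds follow the example above, while the "medium"/"large" bounds are placeholders we would still have to choose:

```python
# Example area buckets in pixels^2. The "small" bounds match the example
# above; the "medium"/"large" bounds are placeholders, not fixed choices.
AREA_RANGES = {
    "small": (10 ** 2, 35 ** 2),
    "medium": (35 ** 2, 96 ** 2),
    "large": (96 ** 2, float("inf")),
}

def boxes_in_range(boxes, range_name):
    """Keep (x1, y1, x2, y2) boxes whose pixel area falls in the bucket."""
    lo, hi = AREA_RANGES[range_name]
    return [b for b in boxes if lo < (b[2] - b[0]) * (b[3] - b[1]) < hi]
```

Running the evaluation above on boxes filtered this way would then give scores per size bucket.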
We would be using both methods, mAP and Precision/Recall, to test our network, as together they give us a deeper understanding of its capabilities. The mAP value helps us evaluate the network's performance per class, which is useful for deciding whether we need to collect more data to improve the scores, whereas Precision/Recall helps us evaluate the smallest object size at which the network still performs effectively, since the network hits its limits below a certain bounding box area.
We currently use mAP scores (per class + overall) to evaluate the performance of the network.
We would like to research additional evaluation metrics to determine the performance of the trained models.