google / automl

Google Brain AutoML

What is the actual meaning of num_classes? #492

Open Jokoe66 opened 4 years ago

Jokoe66 commented 4 years ago

The guide for fine-tuning Pascal from COCO has been changed, and num_classes was altered from 20 to 21, as shown below.

 num_classes: 21
 moving_average_decay: 0
 label_id_mapping: {0: background, 1: aeroplane, 2: bicycle, 3: bird, 4: boat, 5: bottle, 6: bus, 7: car, 8: cat, 9: chair, 10: cow, 11: diningtable, 12: dog, 13: horse, 14: motorbike, 15: person, 16: pottedplant, 17: sheep, 18: sofa, 19: train, 20: tvmonitor}

This is confusing: what is the actual meaning of num_classes? The default num_classes for COCO is 90, as set in hparams_config.py, so I guess it means the maximum category_id and does not include the background class. That conflicts with num_classes: 21, which contains the background class, doesn't it?
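
For illustration, here are the two readings in play, sketched numerically (the names below are mine, not the repo's):

  # Pascal category_ids run over [1, 20]; COCO's default num_classes is 90.
  pascal_max_category_id = 20

  # Reading 1: num_classes = max category_id, background excluded.
  print(pascal_max_category_id)       # 20
  # Reading 2: num_classes = size of the label map, background included.
  print(pascal_max_category_id + 1)   # 21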

Jokoe66 commented 4 years ago

Okay, num_classes still means the maximum category_id, and this setting is not meant to include the background class. I checked dataset/create_pascal_tfrecord.py and found that the class label is the category_id (for Pascal it ranges over [1, 20]). Setting num_classes to 21 merely adds an extra, meaningless class; it does not affect the original 20 classes. The maximum category_id is now 21, but there is no data for this class. Maybe this is a minor mistake. I suggest using the previous setting, num_classes: 20.
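
A minimal sketch of this point, assuming (as later comments describe) that category_id k maps to the 0-based channel k-1 before the one-hot targets are built; with num_classes: 21 the extra channel simply never receives a positive target:

  import tensorflow as tf

  # Pascal labels in the tfrecords are 1..20; shifted to 0-based they are
  # 0..19. With num_classes = 21, the one-hot targets gain a channel
  # (index 20) that no ground-truth box ever sets.
  labels = tf.constant([1, 10, 20])           # raw Pascal category_ids
  targets = tf.one_hot(labels - 1, 21)        # shape [3, 21]
  print(tf.reduce_sum(targets, axis=0).numpy()[-1])  # 0.0: channel 20 unused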

mingxingtan commented 4 years ago

Sorry, it should be 20, and 0 is always added for background.

mingxingtan commented 4 years ago

Actually, it should be 21 (the length of the label map, including background). Sorry, I gave wrong information earlier.
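
Under this reading, num_classes is just the length of the mapping quoted in the question, background included; a quick sketch:

  # num_classes as the length of the label map, background included
  # (per this comment; the names match the config quoted above).
  positive = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car",
              "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
              "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]
  label_id_mapping = {0: "background", **{i + 1: n for i, n in enumerate(positive)}}
  print(len(label_id_mapping))  # 21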

kahkeng commented 4 years ago

I think there might still be a case to be made that num_classes should be set to 20, but that there is a separate bug with calculating per-class AP for the last class, which is causing #391 (I'm not so sure about this part).

Here's my understanding from the code:

  • In create_coco_tfrecord.py: category_id runs from id 1 (person) through id 90 (toothbrush).
  • In create_pascal_tfrecord.py: category_id runs from id 1 (aeroplane) through id 20 (tvmonitor). Even though pascal_label_map_dict has id 0 (background), the raw XML files don't have boxes with this label, so it never enters the tfrecords as a box.
  • In dataloader.py: we call anchor_labeler.label_anchors, which calls self._target_assigner.assign, which calls _create_classification_targets. This matches targets to classes, but unmatched targets are assigned the value of _unmatched_cls_target, which defaults to zero. This is where the background zero value is introduced. However, back in label_anchors, we do cls_targets -= 1, which puts the positive classes in the range 0 to num_classes-1 (where num_classes would be 20 for Pascal) and the negative class at -1.

When we compute the classification loss later, we convert the cls_targets above into a one-hot vector with params['num_classes'] being 20, where the -1 for background becomes a zero vector. So each cls_outputs channel represents one positive class (20 in total). At prediction time we get 20-channel score predictions; after NMS, we get the top scores for boxes (positive classes only) with class values ranging from 0 to 19, and we compare against min_score_thresh to decide whether each box counts as a positive or negative detection. We finally add 1 back to the classes of the predicted boxes at this stage so that they are in the correct range again (1 to 20).
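
A minimal sketch of that one-hot step (values assumed; it mirrors the behavior described above rather than quoting the repo's code):

  import tensorflow as tf

  # After `cls_targets -= 1`, positive classes are 0..19 and background is -1.
  num_classes = 20
  cls_targets = tf.constant([3, -1, 19])      # the middle anchor is background

  # tf.one_hot maps any out-of-range index, including -1, to an all-zero row,
  # so background anchors contribute a zero target vector to the loss.
  one_hot_targets = tf.one_hot(cls_targets, num_classes)
  print(one_hot_targets.numpy()[1])           # all zeros for the -1 anchor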

So I think num_classes should still be interpreted as the number of positive classes (excluding background), and labels in the tfrecords should start from 1 and be no greater than num_classes.
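
Put together, a hypothetical sanity check for that rule (the function below is mine, not from the repo):

  # Labels in the tfrecords should be 1-based positive classes: never 0
  # (background) and never greater than num_classes.
  def validate_tfrecord_labels(labels, num_classes=20):
      for label in labels:
          if not 1 <= label <= num_classes:
              raise ValueError(f"label {label} outside [1, {num_classes}]")

  validate_tfrecord_labels([1, 7, 20])  # fine for Pascal with num_classes = 20
  # validate_tfrecord_labels([0])       # would raise: 0 is reserved for background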

C-SJK commented 4 years ago

> I think there might still be a case to be made that num_classes should be set to 20, but that there is a separate bug with calculating per-class AP for the last class, which is causing #391 (I'm not so sure about this part).
>
> Here's my understanding from the code:
>
> • In create_coco_tfrecord.py: category_id runs from id 1 (person) through id 90 (toothbrush).
> • In create_pascal_tfrecord.py: category_id runs from id 1 (aeroplane) through id 20 (tvmonitor). Even though pascal_label_map_dict has id 0 (background), the raw XML files don't have boxes with this label, so it never enters the tfrecords as a box.
> • In dataloader.py: we call anchor_labeler.label_anchors, which calls self._target_assigner.assign, which calls _create_classification_targets. This matches targets to classes, but unmatched targets are assigned the value of _unmatched_cls_target, which defaults to zero. This is where the background zero value is introduced. However, back in label_anchors, we do cls_targets -= 1, which puts the positive classes in the range 0 to num_classes-1 (where num_classes would be 20 for Pascal) and the negative class at -1.
>
> When we compute the classification loss later, we convert the cls_targets above into a one-hot vector with params['num_classes'] being 20, where the -1 for background becomes a zero vector. So each cls_outputs channel represents one positive class (20 in total). At prediction time we get 20-channel score predictions; after NMS, we get the top scores for boxes (positive classes only) with class values ranging from 0 to 19, and we compare against min_score_thresh to decide whether each box counts as a positive or negative detection. We finally add 1 back to the classes of the predicted boxes at this stage so that they are in the correct range again (1 to 20).
>
> So I think num_classes should still be interpreted as the number of positive classes (excluding background), and labels in the tfrecords should start from 1 and be no greater than num_classes.

@kahkeng @mingxingtan So num_classes should be set to the number of positive classes, but I'm not sure whether label_id_mapping should include the background or not. Thanks for your reply.
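
For what it's worth, the fine-tuning config quoted at the top does include 0: background in label_id_mapping; here is a sketch of how the mapping would be used at prediction time (values assumed, mapping truncated):

  # label_id_mapping includes 0: background (as in the quoted config), but
  # predicted class ids land in 1..20 after the `+ 1` shift, never 0.
  label_id_mapping = {0: "background", 1: "aeroplane", 2: "bicycle"}  # truncated
  predicted_class_id = 2                       # 1-based id after the shift
  print(label_id_mapping[predicted_class_id])  # "bicycle"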

k11-cmd commented 4 years ago

@mingxingtan Is this bug resolved? What should the number of classes be, and should we include the background as a class or not?

witignite commented 3 years ago

> I think there might still be a case to be made that num_classes should be set to 20, but that there is a separate bug with calculating per-class AP for the last class, which is causing #391 (I'm not so sure about this part).
>
> Here's my understanding from the code:
>
> • In create_coco_tfrecord.py: category_id runs from id 1 (person) through id 90 (toothbrush).
> • In create_pascal_tfrecord.py: category_id runs from id 1 (aeroplane) through id 20 (tvmonitor). Even though pascal_label_map_dict has id 0 (background), the raw xml files don't have boxes with this label, so it never enters the tfrecords as a box.
> • In dataloader.py: we call anchor_labeler.label_anchors, which calls self._target_assigner.assign, which calls _create_classification_targets. This matches targets to classes, but unmatched targets are assigned the value of _unmatched_cls_target, which defaults to zero. This is where the background zero value is introduced. However, back in label_anchors, we do cls_targets -= 1, which puts the positive classes in the range 0 to num_classes-1 (where num_classes would be 20 for Pascal) and the negative class at -1.
>
> When we compute the classification loss later, we convert the cls_targets above into a one-hot vector with params['num_classes'] being 20, where the -1 for background becomes a zero vector. So each cls_outputs channel represents one positive class (20 in total). At prediction time we get 20-channel score predictions; after NMS, we get the top scores for boxes (positive classes only) with class values ranging from 0 to 19, and we compare against min_score_thresh to decide whether each box counts as a positive or negative detection. We finally add 1 back to the classes of the predicted boxes at this stage so that they are in the correct range again (1 to 20).
>
> So I think num_classes should still be interpreted as the number of positive classes (excluding background), and labels in the tfrecords should start from 1 and be no greater than num_classes.

@kahkeng I also think that num_classes should be the number of positive classes only. If num_classes were background + num_positive_classes, things would get weird. As you mentioned, at prediction time we get only 20 outputs representing the score of each positive class. These scores are then checked against nms_configs['score_thresh'] to decide whether a box is negative (background) or not. This is different from a model such as YOLOv3, where each output has an objectness score specifically for determining whether that output is negative or positive.
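
A minimal numpy sketch of that thresholding flow (all values assumed):

  import numpy as np

  # There is no objectness head: a box counts as background simply when its
  # top class score falls below nms_configs['score_thresh'].
  score_thresh = 0.4
  scores = np.array([0.9, 0.35, 0.6])   # top class score per box after NMS
  classes = np.array([4, 11, 0])        # 0-based class indices in [0, 19]

  keep = scores >= score_thresh         # low-scoring boxes are dropped
  final_classes = classes[keep] + 1     # back to 1-based label ids (1..20)
  print(final_classes)                  # [5 1]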