tgxs002 / CORA

A DETR-style framework for open-vocabulary detection (OVD). CVPR 2023
Apache License 2.0

About the settings of the target label #3

davidyang180 closed this issue 1 year ago

davidyang180 commented 1 year ago

This is great work! While debugging on a custom dataset, I found the following piece of code:

for target in targets:
    target['ori_labels'] = target['labels']
    target['labels'] = target['labels'] - target['labels']

Here, all of the target's labels are set to 0. What is the purpose of this setting? The classification loss is later computed from the target labels, so won't this lead to inaccurate classification during training?

tgxs002 commented 1 year ago

Thank you for your interest in our work! In the localizer training stage of CORA, we are indeed training a class-agnostic localizer: the output is an objectness score rather than class-wise logits. The modified labels are not used for classification training. Please note that CORA is trained in two stages: classification is trained in the first stage (region prompting), and localization is trained in the second stage.
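To illustrate why zeroing the labels is harmless for a class-agnostic head, here is a minimal sketch in plain Python. It is not code from the CORA repository: `objectness_targets`, the query indices, and the category ids are hypothetical, and the single-logit-per-query head is an assumption based on the description above. With only one "object" class, every matched query gets target 1 and every unmatched query gets target 0, whatever the original category was.

```python
def objectness_targets(num_queries, matched_query_ids, labels):
    """Build per-query binary targets for a one-class (objectness) head.

    `labels` are the zeroed labels, so every matched query is assigned
    to the single class at index 0.
    """
    num_classes = 1  # assumption: one objectness logit per query
    targets = [[0.0] * num_classes for _ in range(num_queries)]
    for query_id, label in zip(matched_query_ids, labels):
        targets[query_id][label] = 1.0
    return targets


ori_labels = [17, 3, 42]                      # hypothetical category ids
labels = [lab - lab for lab in ori_labels]    # same trick as in the snippet: all zeros
targets = objectness_targets(5, [0, 2, 4], labels)
```

The original categories stay available in `ori_labels`, mirroring the `target['ori_labels']` field kept in the quoted snippet.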

davidyang180 commented 1 year ago

Thanks for your reply, which cleared up some of my doubts. As I understand it, the first stage trains classification and the pre-matching of candidate proposals. In the second stage, the localization of the pre-matched proposals is further refined, and each proposal is judged as an expected target or not according to its bounding box. While debugging, I found that each query is mapped to a one-dimensional score. A perhaps naive question: would it be more intuitive to map each query to a two-dimensional score (object or not an object)? That seems better suited to DETR's loss computation, e.g. cardinality_error. Finally, when will the classification training settings for the first stage be released? It seems that only the training configurations for the second (localization) stage are available.

tgxs002 commented 1 year ago

With a one-dimensional logit, the sigmoid function is applied to compute the probability; with a two-dimensional logit, the softmax function achieves similar results. The code for region prompting will be released on another branch in two weeks.
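As a rough illustration of why the two parameterizations behave alike (a standalone sketch, not CORA code; the function names and sample logits are made up): a two-way softmax over an "object" logit and a "background" logit gives the same probability as a sigmoid applied to their difference.

```python
import math


def sigmoid(z):
    """Probability from a single objectness logit."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax2(z_obj, z_bg):
    """Probability of the 'object' class from two logits."""
    m = max(z_obj, z_bg)  # subtract the max for numerical stability
    e_obj = math.exp(z_obj - m)
    e_bg = math.exp(z_bg - m)
    return e_obj / (e_obj + e_bg)


# softmax over (z_obj, z_bg) equals sigmoid(z_obj - z_bg)
for z_obj, z_bg in [(2.0, -1.0), (0.3, 0.3), (-4.0, 1.5)]:
    assert abs(softmax2(z_obj, z_bg) - sigmoid(z_obj - z_bg)) < 1e-12
```

So a two-logit head carries one redundant degree of freedom: only the difference between the two logits affects the predicted probability.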