nickgkan / butd_detr

Code for the ECCV22 paper "Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds"

About the Label Map of object classes. #49

Closed RunsenXu closed 5 months ago

RunsenXu commented 5 months ago

Dear Authors,

Thanks for your great work. I believe four types of class mappings are used across the system, and I would like to know the class id <-> class name mapping for each of them.

  1. The first is the predicted label used in gt box + predicted label setting. I guess they are from https://github.com/nickgkan/butd_detr/blob/main/data/cls_results.json

  2. The GT Box and GT labels.

  3. The Detected Box and Detected Labels from Group-Free Detectors.

  4. Your model does not predict class labels themselves. Am I correct?

I am new to your codebase and feel confused. Look forward to your answers.

Best,

ayushjain1144 commented 5 months ago

Hi,

All these use mappings from here: https://github.com/nickgkan/butd_detr/blob/main/data/model_util_scannet.py#L27

Ground truths from ScanNet use nyu40ids, which are mapped to our mappings above using: https://github.com/nickgkan/butd_detr/blob/main/data/model_util_scannet.py#L35

"Your model does not predict class labels themselves. Am I correct?": Correct. The input to our model is a sentence, and we predict a distribution over the sentence tokens instead of predicting an object class, as closed-vocabulary detectors do.
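To make the last point concrete, here is a minimal sketch of what "a distribution over the sentence tokens" could look like for a single predicted box. It is not code from this repository: the function names (`softmax`, `grounded_tokens`), the example sentence, the logits, and the threshold are all hypothetical and only illustrate the idea.

```python
import math


def softmax(logits):
    """Turn raw per-token scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grounded_tokens(tokens, logits, threshold=0.05):
    """Return the tokens whose probability is at least `threshold`.

    Hypothetical helper: the threshold value is an assumption, not a
    setting taken from the repository.
    """
    probs = softmax(logits)
    return [tok for tok, p in zip(tokens, probs) if p >= threshold]


# Made-up sentence and per-token scores for one predicted box.
tokens = ["the", "brown", "chair", "next", "to", "the", "table"]
box_logits = [0.1, 2.0, 4.0, 0.0, 0.0, 0.1, 1.0]

probs = softmax(box_logits)
best_token = tokens[probs.index(max(probs))]  # the token the box most likely refers to
span = grounded_tokens(tokens, box_logits)
```

In this toy setup the box is tied to words in the input sentence rather than to an index in a fixed class list, which is why no class id <-> class name table is needed on the prediction side.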

RunsenXu commented 5 months ago

Hi,

Thank you so much for your quick reply. For https://github.com/nickgkan/butd_detr/blob/main/data/model_util_scannet.py#L35, I do not understand why nyu40ids has so many class ids. What is the underlying rule here?

Best,

ayushjain1144 commented 5 months ago

Those are actually the 'ids' from here: https://github.com/nickgkan/butd_detr/blob/main/data/meta_data/scannetv2-labels.combined.tsv (and not really nyu40ids).
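A minimal sketch of how such raw ids could be turned into contiguous class indices follows. It is not the repository's implementation: the column names `id` and `raw_category`, the helper `build_id_to_class`, and the sample rows are assumptions chosen for illustration, and the rows are made up rather than copied from the TSV file.

```python
import csv
import io

# Made-up rows standing in for a label TSV; column names are assumed.
SAMPLE_TSV = (
    "id\traw_category\tnyu40id\n"
    "1\twall\t1\n"
    "2\tchair\t5\n"
    "22\tobject\t40\n"
    "3\tbooks\t23\n"
)


def build_id_to_class(tsv_file, keep_ids):
    """Map raw TSV ids to contiguous class indices (0, 1, 2, ...).

    `keep_ids` is the ordered list of raw ids to keep; ids that do not
    appear in the TSV are skipped.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    id_to_name = {int(row["id"]): row["raw_category"] for row in reader}
    kept = [raw_id for raw_id in keep_ids if raw_id in id_to_name]
    id_to_class = {raw_id: idx for idx, raw_id in enumerate(kept)}
    class_to_name = {idx: id_to_name[raw_id] for raw_id, idx in id_to_class.items()}
    return id_to_class, class_to_name


# Hypothetical subset of raw ids, in the order that defines the class index.
id_to_class, class_to_name = build_id_to_class(io.StringIO(SAMPLE_TSV), [2, 3, 22])
```

To try this against the real file, pass an open file handle instead of the `io.StringIO` sample, after checking that the column names match the file's header.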

RunsenXu commented 5 months ago

I see. Thank you for your reply!