nickgkan / butd_detr

Code for the ECCV22 paper "Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds"

Concerning parsing prediction #40

Closed by Daniellli 11 months ago

Daniellli commented 11 months ago

Hi, thank you for your wonderful work.

I am trying to parse the predictions, and I am confused by the following lines: https://github.com/nickgkan/butd_detr/blob/10570e0b6826d4a236b18c2c8fac5903866e1c60/train_dist_mod.py#L227-L232

Why is last_sem_cls_scores replaced with class scores newly generated from tokenidx and wordidx? What do tokenidx and wordidx mean?

Thank you for your attention.

Daniellli commented 11 months ago

Moreover, is this line correct?

After dividing by the probability of being an object, could the class token with the highest response change, for example to the one that was previously the second largest? https://github.com/nickgkan/butd_detr/blob/10570e0b6826d4a236b18c2c8fac5903866e1c60/models/ap_helper.py#L150C20-L150C20

ayushjain1144 commented 11 months ago

Why is last_sem_cls_scores replaced with class scores generated from tokenidx and wordidx?: The idea is to convert the logits over the detection prompt span (i.e., 256 dimensions) into logits over class labels (18 classes, hence 19 dimensions once the no-object class is included).
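As a rough illustration of that conversion (a sketch, not the repository's code): assume `tokenidx` lists the prompt token positions that belong to class names and `wordidx` gives, for each of those positions, the class index it contributes to. Both meanings are assumptions made for this example (see #17 for the actual definitions), and the helper name `token_scores_to_class_scores` is hypothetical. The per-class score is then the sum of the scores of that class's tokens.

```python
def token_scores_to_class_scores(token_scores, tokenidx, wordidx, num_classes):
    """Sum per-token scores into per-class scores for a single query.

    token_scores: scores over prompt token positions (256 in the thread).
    tokenidx:     assumed to hold token positions belonging to class names.
    wordidx:      assumed to hold the class index for each entry of tokenidx.
    """
    class_scores = [0.0] * num_classes
    for w, t in zip(wordidx, tokenidx):
        class_scores[w] += token_scores[t]
    return class_scores


# Toy prompt with 6 token slots instead of 256, and 3 classes instead of 19.
# Class 0 is spelled with tokens 1 and 2, class 1 with token 4.
token_scores = [0.0, 0.5, 0.25, 0.0, 0.125, 0.125]
tokenidx = [1, 2, 4]
wordidx = [0, 0, 1]

class_scores = token_scores_to_class_scores(
    token_scores, tokenidx, wordidx, num_classes=3
)
print(class_scores)  # [0.75, 0.125, 0.0]
```

Token positions that are not listed in `tokenidx` (here 0, 3 and 5) simply do not contribute to any class in this sketch.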

What do tokenidx and wordidx mean?: See #17

Moreover, is this line correct?: This line ends up being inconsequential because we later multiply sem_cls_scores by obj_prob again. We will fix this in the next version.
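A small numeric check of why a divide-then-multiply round trip changes nothing, under the assumption that obj_prob is a single positive value per box (the function names here are hypothetical, not taken from ap_helper.py). Dividing every class probability of one box by the same positive number keeps their order, so the top class stays the top class, and multiplying back restores the original values up to floating-point rounding.

```python
import math


def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=values.__getitem__)


def rescale_roundtrip(class_probs, obj_prob):
    """Divide the class probabilities by obj_prob, then multiply back."""
    normalized = [p / obj_prob for p in class_probs]
    restored = [p * obj_prob for p in normalized]
    return normalized, restored


# One box: three class probabilities plus a no-object probability of 0.125.
class_probs = [0.5, 0.25, 0.125]
obj_prob = 1.0 - 0.125  # probability that the box is an object at all

normalized, restored = rescale_roundtrip(class_probs, obj_prob)

# The ranking of classes is unchanged by the division ...
same_top_class = argmax(normalized) == argmax(class_probs)
# ... and the multiplication undoes it.
values_restored = all(
    math.isclose(r, p) for r, p in zip(restored, class_probs)
)
```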

Daniellli commented 11 months ago

Hi,

Is line 153 correct? If it is, boxes with high IoU but different predicted classes will both be kept, which seems a little unreasonable to me when the IoU is extremely high, such as 1.

Daniellli commented 11 months ago

Moreover, line 155 confuses me: why is [last - 1] always excluded? Is there any chance that the [last - 1] box is the most correct one?

ayushjain1144 commented 11 months ago

nms.py comes unchanged from votenet; I believe they would be better able to assist you. We didn't really look deeply into the NMS code.
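For readers following along, here is a generic sketch of class-aware greedy NMS. It is not the votenet implementation and may or may not match lines 153 and 155; the box layout `(x1, y1, z1, x2, y2, z2)` and the function names are assumptions made for this example. In this style of NMS, a box only suppresses lower-scoring boxes of the same class, and the highest-scoring remaining box is removed from the candidate list because it has just been kept, not because it was discarded.

```python
def iou_3d(a, b):
    """IoU of two axis-aligned 3D boxes given as (x1, y1, z1, x2, y2, z2)."""
    inter = 1.0
    for k in range(3):
        side = min(a[k + 3], b[k + 3]) - max(a[k], b[k])
        if side <= 0:
            return 0.0  # no overlap along this axis
        inter *= side
    vol_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    vol_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    return inter / (vol_a + vol_b - inter)


def nms_same_class(boxes, scores, classes, iou_threshold):
    """Greedy NMS in which only boxes of the same class suppress each other."""
    # Candidate indices sorted by ascending score; the best one is at the end.
    order = sorted(range(len(boxes)), key=lambda i: scores[i])
    keep = []
    while order:
        best = order.pop()  # leaves the candidate list because it is kept
        keep.append(best)
        order = [
            i for i in order
            if classes[i] != classes[best]
            or iou_3d(boxes[i], boxes[best]) <= iou_threshold
        ]
    return keep


unit = (0.0, 0.0, 0.0, 1.0, 1.0, 1.0)
far = (5.0, 5.0, 5.0, 6.0, 6.0, 6.0)
boxes = [unit, unit, far]
scores = [0.9, 0.8, 0.7]

# Identical boxes (IoU = 1) with different classes are both kept.
kept_mixed = nms_same_class(boxes, scores, [0, 1, 0], iou_threshold=0.25)
# With the same class, the lower-scoring duplicate is suppressed.
kept_same = nms_same_class(boxes, scores, [0, 0, 0], iou_threshold=0.25)
print(kept_mixed, kept_same)  # [0, 1, 2] [0, 2]
```

Keeping duplicates across classes is a design choice: each class is treated as a separate detection problem, so a later per-class evaluation sees one candidate per class rather than a single winner.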

Daniellli commented 11 months ago

Hi, I am a little confused by the following lines: https://github.com/nickgkan/butd_detr/blob/10570e0b6826d4a236b18c2c8fac5903866e1c60/src/grounding_evaluator.py#L197-L201

Why is the GT involved in parsing the prediction?