Closed — Daniellli closed this 11 months ago
Hi, positive_map comes from a model trained to predict spans -- not from the ground truth (refer to this line)
yeah, I see.
But the input of this pretrained model includes the name of the referred target -- is that reasonable?
No, that pretrained model just takes the text utterance as input (Here)
I am interested in understanding how the predicted positive map is generated. I noticed that its values are constrained to {0, 1}. Could you explain the methodology used to generate this data? A detailed explanation of the process would be greatly appreciated. Thank you.
You can think of the positive map as a probability distribution over the text tokens. Here is the relevant code to look at: https://github.com/nickgkan/butd_detr/blob/main/src/text_cls.py#L354-L381. For example, if the sentence is "Basketball on sofa" and the root word is "basketball", the positive map could look like [0.5, 0.5, 0.0, 0.0, ......, 0.0], assuming the tokenizer split "basketball" into two tokens, "basket" and "ball" (so they divide the total weight of 1.0 between them).
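To make this concrete, here is a minimal sketch (not the repo's actual code; the function name, token list, and indices are all illustrative assumptions) of spreading a total weight of 1.0 over the sub-tokens of the root word:

```python
# Hypothetical sketch of positive-map construction: the root word's
# total weight of 1.0 is divided evenly among its sub-tokens.
def build_positive_map(tokens, root_token_indices, max_len=8):
    pmap = [0.0] * max_len
    weight = 1.0 / len(root_token_indices)  # split weight evenly
    for i in root_token_indices:
        pmap[i] = weight
    return pmap

# "Basketball on sofa" with "basketball" split into "basket" + "ball"
tokens = ["basket", "ball", "on", "sofa"]
print(build_positive_map(tokens, [0, 1]))
# [0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```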
Sr3d/nr3d/scanrefer generally tell us the class of the ground-truth object, and we use that with simple string matching to determine the location of the root word in the sentence. This is the relevant code: https://github.com/nickgkan/butd_detr/blob/main/src/text_cls.py#L304-L323. This becomes the ground truth for the span prediction model.
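A toy version of that string matching might look like the following (names and behavior are simplified assumptions, not the repo's exact identifiers): find where the ground-truth class name occurs in the utterance, yielding the root-word character span.

```python
# Toy string matching: locate the GT class name in the utterance
# and return its character span, or None if it is not found.
def find_root_span(utterance, class_name):
    start = utterance.lower().find(class_name.lower())
    if start == -1:
        return None  # class name not mentioned verbatim
    return (start, start + len(class_name))

print(find_root_span("the basketball on the sofa", "basketball"))
# (4, 14)
```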
Hi, thank you for your detailed answers.
I found that the value range of the predicted span is {0, 1} -- that is, only 0 and 1 appear -- but the span prediction model is a regression model (https://github.com/nickgkan/butd_detr/blob/10570e0b6826d4a236b18c2c8fac5903866e1c60/src/text_cls.py#L394C1-L399C10). Could you share your post-processing?
Moreover, this is wonderful work; I got a lot of inspiration from it, especially the span prediction model. If possible, could you share the pretrained span prediction model weights or the training instructions? Both would be even better.
thank you for your attention.
But I still have a few more questions.
I found that the value range of the predicted span is {0, 1} -- that is, only 0 and 1 appear -- but the span prediction model is a regression model
: Most of them are 0/1 because the tokenizer didn't split the root word, but I am quite sure you can find positive maps with values other than 0/1.
Also, it's not regression per se; the model predicts logits and is trained with binary cross-entropy with logits (which applies a sigmoid internally): https://github.com/nickgkan/butd_detr/blob/10570e0b6826d4a236b18c2c8fac5903866e1c60/src/text_cls.py#L94-L96
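For intuition, here is a quick numeric check (using the standard definitions, not the repo's code) that "BCE with logits" in its numerically stable form equals plain binary cross-entropy applied to sigmoid(logit):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce_with_logits(x, t):
    # stable form: max(x, 0) - x*t + log(1 + exp(-|x|))
    return max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))

x, t = 0.4, 1.0
direct = -(t * math.log(sigmoid(x)) + (1 - t) * math.log(1 - sigmoid(x)))
print(abs(bce_with_logits(x, t) - direct) < 1e-9)  # True
```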
This is the post-processing: https://github.com/nickgkan/butd_detr/blob/10570e0b6826d4a236b18c2c8fac5903866e1c60/src/text_cls.py#L113-L122. It simply thresholds the logits at 0 (since the model's output ranges from -inf to inf, values above 0 would have prob > 0.5 if we applied a sigmoid to the output).
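The equivalence behind that thresholding can be illustrated in a few lines (a standalone sketch, not the repo's code): comparing raw logits with 0 gives the same decisions as comparing sigmoid probabilities with 0.5, because sigmoid(0) == 0.5 and sigmoid is monotonically increasing.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

logits = [-2.3, -0.1, 0.0, 0.4, 3.1]
by_logit = [x > 0 for x in logits]          # threshold logits at 0
by_prob = [sigmoid(x) > 0.5 for x in logits]  # threshold probs at 0.5
print(by_logit == by_prob)  # True
```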
If possible, could you share the pretrained span prediction model weights or the training instructions?
: Edit: We have instructions for training the span prediction model in the README, but not the weights. It should be straightforward to train (less than half an hour of training time).
Hi, sorry to bother you again. May I ask why the predicted span, rather than the GT span, is used as the supervision signal during training?
Thank you for your time.
https://github.com/nickgkan/butd_detr/blob/10570e0b6826d4a236b18c2c8fac5903866e1c60/src/grounding_evaluator.py#L197-L201
Why is the GT involved in parsing the prediction?
_Originally posted by @Daniellli in https://github.com/nickgkan/butd_detr/issues/40#issuecomment-1695618035_