Hi @digbose92, if you want to get visual embeddings for subjects, objects and relations, you can do these steps:
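Roughly something like the sketch below (written from memory, so treat the import path, the call signature, and the model loading as assumptions to check against the repo; the output keys are the ones that come up later in this thread):

```python
# Rough sketch, not verbatim from the repo: run relationship detection on one
# image and read the embedding tensors out of the returned dict.
import cv2
from core.test_rel import im_get_det_rels  # import path is an assumption

model = ...                          # load the pretrained VRD model here (repo-specific)
img = cv2.imread('demo.jpg')         # any test image
det = im_get_det_rels(model, img)    # call signature is an assumption

sbj_emb = det['sbj_vis_embeddings']  # subject visual embeddings, one row per box pair
obj_emb = det['obj_vis_embeddings']  # object visual embeddings
prd_emb = det['prd_vis_embeddings']  # predicate (relation) visual embeddings
```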
Hope this could help you!
Thanks @jz462.
im_get_det_rels() produces a 'prd_score' of dimension (600, 71), and I do not understand how to interpret it. The 600 corresponds to 600 predicates, I believe, but I don't know about the 71.
Hi Pulin,
I believe you are running it on the VRD dataset, and in that case the 600 means the current batch size, while 71 means the 70 positive predicates plus one "None" predicate.
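In code terms you can read it roughly like this (numpy sketch; whether the "None" class sits at index 0 or at the last index is something to double-check against the repo's label mapping):

```python
import numpy as np

prd_score = det['prd_score']   # shape (600, 71): 600 box pairs x (70 predicates + "None")
# Assuming the "None" predicate is column 0 (verify against the repo):
best_prd = prd_score[:, 1:].argmax(axis=1)                      # best positive predicate per pair
best_conf = prd_score[np.arange(len(prd_score)), best_prd + 1]  # its score
```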
Thanks, Ji
Ok. Thanks for your reply.
I am running it on one image, so I am not sure how the network ends up with a batch size of 600 when I feed just one image, or how to interpret the 600 object and subject labels.
The batch size is the number of box pairs, so there can be many even for a single image. The model then predicts one predicate per box pair in the batch.
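So for a single image you can read off one (subject, predicate, object) guess per box pair, e.g. (the label key names below are my assumption, so check what the returned dict actually contains):

```python
import numpy as np

sbj_labels = det['sbj_labels']   # per-pair subject class indices (key name assumed)
obj_labels = det['obj_labels']   # per-pair object class indices (key name assumed)
prd_idx = det['prd_score'][:, 1:].argmax(axis=1)   # skip the "None" column, as above
for s, p, o in zip(sbj_labels, prd_idx, obj_labels):
    print(s, p, o)   # map these indices to strings with the dataset's category lists
```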
Ji
Oh Ok. Makes sense now. Thanks a lot.
@jz462 Sorry for the delayed response. Thanks for the help.
@pulinagrawal and @digbose92, did you get it to work? I am trying a similar approach, where a pre-trained VRD model is used to extract subject-relation-object triplets from a given image.
Hello @pulinagrawal, I am trying to test it on one image as well. I got the object, subject, and relationship embeddings as described above, but I am stuck on interpreting them: the default gensim word2vec model from Google uses vectors of length 300, while "sbj_vis_embeddings", "obj_vis_embeddings", and "prd_vis_embeddings" have length 1024. Is the enhanced word2vec model provided somewhere and I am just missing it?
I am looking to use something like the similar_by_vector method from the gensim model to map each visual embedding back to its label string. Any help would be appreciated, thanks a lot.
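In case it helps to show what I mean, this is roughly the lookup I would like to do once I have a (num_labels, 1024) matrix of label embeddings in the same space as the visual ones; where to get that matrix from the checkpoint is exactly what I am unsure about:

```python
import numpy as np

def similar_by_vector(vec, label_embs, labels, topn=5):
    """Cosine-similarity lookup, analogous to gensim's similar_by_vector.
    `label_embs`: (num_labels, 1024) label embeddings in the same space as the
    visual embeddings; `labels`: the corresponding label strings."""
    vec = vec / (np.linalg.norm(vec) + 1e-8)
    mat = label_embs / (np.linalg.norm(label_embs, axis=1, keepdims=True) + 1e-8)
    sims = mat @ vec
    top = np.argsort(-sims)[:topn]
    return [(labels[i], float(sims[i])) for i in top]
```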
Can anyone explain how to use the pretrained model under vg_VGG16 to extract embeddings for objects, subjects, and relations in an image? While extracting the embeddings, do I need to provide ground-truth bounding boxes as input?