jz462 / Large-Scale-VRD.pytorch

Implementation for the AAAI2019 paper "Large-scale Visual Relationship Understanding"
https://arxiv.org/abs/1804.10660
MIT License

Extracting object, subject and relationship embeddings #8

Closed digbose92 closed 4 years ago

digbose92 commented 5 years ago

Can anyone tell me how to use the pretrained model under vg_VGG16 to extract embeddings for the objects, subjects, and relations in an image? While extracting the embeddings, do I need to provide ground-truth bounding boxes as inputs?

jz462 commented 5 years ago

Hi @digbose92, if you want to get the visual embeddings for subjects, objects, and relations, you can follow these steps:

  1. Return "sbj_vis_embeddings", "obj_vis_embeddings", "prd_vis_embeddings" in forward() from lib/modeling/reldn_heads.py
  2. Catch the returned embeddings at line 379 of lib/modeling/model_builder_rel.py, and return them at the end of that function
  3. The returned embeddings will then be caught by im_get_det_rels() in lib/core/test_rel.py, so make sure they are properly returned and stored there like every other blob (see the sketch below)
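
For concreteness, here is a rough sketch of that plumbing. The function names and signatures below are simplified stand-ins, not the repo's actual code; only the three blob names ("sbj_vis_embeddings", "obj_vis_embeddings", "prd_vis_embeddings") and the file locations come from the steps above.

```python
# Rough sketch only -- the real forward() in lib/modeling/reldn_heads.py and the
# call site around line 379 of lib/modeling/model_builder_rel.py take more
# arguments; this just shows where the three embedding blobs need to flow.

# (1) reldn_heads.py: also return the embeddings from forward().
def reldn_forward(spo_feat):
    sbj_vis_embeddings = ...   # computed exactly as in the existing forward()
    obj_vis_embeddings = ...
    prd_vis_embeddings = ...
    prd_scores = ...
    return prd_scores, sbj_vis_embeddings, obj_vis_embeddings, prd_vis_embeddings

# (2) model_builder_rel.py: catch the embeddings where the relation head is
#     called and return them together with the existing outputs.
def model_forward(blob_conv):
    prd_scores, sbj_emb, obj_emb, prd_emb = reldn_forward(blob_conv)
    return {
        'prd_scores': prd_scores,
        'sbj_vis_embeddings': sbj_emb,
        'obj_vis_embeddings': obj_emb,
        'prd_vis_embeddings': prd_emb,
    }

# (3) test_rel.py: in im_get_det_rels(), copy the new entries out of the
#     returned dict and store them per image, the same way the existing
#     blobs are stored.
def collect_embeddings(return_dict):
    keys = ('sbj_vis_embeddings', 'obj_vis_embeddings', 'prd_vis_embeddings')
    return {k: return_dict[k] for k in keys}
```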

Hope this could help you!

pulinagrawal commented 5 years ago

Thanks @jz462.

im_get_det_rels() produces a 'prd_score' output with dimensions (600, 71), and I do not understand how to interpret it. I believe the 600 corresponds to 600 predicates, but I don't know what the 71 is.

jz462 commented 5 years ago

Hi Pulin,

I believe you are running it on the VRD dataset, and in that case the 600 means the current batch size, while 71 means the 70 positive predicates plus one "None" predicate.
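
For anyone reading later, here is a toy illustration of that shape. The array below is random stand-in data, not real model output, and whether the "None" class sits at index 0 or 70 should be checked against the dataset's predicate mapping.

```python
import numpy as np

# Stand-in for the real prd_score output of im_get_det_rels() on VRD:
# one row per box pair in the batch, 70 predicate classes plus one "None" class.
prd_scores = np.random.rand(600, 71)

best_prd = prd_scores.argmax(axis=1)     # most likely predicate index per box pair
print(prd_scores.shape, best_prd.shape)  # (600, 71) (600,)
```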

Thanks, Ji


pulinagrawal commented 5 years ago

Ok. Thanks for your reply.

I am running it on one image, so I am not sure how the network ends up with a batch size like this when I feed it just one image, or how to interpret the 600 subject and object labels.

jz462 commented 5 years ago

The batch size here is the number of box pairs, so it can be large even for a single image. The model then tries to predict one predicate per box pair in the batch.
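
Purely illustrative arithmetic (the exact pairing and any capping logic is the repo's; 25 is just a hypothetical detection count):

```python
# If the detector keeps n boxes for an image and every ordered
# (subject, object) pair with subject != object becomes a candidate relation,
# the batch holds n * (n - 1) pairs. With n = 25 that is 600 pairs, which
# would match the 600 rows of prd_score above.
n_boxes = 25
n_pairs = n_boxes * (n_boxes - 1)
print(n_pairs)  # 600
```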

Ji


pulinagrawal commented 5 years ago

Oh Ok. Makes sense now. Thanks a lot.

digbose92 commented 4 years ago

@jz462 Sorry for the delayed response. Thanks for the help.

mrfarazi commented 4 years ago

@pulinagrawal and @digbose92, did you get it to work? I am trying a similar approach, where a pre-trained VRD model is used to extract subject-relation-object triplets from a given image.

achireistefan commented 4 years ago

Hello @pulinagrawal, I am trying to test it on one image as well. I got the object, subject, and relationship embeddings as described above, but I am stuck on interpreting them: the default gensim word2vec model from Google uses vectors of length 300, while "sbj_vis_embeddings", "obj_vis_embeddings", and "prd_vis_embeddings" have length 1024. Is the enhanced word2vec model provided somewhere and I am missing it?

I am looking to use something like the similar_by_vector method from the gensim model to get the label string for a visual embedding. Any help will be appreciated, thanks a lot.
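
To make the goal concrete, here is a minimal sketch of a similar_by_vector-style lookup. It assumes you already have a matrix label_embs of label embeddings living in the same 1024-d joint space as the visual embeddings; obtaining that matrix is exactly the open question above, so everything below is hypothetical.

```python
import numpy as np

# Hypothetical nearest-neighbour lookup: given a 1024-d visual embedding and a
# (num_labels, 1024) matrix of label embeddings in the same joint space, return
# the most similar label names by cosine similarity.
def most_similar_labels(query_emb, label_embs, label_names, topn=5):
    q = query_emb / (np.linalg.norm(query_emb) + 1e-8)
    m = label_embs / (np.linalg.norm(label_embs, axis=1, keepdims=True) + 1e-8)
    sims = m @ q                      # cosine similarity to every label
    order = np.argsort(-sims)[:topn]  # indices of the top-n most similar labels
    return [(label_names[i], float(sims[i])) for i in order]
```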