ltguo19 / VSUA-Captioning

Code for "Aligning Linguistic Words and Visual Semantic Units for Image Captioning", ACM MM 2019
MIT License

A little problem in readme #2

Closed czhxiaohuihui closed 5 years ago

czhxiaohuihui commented 5 years ago

Step 3 of the README ("Download image scene graph data") says to download coco_pred_sg.zip, but step 4 ("Extract geometry relationship data") refers to coco_img_sg. Which one should I actually download?

cuyuhanhan commented 5 years ago

Download coco_pred_sg.zip and coco_img_sg....

ltguo19 commented 5 years ago

We use coco_img_sg.zip in our code, so you should download coco_img_sg.zip to run it. However, I found that the SGAE README suggests using coco_pred_sg.zip, which was released later than coco_img_sg.zip. I'm not sure what the difference between the two zip files is; you may want to ask the author of SGAE.

ltguo19 commented 5 years ago

You may also refer to this issue: https://github.com/yangxuntu/SGAE/issues/9

czhxiaohuihui commented 5 years ago

Hello, the following code is from the VSUACore class in VSUAModel.py:

    if 'o' in self.opt.vsua_use:
        att_obj = self.attention_obj(h_att, obj_feats, p_obj_feats, att_masks)
        lang_lstm_input = torch.cat([lang_lstm_input, att_obj], 1)

    if 'a' in self.opt.vsua_use:
        att_attr = self.attention_attr(h_att, attr_feats, p_attr_feats, att_masks)
        lang_lstm_input = torch.cat([lang_lstm_input, att_attr], 1)

    if 'r' in self.opt.vsua_use:
        att_rela = self.attention_rela(h_att, rela_feats, p_rela_feats, rela_masks)
        lang_lstm_input = torch.cat([lang_lstm_input, att_rela], 1)

Can att_obj, att_attr, and att_rela here be regarded as the results of embedding the three kinds of scene graph nodes (Object, Attribute, and Relationship)? Thanks!

ltguo19 commented 5 years ago

No, att_obj, att_attr, att_rela are the attention results for the three kinds of nodes, while obj_feats, attr_feats, and rela_feats are the embeddings of the nodes.
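
To make the distinction concrete, here is a minimal sketch in plain Python. It is not the repository's attention module: it assumes simple dot-product scoring, omits the extra p_obj_feats argument, and uses a hypothetical helper named soft_attention with made-up toy values. It only illustrates that obj_feats holds one embedding per node, while att_obj is a single vector obtained by weighting those embeddings with the attention distribution.

```python
import math


def soft_attention(query, node_feats, mask):
    """Illustrative masked dot-product attention (hypothetical helper).

    query:      list of D floats (stands in for h_att)
    node_feats: N node embeddings, each a list of D floats (stands in for obj_feats)
    mask:       N values in {0, 1}; 0 marks a padded node (stands in for att_masks)
    Returns (weights, attended), where attended is ONE vector of size D.
    """
    # One score per node: dot product between the query and the node embedding.
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in node_feats]

    # Masked softmax: padded nodes get exactly zero weight.
    max_score = max(s for s, m in zip(scores, mask) if m)
    exps = [math.exp(s - max_score) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Attention result: weighted sum of the node embeddings.
    dim = len(query)
    attended = [
        sum(w * feat[d] for w, feat in zip(weights, node_feats))
        for d in range(dim)
    ]
    return weights, attended


# Toy values (assumptions): three node embeddings, the last one is padding.
obj_feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
att_masks = [1, 1, 0]
h_att = [2.0, 0.0]

weights, att_obj = soft_attention(h_att, obj_feats, att_masks)
print(len(obj_feats), "node embeddings ->", len(att_obj), "-dim attended vector")
```

In the snippet quoted above, it is this kind of attended vector (att_obj, att_attr, att_rela) that gets concatenated onto lang_lstm_input, not the per-node embeddings themselves.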