linjieli222 / VQA_ReGAT

Research Code for ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"
https://arxiv.org/abs/1903.12314
MIT License

W_dir(i,j) in Eq. (8) in the explicit model #29

Closed alice-cool closed 3 years ago

alice-cool commented 3 years ago

Dear scholar, I want to ask whether the dimension of W_dir(i,j) is d_h × (d_q + d_v), and whether the bias b_lab(i,j) is a one-hot vector. I am also unsure about the meaning of W_dir(i,j).

alice-cool commented 3 years ago

Dear scholar, may I ask the meaning of the parameter "nongt_dim" in your code? Sometimes the value is 20 and sometimes 36.

alice-cool commented 3 years ago

Some wording is vague. The paper includes a "no-relation" type in the spatial relation encoder, but the code does not seem to include that type.

alice-cool commented 3 years ago

Dear scholar, does your code include inference code for all three relation models? I found that in your code you say the three relation models are trained independently, and that only at inference time do you combine the three models' results to obtain the probability of a predicted answer.

alice-cool commented 3 years ago

Where is the explicit relation type encoded into the model? I could not find it. If the label bias comes from adj_list, it would not help, because the matrix you create is only of shape (num_rois, num_rois): it represents only an edge's existence, not the diverse relation types. So I think the code is incomplete regarding the relation labels for the semantic and spatial types.

linjieli222 commented 3 years ago

Dear scholar, I want to ask whether the dimension of W_dir(i,j) is d_h × (d_q + d_v), and whether the bias b_lab(i,j) is a one-hot vector. I am also unsure about the meaning of W_dir(i,j).

The bias b_lab(i,j) is just a scalar. For W_dir(i,j), we omitted multi-head attention in Eq. (8), so it might be a little confusing. W_dir(i,j) corresponds to the linear_out layer here: https://github.com/linjieli222/VQA_ReGAT/blob/9f6fe5bcda169c268eb1c92ef00df9f61d540081/model/graph_att_layer.py#L51-L53

We followed the implementation of Relation Networks for Object Detection to implement multi-head attention for better efficiency.
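To make that correspondence concrete, here is a minimal sketch (sizes and names are illustrative assumptions, not the repo's exact layer) of the Relation Networks trick: stacking the per-head output projections into one grouped 1x1 convolution, so all heads are projected in a single kernel call.

```python
import torch
import torch.nn as nn

# Illustrative sketch, not the repo's exact layer.
num_heads, feat_dim, num_rois = 16, 1024, 36

# groups=num_heads splits the input channels into num_heads blocks and
# gives each block its own weight matrix -- i.e. num_heads independent
# linear maps, one per attention head, fused into one Conv2d.
linear_out = nn.Conv2d(in_channels=num_heads * feat_dim,
                       out_channels=feat_dim,
                       kernel_size=(1, 1),
                       groups=num_heads)

# Head outputs stacked along the channel axis: [batch, heads*dim, rois, 1]
x = torch.randn(2, num_heads * feat_dim, num_rois, 1)
print(linear_out(x).shape)  # torch.Size([2, 1024, 36, 1])
```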

Dear scholar, may I ask the meaning of the parameter "nongt_dim" in your code? Sometimes the value is 20 and sometimes 36.

Please see: https://github.com/linjieli222/VQA_ReGAT/blob/9f6fe5bcda169c268eb1c92ef00df9f61d540081/main.py#L101-L102
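Roughly speaking, nongt_dim caps how many region proposals each object attends over as keys/values (this summary of the linked lines is my reading, so treat it as an assumption). A toy sketch of the effect, with illustrative names:

```python
import torch

# Toy sketch, not the repo's code: nongt_dim restricts attention so
# that each object only attends over the first nongt_dim proposals.
def attend(query, feats, nongt_dim=20):
    keys = feats[:, :nongt_dim, :]                    # [B, nongt_dim, d]
    scores = torch.bmm(query, keys.transpose(1, 2))   # [B, N, nongt_dim]
    attn = torch.softmax(scores / feats.size(-1) ** 0.5, dim=-1)
    return torch.bmm(attn, keys)                      # [B, N, d]

feats = torch.randn(2, 36, 1024)
print(attend(feats, feats, nongt_dim=20).shape)  # torch.Size([2, 36, 1024])
```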

Some wording is vague. The paper includes a "no-relation" type in the spatial relation encoder, but the code does not seem to include that type.

In build_graph(), if the relative position between box i and box j does not fall into any of the if/else branches, then adj_matrix(i, j) does not receive a label, hence the "no-relation" type. Note that "no-relation" is only used when constructing the spatial graph; the spatial graph attention does not consider this type, as we don't think there is a relation between two objects that are too far away from each other.
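A simplified sketch of that fall-through behavior follows; the class definitions and threshold are illustrative stand-ins, not the exact rules in build_graph().

```python
# Boxes are (x1, y1, x2, y2); labels and threshold are illustrative.
def contains(a, b):
    """True if box a fully contains box b."""
    return a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2] and a[3] >= b[3]

def center_distance(a, b):
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5

def spatial_label(box_i, box_j, dist_thresh=0.5):
    if contains(box_j, box_i):
        return 1   # e.g. "inside"
    if contains(box_i, box_j):
        return 2   # e.g. "cover"
    if center_distance(box_i, box_j) < dist_thresh:
        return 3   # stand-in for the remaining position-based classes
    # No branch matched: the pair is too far apart, no label is written,
    # which is exactly the implicit "no-relation" type.
    return 0

print(spatial_label((0, 0, 1, 1), (10, 10, 11, 11)))  # 0 -> "no-relation"
```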

Dear scholar, does your code include inference code for all three relation models? I found that in your code you say the three relation models are trained independently, and that only at inference time do you combine the three models' results to obtain the probability of a predicted answer.

We did not include the code to aggregate the three models' results. The aggregation is very straightforward: we simply take a weighted sum of the logits from each model. I believe we used alpha = 0.3 and beta = 0.3.
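In code, that weighted sum could look like the sketch below; alpha and beta follow the reply above, and giving the remaining 1 − alpha − beta to the third model is an assumption about how the weights are normalized.

```python
import torch

alpha, beta = 0.3, 0.3

def ensemble(logits_sem, logits_spa, logits_imp):
    # Weighted sum of per-model answer logits, one model per relation
    # type; the third weight is assumed to be whatever remains.
    return (alpha * logits_sem + beta * logits_spa
            + (1 - alpha - beta) * logits_imp)

# 3129 is the usual VQA v2 candidate-answer vocabulary size.
sem, spa, imp = (torch.randn(2, 3129) for _ in range(3))
pred = ensemble(sem, spa, imp).argmax(dim=1)  # predicted answer indices
```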

Where is the explicit relation type encoded into the model? I could not find it. If the label bias comes from adj_list, it would not help, because the matrix you create is only of shape (num_rois, num_rois): it represents only an edge's existence, not the diverse relation types. So I think the code is incomplete regarding the relation labels for the semantic and spatial types.

As stated in the comments below, adj_matrix is of shape [batch_size, num_rois, num_rois, num_labels]. Therefore, it is a one-hot embedding of the relation labels. https://github.com/linjieli222/VQA_ReGAT/blob/9f6fe5bcda169c268eb1c92ef00df9f61d540081/model/graph_att.py#L57

The bias layer operates on the last dimension of adj_matrix to learn a bias term for each relation type. https://github.com/linjieli222/VQA_ReGAT/blob/9f6fe5bcda169c268eb1c92ef00df9f61d540081/model/graph_att.py#L40 https://github.com/linjieli222/VQA_ReGAT/blob/9f6fe5bcda169c268eb1c92ef00df9f61d540081/model/graph_att.py#L90
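Concretely, a linear layer over that one-hot last dimension acts as a lookup table of per-label scalars; a minimal sketch (names and sizes are illustrative, not the repo's):

```python
import torch
import torch.nn as nn

batch_size, num_rois, num_labels = 2, 36, 11

# One-hot relation labels on each edge: [B, N, N, num_labels]
adj_matrix = torch.zeros(batch_size, num_rois, num_rois, num_labels)
adj_matrix[..., 3] = 1.0  # pretend every edge carries relation label 3

# Multiplying a one-hot vector by the weight of nn.Linear(num_labels, 1)
# simply selects one learned scalar -- the bias b_lab(i,j) for that edge.
bias_layer = nn.Linear(num_labels, 1)
b_lab = bias_layer(adj_matrix).squeeze(-1)  # [batch, num_rois, num_rois]
print(b_lab.shape)
```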

alice-cool commented 3 years ago

Thank you for your timely replies; they have helped me a lot.
