Thanks for using Graphormer. Yes, we use zero to represent padding nodes.
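To illustrate that convention, here is a minimal sketch (the tensor shapes and variable names are illustrative, not taken from the Graphormer source): categorical node features are shifted by +1 so that index 0 stays reserved for padding.

```python
import torch

# Minimal sketch of the zero-padding convention: categorical node features
# are shifted by +1 so that embedding index 0 is reserved for padding nodes.
# (Shapes and names are illustrative, not from the Graphormer source.)
raw = torch.tensor([[3, 0, 7],
                    [1, 2, 5]])                  # 2 real nodes, 3 categorical features
shifted = raw + 1                                # real feature ids now live in 1..N
padded = torch.zeros(4, 3, dtype=torch.long)     # pad the graph up to 4 nodes
padded[:2] = shifted                             # rows 2-3 remain 0, the pad id
print(padded)
```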
Hi, yes, my question is whether (and why) it is necessary to have a padding token distinct from the input tokens, given that the padded nodes are not attended to.
Yes, padded nodes are not attended to, so you can assign them any category.
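To see why the category assigned to padded nodes is irrelevant, here is a small sketch (not Graphormer code) of how a -inf attention bias zeroes out their softmax weights:

```python
import torch
import torch.nn.functional as F

# Sketch (not Graphormer code): a -inf bias at padded positions drives their
# attention weights to exactly 0 after softmax, so whatever embedding the
# padded nodes carry is never mixed into any real node's representation.
scores = torch.tensor([2.0, 1.0, 0.5, 0.5])                   # last two positions are padding
bias = torch.tensor([0.0, 0.0, float('-inf'), float('-inf')])
weights = F.softmax(scores + bias, dim=-1)
print(weights)  # tensor([0.7311, 0.2689, 0.0000, 0.0000])
```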
Thanks
Hi,
I have a few questions about node padding.
Firstly, is my assumption correct that adding -inf values in "pad_attn_bias_unsqueeze" serves the same purpose as the attention_mask in BERT, i.e., ensuring that there is no attention to padded nodes?
If this is correct, why do you add +1 to x in the padding functions? Since attention is restricted from attending to the padded nodes anyway, they could hold arbitrary values, so 0 could still be used as a regular feature value.
I am referring to the padding function which is used to pad x.
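For context, the helper being discussed looks roughly like the following. This is a sketch modeled on the pad_2d_unsqueeze function in Graphormer's collator, reproduced from memory, so treat the details as approximate:

```python
import torch

# Sketch of the x-padding helper under discussion, modeled on Graphormer's
# pad_2d_unsqueeze (reproduced from memory; details may differ from the
# exact source). Note the +1 shift the question asks about: it frees up 0
# to act as the padding id before the tensor is zero-padded to padlen rows.
def pad_2d_unsqueeze(x, padlen):
    x = x + 1  # shift features so that 0 is reserved as the pad id
    xlen, xdim = x.size()
    if xlen < padlen:
        new_x = x.new_zeros([padlen, xdim], dtype=x.dtype)
        new_x[:xlen, :] = x  # real nodes first, remaining rows stay 0
        x = new_x
    return x.unsqueeze(0)  # add a batch dimension
```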