GraphCodeBERT node- vs. token-level attention

I have a question regarding this section in the paper:

Node- vs. Token-level Attention. Table 6 shows how frequently a special token [CLS] that is used to calculate probability of correct candidate attends to code tokens (Codes) and variables (Nodes). We see that although the number of nodes account for 5%∼20%, attentions over nodes overwhelm node/code ratio (around 10% to 32%) across all programming languages. The results indicate that data flow plays an important role in code understanding process and the model pays more attention to nodes in data flow than code tokens.

Was this analysis done during prediction or during training? Is it specific to one fine-tuning task, or was it derived from the pre-trained model alone? Is there code to reproduce it? I would like to compare the attention that [CLS] pays to nodes with the attention it pays to code tokens during prediction, for the code search and code refinement tasks.
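To make the question concrete, here is a minimal sketch of the measurement I have in mind. It is not the authors' code. It assumes the GraphCodeBERT input convention as I understand it, where `position_idx` is 0 for data-flow nodes, 1 for padding, and 2 or greater for code tokens (including [CLS] and [SEP]). The function name `cls_attention_split` and the toy numbers are hypothetical.

```python
from typing import Dict, Sequence


def cls_attention_split(
    cls_rows: Sequence[Sequence[float]],
    position_idx: Sequence[int],
) -> Dict[str, float]:
    """Compare the share of [CLS] attention on nodes with the share of nodes in the input.

    cls_rows: attention rows for the [CLS] query, one row per (layer, head)
        or any other subset; each row has one weight per input position.
    position_idx: assumed convention: 0 = data-flow node, 1 = padding,
        >= 2 = code token (special tokens are counted as code here).
    """
    node_mass = 0.0
    code_mass = 0.0
    for row in cls_rows:
        if len(row) != len(position_idx):
            raise ValueError("attention row and position_idx differ in length")
        for weight, pos in zip(row, position_idx):
            if pos == 0:
                node_mass += weight
            elif pos >= 2:
                code_mass += weight
            # pos == 1 is padding and is ignored

    n_nodes = sum(1 for p in position_idx if p == 0)
    n_code = sum(1 for p in position_idx if p >= 2)
    total_mass = node_mass + code_mass
    total_count = n_nodes + n_code
    return {
        "node_attention_share": node_mass / total_mass if total_mass else 0.0,
        "node_count_share": n_nodes / total_count if total_count else 0.0,
    }


# Toy input: [CLS], two code tokens, [SEP], two nodes, two padding positions.
position_idx = [2, 3, 4, 5, 0, 0, 1, 1]
cls_rows = [[0.1, 0.2, 0.2, 0.1, 0.3, 0.1, 0.0, 0.0]]
result = cls_attention_split(cls_rows, position_idx)
print(result)
```

In practice the rows would presumably come from the encoder's attention outputs (for example by passing `output_attentions=True` to the underlying Hugging Face model and slicing out the [CLS] query row), but I am not sure whether the paper averaged over layers and heads, or whether special tokens were excluded.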

Repository: microsoft/CodeBERT, issue #314