In your paper, you mention that you "introduce the knowledge to the pretrained language model by post-training on knowledge-augmented data." In my opinion, post-training is different from fine-tuning. The paper "Post-training for Deep Learning" says its authors "propose an extra training step, called post-training, which only optimizes the last layer of the network."
But in your code, you just fine-tune GPT-2 on the commonsense knowledge.
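To make the distinction concrete, here is a minimal PyTorch sketch (a toy model, not GPT-2 or your repo's code) of post-training in the sense of that paper, i.e. freezing everything except the last layer, as opposed to full fine-tuning where all parameters stay trainable:

```python
import torch.nn as nn

# Toy stand-in for a pretrained model (layer sizes are arbitrary)
model = nn.Sequential(
    nn.Linear(16, 32),   # "body" layers of the pretrained network
    nn.ReLU(),
    nn.Linear(32, 8),    # final layer / output head
)

# Post-training: freeze all parameters, then unfreeze only the last layer
for p in model.parameters():
    p.requires_grad = False
for p in model[-1].parameters():
    p.requires_grad = True

# Only the last layer's weight and bias remain trainable;
# full fine-tuning would leave every parameter with requires_grad=True
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

An optimizer built over `filter(lambda p: p.requires_grad, model.parameters())` would then update only the head, which is what I understand "post-training" to mean, whereas the repo updates all of GPT-2's weights.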