Closed Viictte closed 1 year ago
At a glance, the code looks OK to me. If you could post a minimal reproducible example, that would be helpful.
Thank you for replying. I have solved this problem by using two optimizers with different learning rates. The learning rate for training BERT is usually small (e.g. 3e-5), which might not be effective for training the CRF layer.
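For anyone landing here later, the two-optimizer fix described above could look roughly like the sketch below. It is not the code from this issue: the `encoder`, `projection`, and `transitions` objects are small hypothetical stand-ins (a linear layer instead of Blenderbot, a transition matrix instead of a real CRF layer), the loss is a toy one, and the `1e-2` learning rate for the CRF side is an assumed value chosen only for illustration. Only the `3e-5` value comes from the thread.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins, NOT the real Blenderbot encoder or a real CRF layer.
encoder = nn.Linear(16, 16)
projection = nn.Linear(16, 8)                   # 8 = num_strats
transitions = nn.Parameter(torch.zeros(8, 8))   # toy tag-transition scores

# One optimizer per parameter set, each with its own learning rate.
encoder_opt = torch.optim.AdamW(encoder.parameters(), lr=3e-5)
crf_opt = torch.optim.AdamW([*projection.parameters(), transitions], lr=1e-2)  # assumed value

x = torch.randn(10, 10, 16)                     # (batch_size, turns, hidden)
tags = torch.randint(0, 8, (10, 10))            # (batch_size, turns)

enc_before = encoder.weight.detach().clone()
proj_before = projection.weight.detach().clone()

emissions = projection(torch.tanh(encoder(x)))  # (batch_size, turns, num_strats)

# Toy loss standing in for the CRF negative log-likelihood:
# per-turn cross-entropy plus a next-tag cross-entropy over the transition rows.
emission_loss = nn.functional.cross_entropy(emissions.reshape(-1, 8), tags.reshape(-1))
transition_loss = nn.functional.cross_entropy(
    transitions[tags[:, :-1]].reshape(-1, 8), tags[:, 1:].reshape(-1)
)
loss = emission_loss + transition_loss

# A single backward pass fills the gradients; each optimizer then steps its own parameters.
encoder_opt.zero_grad()
crf_opt.zero_grad()
loss.backward()
encoder_opt.step()
crf_opt.step()

enc_delta = (encoder.weight.detach() - enc_before).abs().max().item()
proj_delta = (projection.weight.detach() - proj_before).abs().max().item()
print(f"max encoder change: {enc_delta:.2e}, max projection change: {proj_delta:.2e}")
```

A single optimizer with two parameter groups (each with its own `lr`) should give the same effect. Two separate optimizers are shown here only because that is what the fix above describes.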
Hi! I tried to add a CRF module to Blenderbot to predict the response strategy for each sentence. More specifically, I extracted the relevant information from the Blenderbot encoder and projected it to a tensor of shape (batch_size, sentence_len, num_tags). However, I found that neither the CRF layer nor the linear projection layer is updated during training.
Could you please help me check where I made a mistake?
'''
10 10 8 (batch_size turns num_strats)
'''
The ppl_value and crf_loss are returned to the train function, which computes loss = crf_loss + ppl_loss and then calls backward() on it.
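One way to check whether a layer is actually being reached by the summed loss is to look at the `.grad` of each parameter right after `backward()`. The sketch below is only an illustration under assumptions: the three linear layers are hypothetical stand-ins for the encoder, the projection layer, and the language-model head, and plain cross-entropy stands in for both `crf_loss` and `ppl_loss`. A `None` gradient would mean the parameter is not connected to the loss at all. A nonzero gradient with no visible change in the weights points to the learning rate or optimizer setup instead.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins for the real modules in the issue.
model = nn.ModuleDict({
    "encoder": nn.Linear(16, 16),
    "projection": nn.Linear(16, 8),   # 8 = num_strats
    "lm_head": nn.Linear(16, 32),     # 32 = assumed toy vocabulary size
})

x = torch.randn(10, 10, 16)                   # (batch_size, turns, hidden)
strat_labels = torch.randint(0, 8, (10, 10))
token_labels = torch.randint(0, 32, (10, 10))

hidden = model["encoder"](x)

# Cross-entropy is used here in place of the real CRF loss and perplexity loss.
crf_loss = nn.functional.cross_entropy(
    model["projection"](hidden).reshape(-1, 8), strat_labels.reshape(-1)
)
ppl_loss = nn.functional.cross_entropy(
    model["lm_head"](hidden).reshape(-1, 32), token_labels.reshape(-1)
)

loss = crf_loss + ppl_loss
loss.backward()

# Collect the gradient norm of every parameter; None means "no gradient reached it".
grad_norms = {
    name: (None if p.grad is None else p.grad.norm().item())
    for name, p in model.named_parameters()
}
for name, norm in grad_norms.items():
    print(name, norm)
```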