When training I see progress followed by degradation. This is (likely) because the model is overfitting due to the limited corpus size of 8k samples: fine-tuning is overwriting the pre-trained weights. What we would like to do is freeze the original layers; we need to figure out how to do that.
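A minimal sketch of how freezing could look, assuming we are fine-tuning a Hugging Face Transformers model with a PyTorch backend (the checkpoint name `bert-base-uncased` and the classification head below are placeholders, not our actual setup):

```python
import torch
from transformers import AutoModelForSequenceClassification

# Placeholder checkpoint; swap in whatever pre-trained model we are actually using.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Freeze every pre-trained backbone parameter so fine-tuning cannot overwrite it.
for param in model.base_model.parameters():
    param.requires_grad = False

# Only the newly added head remains trainable; verify before training.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # e.g. ['classifier.weight', 'classifier.bias']
```

With the backbone frozen, the optimizer only updates the small task head, which should make the 8k-sample corpus much harder to overfit. A middle ground worth trying later is unfreezing just the top one or two transformer layers once the head has converged.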