Open bwdGitHub opened 2 years ago
We would like to use these issues to gauge user interest.
The GPT-2 implementation does not include dropout layers. Adding them would be useful for further pre-training and fine-tuning workflows, where dropout helps prevent overfitting.
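For anyone unfamiliar with what is being requested, here is a minimal sketch of inverted dropout in plain Python. It is only an illustration, not this repository's code: the function name `dropout`, its parameters, and the choice to operate on a flat list are all assumptions made for the example, and where such layers would sit in the model is not specified in this issue.

```python
import random

def dropout(values, rate, training=True, rng=None):
    """Inverted dropout over a flat list of activations (hypothetical helper).

    During training each value is zeroed with probability `rate`, and the
    survivors are scaled by 1 / (1 - rate) so the expected activation is
    unchanged. At inference time the input is returned as-is.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(values)
    rng = rng or random.Random()
    keep = 1.0 - rate
    # Keep each value with probability `keep`, rescaling the kept ones.
    return [v / keep if rng.random() < keep else 0.0 for v in values]

# Example: apply 10% dropout to a vector of ones with a fixed seed.
activations = dropout([1.0] * 8, rate=0.1, rng=random.Random(0))
```

The `training` flag matters for the workflows mentioned above: dropout would be active during pre-training and fine-tuning, and switched off for generation.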