Local cross-validation is apparently a much better indicator of our performance than the public leaderboard. The thinking is that the external data people are using for training is actually the same data as the hidden test set behind the public leaderboard, so many groups are overfitting to it.
The following is a good tutorial for setting up cross-validation with scikit-learn: https://scikit-learn.org/stable/modules/cross_validation.html
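For a quick idea of what a local cross-validation run looks like, here is a minimal sketch. The synthetic dataset, the `LogisticRegression` model, the 5 folds, and the accuracy metric are all placeholder assumptions, not our actual setup; swap in our own training data, estimator, and the competition metric.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the competition training set (hypothetical data).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Placeholder model; replace with whatever estimator we are actually using.
model = LogisticRegression(max_iter=1000)

# 5 stratified folds, shuffled with a fixed seed so runs are comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One score per held-out fold; each fold is scored by a model that never saw it.
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(f"fold scores: {scores}")
print(f"mean = {scores.mean():.3f}, std = {scores.std():.3f}")
```

The mean and spread across folds are the numbers to compare between experiments, rather than the public leaderboard score.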