ludfgame / signate_stu22


References #2

Open · t-nakatani opened 2 years ago

t-nakatani commented 2 years ago

https://www.kaggle.com/competitions/commonlitreadabilityprize/discussion/257844

Pseudo-labeling:

- I trained a roberta-base model on the train set and used my best model to label the external data that I retrieved in the first step. Later on, I used the models from this notebook https://www.kaggle.com/andretugan/commonlit-two-models to do the pseudo-labeling (good improvements in CV and LB).
- I used the standard error of each original excerpt to filter the pseudo-labeled external samples: each external sample whose pseudo-label deviated from the original excerpt's score by more than its standard error was removed from the external data selection (a sketch of this filter follows below).
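A minimal sketch of the standard-error filter, assuming each external sample has already been paired with an original excerpt (the post does not say how the pairing is done); all column names here are hypothetical:

```python
import numpy as np
import pandas as pd

def filter_by_standard_error(external: pd.DataFrame,
                             reference: pd.DataFrame) -> pd.DataFrame:
    """Keep only external samples whose pseudo-label lies within the
    standard error of the original excerpt they are paired with."""
    # Hypothetical schema (not specified in the post):
    #   external:  columns ['excerpt', 'pseudo_label', 'ref_id']
    #   reference: indexed by excerpt id, columns ['target', 'standard_error']
    ref = reference.loc[external['ref_id']]
    deviation = np.abs(external['pseudo_label'].to_numpy()
                       - ref['target'].to_numpy())
    keep = deviation <= ref['standard_error'].to_numpy()
    return external.loc[keep].reset_index(drop=True)
```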

Training:

- First, I trained a single model just on the pseudo-labeled data.
- I used low learning rates (7e-6 to 1e-5) depending on model size.
- I evaluated every 10-600 steps, on the whole train set, and the best model was saved (see the sketch after this list).
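A sketch of this step with the Hugging Face `Trainer`, assuming tokenized datasets `pseudo_ds` (pseudo-labeled external data) and `train_ds` (the whole original train set, used as the eval set); the dataset names, batch size, and step/epoch counts are assumptions, only the learning-rate range and evaluation scheme come from the post:

```python
from transformers import (AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Single-output regression head on top of roberta-base.
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=1)

args = TrainingArguments(
    output_dir="pretrain-on-pseudo",
    learning_rate=7e-6,                 # low LR, as in the write-up
    num_train_epochs=3,                 # placeholder
    per_device_train_batch_size=16,     # placeholder
    evaluation_strategy="steps",
    eval_steps=100,                     # post says every 10-600 steps
    save_strategy="steps",
    save_steps=100,
    load_best_model_at_end=True,        # keep the best checkpoint
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=pseudo_ds, eval_dataset=train_ds)
trainer.train()
```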

Then:

- I used the model from the previous step and trained 6 models on 6 folds of the original train set, with low learning rates, evaluating every 10 steps (a fold-training sketch follows below).
- I also trained a single albert-xxlarge on all of the training data, without evaluation, for 4 epochs (albert-xxlarge was very stable for me). This single model got a public LB score of 0.459, and I believe it was very important in my winning submission.
- I also trained some models using bootstrap sampling instead of cross-validation.
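A sketch of the fold loop, assuming hypothetical helpers `fine_tune()` and `predict()` that wrap a `Trainer` run like the one above, warm-started from the pseudo-label checkpoint; only the fold count, learning-rate scale, and evaluation interval come from the post:

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical names: texts/labels are numpy arrays holding the
# original train set; fine_tune() returns a checkpoint path.
kf = KFold(n_splits=6, shuffle=True, random_state=42)
oof = np.zeros(len(labels))
fold_ckpts = []

for fold, (tr_idx, va_idx) in enumerate(kf.split(texts)):
    ckpt = fine_tune(init_checkpoint="pretrain-on-pseudo",
                     train_idx=tr_idx, valid_idx=va_idx,
                     learning_rate=1e-5, eval_steps=10)
    fold_ckpts.append(ckpt)
    oof[va_idx] = predict(ckpt, texts[va_idx])   # out-of-fold predictions
```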

Ensembling:

- I got the out-of-fold predictions for each model that I trained.
- I then made a new 6-fold split of the oof samples and used it to train 6 ridge regression models (a stacking sketch follows below).
- The final submission used 2 ridge regression ensembles of different models, a bootstrapped model, and the single albert-xxlarge model.
- Ensembles and other models were aggregated using a weighted average (the weights for each ensemble were chosen by feel and public LB score).
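A stacking sketch under assumed names (`oof_preds`, `y`, `test_preds`, and `albert_pred` are hypothetical; the ridge alpha and blend weights are placeholders, since the post says the weights were chosen by feel and LB score):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

# Hypothetical inputs: oof_preds has shape (n_samples, n_models), one
# column of out-of-fold predictions per base model; y is the target.
kf = KFold(n_splits=6, shuffle=True, random_state=0)
ridge_models = []

for tr_idx, _ in kf.split(oof_preds):
    ridge_models.append(Ridge(alpha=1.0).fit(oof_preds[tr_idx], y[tr_idx]))

def ridge_ensemble(test_preds: np.ndarray) -> np.ndarray:
    """Average the 6 fold-wise ridge models on (n_test, n_models) inputs."""
    return np.mean([m.predict(test_preds) for m in ridge_models], axis=0)

# Final blend: weighted average of the ridge ensemble(s) and the other
# models, with hand-chosen weights (placeholder values).
final = 0.7 * ridge_ensemble(test_preds) + 0.3 * albert_pred
```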