Chhokra opened 4 years ago
I was also looking into this and came to the conclusion that they likely just used 4 different random initializations, as was done in (Chollampatt & Ng, 2018), a paper they reference.
You can look at Table 1 and its footnote to see their gains from ensembling; different random initializations are all they use. https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/viewFile/17308/16137
Note that I am not affiliated with either paper's authors; this is just speculation.
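For anyone curious what ensembling over random initializations looks like in practice, here is a minimal, hypothetical sketch. The `train_model` function is a toy stand-in for a full training run (in a real setup each member would be a separate run launched with a different seed), and the ensemble simply averages per-class probabilities across members; none of this is taken from either paper's actual code.

```python
import random

SEEDS = [1, 2, 3, 4]  # one seed per ensemble member (4 members, as discussed above)

def train_model(seed):
    """Toy stand-in for a training run whose outcome depends on the
    random initialization (the seed). Returns a 'model' (a predict fn)."""
    rng = random.Random(seed)
    bias = rng.uniform(-0.1, 0.1)  # pretend init-dependent noise
    def predict(x):
        # toy two-class probability distribution
        p = min(max(0.5 + 0.1 * x + bias, 0.0), 1.0)
        return [p, 1.0 - p]
    return predict

def ensemble_predict(models, x):
    """Average the per-class probabilities of all ensemble members."""
    probs = [m(x) for m in models]
    n = len(models)
    return [sum(p[c] for p in probs) / n for c in range(len(probs[0]))]

models = [train_model(s) for s in SEEDS]
avg = ensemble_predict(models, 1.0)
print(avg)
```

The only difference between members is the seed; averaging their output distributions is the standard way such ensembles are combined at inference time.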
Thanks @kevbp5. We did use 4 different random initializations for the models without DA. For the models with DA, we also used pre-trained checkpoints from different pre-training stages.
If I'm not wrong, the README only mentions training a single model. How do I go about training the 4 models mentioned in the results table of your paper?