paperswithcode / galai

Model API for GALACTICA
Apache License 2.0

Ablation: Pretrain first on OPT data _then_ on scientific texts? #45

Closed: rodrigonogueira4 closed this 1 year ago

rodrigonogueira4 commented 1 year ago

First of all, great work!

Did you try pretraining Galactica using the original OPT checkpoint as a starting point? Since both models have similar architectures and Galactica's dataset is "only" 110B tokens, I imagine that starting from a model that was pretrained on more data would bring some gains.

RJT1990 commented 1 year ago

Thanks, first author here. We considered this but didn't have time to try it. The reasons we down-weighted it were:

rodrigonogueira4 commented 1 year ago

Hi Ross, great, thanks for your reply!

RJT1990 commented 1 year ago

nw, happy to answer any other questions about the paper - let me know!