ofirpress / attention_with_linear_biases

Code for the ALiBi method for transformer language models (ICLR 2022)
MIT License

Have you initialized the model with other model checkpoints during training? #14

Closed Victoriaheiheihei closed 1 year ago

Victoriaheiheihei commented 1 year ago

Hello, impressive work. I'm wondering: did you initialize the model from other model checkpoints in the WikiText-103 experiment reported in the paper?

ofirpress commented 1 year ago

I don't quite understand what you mean. All the models in the paper were trained from scratch...