What is the largest practical input dataset for fine-tuning GPT-2 1.5B, or the smaller 774M model?
For example, does it make sense to fine-tune the model on 300 million tokens (a 1.2 GB txt file, or ~500 MB as an npz file)? If so, what is a reasonable upper limit for either of those models at which fine-tuning still improves them? Or is it better to split the input dataset into several parts and feed them one by one to the same checkpointed model?
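To make the second option concrete, here is a minimal sketch of what I mean by "split and fine-tune sequentially", written with the Hugging Face transformers/datasets API rather than the original TensorFlow GPT-2 repo (this is just to illustrate the idea; the shard filenames, output paths, and hyperparameters are placeholders, not recommendations):

```python
# Sketch: fine-tune gpt2-large shard by shard, where each shard resumes
# from the checkpoint produced by the previous shard.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2-large")   # 774M variant
tokenizer.pad_token = tokenizer.eos_token

checkpoint = "gpt2-large"                                  # start from the base model
shards = ["shard_00.txt", "shard_01.txt", "shard_02.txt"]  # hypothetical pre-split corpus

for i, shard in enumerate(shards):
    # Load the latest checkpoint (base model on the first pass).
    model = AutoModelForCausalLM.from_pretrained(checkpoint)

    # Tokenize this shard only.
    ds = load_dataset("text", data_files=shard)["train"]
    ds = ds.map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
                batched=True, remove_columns=["text"])

    args = TrainingArguments(output_dir=f"ckpt_shard_{i}",
                             per_device_train_batch_size=1,
                             gradient_accumulation_steps=8,
                             num_train_epochs=1,
                             save_strategy="no")
    trainer = Trainer(model=model, args=args, train_dataset=ds,
                      data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
    trainer.train()

    # Save and point the next iteration at this checkpoint.
    trainer.save_model(f"ckpt_shard_{i}")
    checkpoint = f"ckpt_shard_{i}"
```

As I understand it, one pass over all shards in sequence should be roughly equivalent to one epoch over the full dataset (apart from shuffling across shard boundaries), so my question is really whether there is any point at which adding more data like this stops helping.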