What is the largest practical input dataset for fine-tuning GPT-2 1.5B, or the smaller 774M model?
For example, does it make sense to fine-tune the model on 300 million tokens (a 1.2 GB txt file, or ~500 MB as an npz file)? If so, what is a reasonable upper limit for either of those models at which fine-tuning still improves them? Or is it better to split the input dataset into several parts and feed them one by one to the same checkpointed model?
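To make the second option concrete, here is a minimal sketch of what I mean by "split and fine-tune sequentially", written with the Hugging Face transformers/datasets API rather than the original TensorFlow GPT-2 repo (this is just to illustrate the idea; the shard filenames, output paths, and hyperparameters are placeholders, not recommendations):

```python
# Sketch: fine-tune gpt2-large shard by shard, where each shard resumes
# from the checkpoint produced by the previous shard.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2-large")   # 774M variant
tokenizer.pad_token = tokenizer.eos_token

checkpoint = "gpt2-large"                                  # start from the base model
shards = ["shard_00.txt", "shard_01.txt", "shard_02.txt"]  # hypothetical pre-split corpus

for i, shard in enumerate(shards):
    # Load the latest checkpoint (base model on the first pass).
    model = AutoModelForCausalLM.from_pretrained(checkpoint)

    # Tokenize this shard only.
    ds = load_dataset("text", data_files=shard)["train"]
    ds = ds.map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
                batched=True, remove_columns=["text"])

    args = TrainingArguments(output_dir=f"ckpt_shard_{i}",
                             per_device_train_batch_size=1,
                             gradient_accumulation_steps=8,
                             num_train_epochs=1,
                             save_strategy="no")
    trainer = Trainer(model=model, args=args, train_dataset=ds,
                      data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
    trainer.train()

    # Save and point the next iteration at this checkpoint.
    trainer.save_model(f"ckpt_shard_{i}")
    checkpoint = f"ckpt_shard_{i}"
```

As I understand it, one pass over all shards in sequence should be roughly equivalent to one epoch over the full dataset (apart from shuffling across shard boundaries), so my question is really whether there is any point at which adding more data like this stops helping.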