Closed jiamingkong closed 2 years ago
The batch size of 4096x128 tokens is suggested by https://arxiv.org/abs/1806.00187, which shows that large batches outperform small ones. That finding is empirical and holds for large datasets; for a small dataset you can use a smaller batch size (e.g. 4096x2 or so).
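In fairseq terms (which DeltaLM builds on), "4096x128" usually means `--max-tokens 4096` per step combined with `--update-freq 128` gradient-accumulation steps. A minimal sketch of shrinking that for a small dataset follows; the arch name `deltalm_base`, the data directory, and the checkpoint path are illustrative assumptions, not taken from this thread:

```shell
# Sketch: finetuning on a small corpus with a reduced effective batch.
# --max-tokens is tokens per GPU step; --update-freq accumulates gradients,
# so effective batch = max-tokens x update-freq (here 4096x2, per the advice above).
# Paths and the arch name are placeholders -- check the DeltaLM README.
fairseq-train data-bin/my_small_dataset \
    --arch deltalm_base \
    --finetune-from-model /path/to/deltalm-base.pt \
    --max-tokens 4096 \
    --update-freq 2 \
    --optimizer adam --lr 1e-4 \
    --max-epoch 20
```

Lowering `--update-freq` (rather than `--max-tokens`) keeps per-step memory use unchanged while shrinking the effective batch.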
Describe Model I am using: DeltaLM
Hi, I am trying to finetune DeltaLM on a low-resource text generation task. I have prepared the data as prompted in the iwslt bash files. However, there are two things I am not sure about:
So is there anything I can do to improve the situation, or do you have any finetuning tips for small datasets? Thanks!