Closed: yhcc closed this issue 1 year ago
For v2, we used half the batch size and half the learning rate, but double the number of steps. Other than that, the configuration is the same as v1. I believe these differences probably matter far less than the differences in datasets.
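A quick sketch of the v1-to-v2 relationship described above. The parameter names and the v1 values are placeholders, not the actual OpenLLaMA configuration; the point is only that halving the batch size while doubling the steps keeps the total number of examples seen the same.

```python
# Hypothetical v1 values for illustration only (not the real config).
v1 = {"batch_size": 2048, "learning_rate": 3e-4, "total_steps": 250_000}

# v2 as described: half the batch size, half the learning rate, double the steps.
v2 = {
    "batch_size": v1["batch_size"] // 2,
    "learning_rate": v1["learning_rate"] / 2,
    "total_steps": v1["total_steps"] * 2,
}

# Examples seen per run are unchanged: batch_size * total_steps is constant.
assert v1["batch_size"] * v1["total_steps"] == v2["batch_size"] * v2["total_steps"]
```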
Thanks for your reply. I have another question about the OpenLLaMA v2 training; thanks in advance for your patience. Did you apply any extra cleaning to these data splits, or did you use the datasets as they are?
We didn't apply any extra cleaning. We simply combined the datasets and shuffled the examples.
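A minimal sketch of "combine the datasets and shuffle the examples", using toy stand-in splits (the real data splits and loading code are not shown in this thread):

```python
import random

# Toy stand-ins for two dataset splits (hypothetical, for illustration only).
split_a = [{"text": f"doc_a_{i}"} for i in range(3)]
split_b = [{"text": f"doc_b_{i}"} for i in range(3)]

# Combine, then shuffle at the example level; no per-split cleaning applied.
combined = split_a + split_b
random.seed(0)  # fixed seed so the shuffle is reproducible
random.shuffle(combined)
```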
Thanks for releasing OpenLLaMA 7B v2. I am wondering whether this example script is the one you used to train it: https://github.com/young-geng/EasyLM/blob/main/examples/pretrain_llama_7b.sh