s-JoL / Open-Llama

The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF.
https://huggingface.co/s-JoL/Open-Llama-V2
MIT License

How to do V2.0 pre-training? #31

Closed mikeda100 closed 1 year ago

mikeda100 commented 1 year ago

Hi there,

Thanks a lot for the excellent work on the V2.0 release.

Could you please tell me whether we need to re-process all data from scratch, since the data format appears to have changed?

What are the scripts that we should run sequentially?

That is to say, which data preparation steps (scripts) should we run before executing the following command?

accelerate launch --config_file configs/default_config.yaml train_lm.py --config configs/pretrain_config.yaml

Thanks again!

s-JoL commented 1 year ago

Thank you for your interest in this project. Switching from v1 to v2 does not require any data reprocessing; to maintain compatibility, the data format is unchanged. You only need to run the command you quoted.
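
Based on the maintainer's answer, the v2 workflow can be sketched as the commands below. This is a summary of what the thread states, not official documentation; the script and config paths are exactly those from the command quoted above, and the data-preparation step is only needed when starting without an existing v1 dataset.

```shell
# Sketch of the Open-Llama v2 pre-training workflow, per this thread.
# The maintainer confirms v2 reuses the v1 data format, so an existing
# v1-prepared dataset requires no reprocessing.

# 1. (Only if no v1 dataset exists) run the repository's data-preparation
#    scripts first to produce the tokenized training data.

# 2. Launch v2 pre-training directly, reusing the prepared data:
accelerate launch \
  --config_file configs/default_config.yaml \
  train_lm.py \
  --config configs/pretrain_config.yaml
```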