jzhang38 / EasyContext

Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
Apache License 2.0

Confused by the train scripts #17

Closed · Bostoncake closed this 2 months ago

Bostoncake commented 2 months ago

In train_scripts/EasyContext-1M-Llama-2-7B.sh, line 53 specifies --model PY007/Llama2-7B-64K. Why isn't it --model ./output/7B_64K_bs_1M_rope_5M_step_1000_lr_2e-5, i.e. the checkpoint produced by the previous training stage?
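For reference, here is a minimal sketch of the staged setup I had in mind. Only --model, PY007/Llama2-7B-64K, and the ./output/7B_64K_... path come from the script and this issue; the launcher invocation, the --output-dir flag, the base-model id, and the stage-2 output path are placeholders I am assuming, not copied from EasyContext:

```bash
# Hedged sketch of two-stage context extension; flag names other than --model
# and the stage-1 output path are assumptions, not taken from the repo.

# Stage 1: extend the base 7B model (assumed to be meta-llama/Llama-2-7b-hf)
# to 64K context and write the checkpoint to ./output/.
accelerate launch train.py \
  --model meta-llama/Llama-2-7b-hf \
  --output-dir ./output/7B_64K_bs_1M_rope_5M_step_1000_lr_2e-5
  # ... other stage-1 hyperparameters omitted

# Stage 2: continue from the stage-1 checkpoint on disk
# (rather than the published PY007/Llama2-7B-64K model).
accelerate launch train.py \
  --model ./output/7B_64K_bs_1M_rope_5M_step_1000_lr_2e-5 \
  --output-dir ./output/7B_1M_example   # hypothetical stage-2 output path
  # ... other stage-2 hyperparameters omitted
```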

Also, will you upload training scripts for Llama-2 13B in the future? I really appreciate this work and look forward to it. Thanks!

jzhang38 commented 2 months ago

It is a typo, now fixed.

will you upload training scripts for Llama-2 13B in the future?

I currently do not have enough spare compute to experiment with that. :(

Bostoncake commented 2 months ago

Thanks for your reply! I am currently working on 13B models and hope things work out. If so, I will open PRs and share my training recipes.

jzhang38 commented 2 months ago

@Bostoncake Thanks, that would be very helpful!