Closed: SleepEarlyLiveLong closed this issue 1 year ago
For A100, I think the attribute `is_model_parallel` in `config.json` should be set to `False`, which indicates that no model parallelism is used. We will fix this in `convert_mp.py` in a later version.
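A minimal sketch of that edit is below. The checkpoint path and the key name `is_model_parallel` are taken from this thread; the assumption that the key sits at the top level of `config.json` is mine, so adjust as needed:

```bash
# Sketch: flip is_model_parallel to false in the converted checkpoint's config.json
# so single-GPU (1x A100) loading does not expect model-parallel shards.
# Path is the one mentioned in this issue; change it to match your own layout.
python - <<'EOF'
import json

cfg_path = "checkpoints/llama/train/sft/llama-13B/config.json"
with open(cfg_path) as f:
    cfg = json.load(f)

cfg["is_model_parallel"] = False  # the attribute mentioned above (assumed top-level)

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
EOF
```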
For 4xA10, you can uncomment lines 36-37 in `scripts/llama/eval/eval_main_dolly.sh` to enable model parallelism. We can add an example script in the next version.
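The exact contents of lines 36-37 live in the repository, but a hypothetical sketch of the kind of launch they enable might look like the following. The entry-point name and flag names here are assumptions for illustration, not the repo's confirmed interface:

```bash
# Hypothetical sketch only -- check the actual lines 36-37 of eval_main_dolly.sh.
# Idea: start one process per A10 and shard the 13B model across the four GPUs.
GPUS_PER_NODE=4                                     # 4x A10
MP_SIZE=4                                           # assumed to match the mp4/ checkpoint
CKPT="checkpoints/llama/train/sft/llama-13B/mp4"    # path taken from this issue

torchrun --nproc_per_node ${GPUS_PER_NODE} evaluate.py \
    --model-path ${CKPT} \
    --model-parallel \
    --model-parallel-size ${MP_SIZE}                # flag names are assumptions
```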
Thanks a lot! I solved the issue.
I want to run `scripts/llama/eval/eval_main_dolly.sh` to evaluate `sft/llama-13B`. I have access to either 1 A100 GPU or 4 A10 GPUs. How should I modify `scripts/llama/eval/eval_main_dolly.sh` to get it to work? I tried the following command on 1 A100 GPU:
The files in `checkpoints/llama/train/sft/llama-13B/` are as follows, where `pytorch_model.bin` is converted from `mp4/` using the released file `tools/convert_mp.py`. However, it gives the following errors:
How can I solve the problem?