Closed Hyaloid closed 1 year ago
Yes, cpm is an experimental project to extend VPipe to 3D parallelism. You can refer to
https://github.com/hku-systems/vpipe/blob/main/cpm/cpm/large_8/mp_conf.json
Not all configuration files are well tested.
@SimonZsx Thanks a lot! And what does mp_size represent?
Hi, it's about the model/tensor parallel dimensions. You can refer to the Megatron-LM paper.
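For illustration, here is a minimal sketch of what a model-parallel size means in practice. It assumes the Megatron-LM convention that consecutive ranks form one tensor-parallel group; whether cpm groups ranks this way is not confirmed in this thread, and the function name is hypothetical.

```python
def tensor_parallel_groups(world_size, mp_size):
    """Split ranks 0..world_size-1 into groups of mp_size consecutive ranks.

    Assumption: consecutive ranks share one model/tensor-parallel group,
    as in Megatron-LM. This is not taken from the cpm code.
    """
    if mp_size <= 0 or world_size % mp_size != 0:
        raise ValueError("world_size must be a positive multiple of mp_size")
    return [list(range(start, start + mp_size))
            for start in range(0, world_size, mp_size)]


# With 4 processes and mp_size = 2, each layer is split across 2 ranks.
print(tensor_parallel_groups(4, 2))
```

Under this assumption, mp_size = 1 means no tensor parallelism: every rank holds its layers whole.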
The issue is fixed by updating the README, so I will close it.
Hi, I'm trying to reproduce the cpm module. When I execute
python -m launch --nnodes 1 --node_rank 3 --nproc_per_node 4 main_with_runtime.py --data_dir /usr/vpipe/cpm/data/miniimagenet/train --master_addr 172.20.21.6 --module medium_4 --checkpoint_dir output --partition medium_4/vpipe.json --sync_mode asp --distributed_backend gloo -b 2 --lr 0.000600 --lr_policy polynomial --weight-decay 0.000000 --epochs 20 --print-freq 100 --verbose 0 --num_ranks_in_server 4 --config_path medium_4/mp_conf.json
I get an error. It seems that there should be 4 keys in mp_config.json, and mp_size and stage_to_depth_map should be included, but only 2 keys (module_to_stage_map, stage_to_rank_map) are in the original mp_config.json. Should I add the key mp_size to mp_config.json or do something else?
Any help would be much appreciated.
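A quick way to see which keys a given config file lacks is sketched below. The four key names come from the description above; that the loader requires exactly these four is an assumption, the helper names are hypothetical, and the example values are placeholders rather than a real cpm configuration.

```python
import json

# Keys the loader seems to expect, per the error described above (assumption).
EXPECTED_KEYS = (
    "module_to_stage_map",
    "stage_to_rank_map",
    "mp_size",
    "stage_to_depth_map",
)


def missing_keys(config, expected=EXPECTED_KEYS):
    """Return the expected keys that are absent from a parsed config dict."""
    return [key for key in expected if key not in config]


def check_config_file(path):
    """Load a JSON config from disk and report its missing keys."""
    with open(path) as f:
        return missing_keys(json.load(f))


# Placeholder config containing only the two keys found in the original file.
example = json.loads(
    '{"module_to_stage_map": [0, 1], "stage_to_rank_map": {"0": [0], "1": [1]}}'
)
print(missing_keys(example))
```

Running check_config_file on medium_4/mp_conf.json and on large_8/mp_conf.json would show how the two files differ in their top-level keys.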