wlsrick opened 1 year ago
Hello, I am trying to train the model on a server with 4x 2080Ti GPUs, using the command below: bash tools/dist_train.sh ./configs/recognition/vit/vitclip_large_k400.py 4 --test-last --validate --cfg-options work_dir=./work_dirs but it fails with this error:
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
========= tools/train.py FAILED
How can I solve it? Thanks a lot~
Hi, could you please post the complete log here?
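In case it helps with collecting that: ChildFailedError is only the launcher's summary that a worker process exited with an error, and the actual Python traceback from tools/train.py is usually printed earlier in the output. A minimal sketch for saving everything to a file, assuming a POSIX shell with tee available (run_and_log and train_full.log are hypothetical names, not part of the repo):

```shell
# Hypothetical log file name; any writable path works.
LOG_FILE=train_full.log

# Run a command, merge stderr into stdout, and save a copy to LOG_FILE
# while still printing the output to the terminal.
run_and_log() {
    "$@" 2>&1 | tee "$LOG_FILE"
}

# Usage with the command from the report (commented out so the sketch
# does not start a training run by itself):
# run_and_log bash tools/dist_train.sh ./configs/recognition/vit/vitclip_large_k400.py 4 --test-last --validate --cfg-options work_dir=./work_dirs
```

The resulting train_full.log should then contain the per-rank tracebacks that appear above the ChildFailedError summary, and can be attached here.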