I can train with a single GPU, but when I try to train with multiple GPUs by running dist_train.sh, the program stops without reporting anything.
My dist_train.sh looks like this:
CUDA_VISIBLE_DEVICES=0,1 nohup python3 -m torch.distributed.launch --nproc_per_node=2 --master_port 29501 train.py --launcher pytorch > log.txt&
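For reference, the deprecation warning in the log below recommends torch.distributed.run; I assume the equivalent launch would be something like this (untested sketch, assuming train.py can read the local rank from the environment rather than a --local_rank argument):
CUDA_VISIBLE_DEVICES=0,1 nohup python3 -m torch.distributed.run --nproc_per_node=2 --master_port 29501 train.py --launcher pytorch > log.txt &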
The log.txt shows like this:
/usr/local/miniconda3/envs/pcdt/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torch.distributed.run.
Note that --use_env is set by default in torch.distributed.run.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
warnings.warn(
WARNING:torch.distributed.run:*****
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
It feels like something is wrong with the distributed setup, any ideas? Thanks
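In case it matters, my reading of the warning's advice to take the local rank from os.environ['LOCAL_RANK'] instead of a --local_rank argument is roughly this (a minimal sketch, not my actual train.py; the init_process_group call is my assumption about the usual setup):

import os
import torch

# torch.distributed.run sets LOCAL_RANK in the environment (--use_env is the default),
# so the script reads it from there instead of parsing a --local_rank argument.
local_rank = int(os.environ.get('LOCAL_RANK', 0))
torch.cuda.set_device(local_rank)

# Assumed NCCL backend for multi-GPU training on a single node; the launcher
# provides MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE via the environment.
torch.distributed.init_process_group(backend='nccl')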