Open 1064783536 opened 1 year ago
The correct command is:
./tools/dist_train.sh configs/pointpillars/pointpillars_dv_secfpn_8xb6-200e_kitti-3d-3class.py 2
Pass the GPU count as a bare number (2), not as GPUS=2.
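To illustrate the point about the argument form, here is a minimal Python sketch that assembles the command line with the GPU count as a plain positional number. The helper name `build_dist_train_cmd` is hypothetical and is not part of mmdetection3d; it only shows the shape of the arguments.

```python
from typing import List


def build_dist_train_cmd(config: str, gpus: int) -> List[str]:
    """Hypothetical helper: build the argv for ./tools/dist_train.sh."""
    # Reject anything that is not a positive integer (e.g. the string "GPUS=2").
    if isinstance(gpus, bool) or not isinstance(gpus, int) or gpus < 1:
        raise ValueError(f"GPU count must be a positive integer, got {gpus!r}")
    # The GPU count is passed positionally as a bare number: "2", not "GPUS=2".
    return ["./tools/dist_train.sh", config, str(gpus)]


cmd = build_dist_train_cmd(
    "configs/pointpillars/pointpillars_dv_secfpn_8xb6-200e_kitti-3d-3class.py", 2
)
print(" ".join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd)` from the repository root, assuming the script is executable there.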
Thank you very much for your reply. When I run the command "./tools/dist_train.sh configs/pointpillars/pointpillars_dv_secfpn_8xb6-200e_kitti-3d-3class.py 2", I get a new error, as follows:
=================================================================================================
/home/aolei/anaconda3/envs/mmdetection3d_1/lib/python3.8/site-packages/torch/distributed/launch.py:180: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
warnings.warn(
WARNING:torch.distributed.run:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
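The FutureWarning above suggests reading the local rank from the environment rather than from a --local_rank argument. A minimal sketch of that pattern follows; the function name `get_local_rank` and the fallback of 0 for non-distributed runs are assumptions made for illustration, not something taken from the mmdetection3d code.

```python
import os


def get_local_rank(default: int = 0) -> int:
    """Read this process's local rank from the LOCAL_RANK environment variable.

    The launcher is expected to set LOCAL_RANK per process; the default
    (an assumption of this sketch) covers plain single-process runs.
    """
    return int(os.environ.get("LOCAL_RANK", default))


print(f"local rank: {get_local_rank()}")
```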
Prerequisite
Task
I'm using the official example scripts/configs for the officially supported tasks/models/datasets.
Branch
main branch https://github.com/open-mmlab/mmdetection3d
Environment
Package Version Source
mmcv 2.0.0 https://github.com/open-mmlab/mmcv
mmdet 3.0.0 https://github.com/open-mmlab/mmdetection
mmdet3d 1.1.1 /home/aolei/10_1_mmdetection3d/mmdetection3d
mmengine 0.7.3 https://github.com/open-mmlab/mmengine
Reproduces the problem - code sample
When I run the command "./tools/dist_train.sh configs/pointpillars/pointpillars_dv_secfpn_8xb6-200e_kitti-3d-3class.py GPUS=2", I get the error "ValueError: Unsupported nproc_per_node value: GPUS=2".
Reproduces the problem - command or script
./tools/dist_train.sh configs/pointpillars/pointpillars_dv_secfpn_8xb6-200e_kitti-3d-3class.py GPUS=2
Reproduces the problem - error message
/home/aolei/anaconda3/envs/mmdetection3d_1/lib/python3.8/site-packages/torch/distributed/launch.py:180: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
warnings.warn(
Traceback (most recent call last):
File "/home/aolei/anaconda3/envs/mmdetection3d_1/lib/python3.8/site-packages/torch/distributed/run.py", line 607, in determine_local_world_size
return int(nproc_per_node)
ValueError: invalid literal for int() with base 10: 'GPUS=2'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/aolei/anaconda3/envs/mmdetection3d_1/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/aolei/anaconda3/envs/mmdetection3d_1/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/aolei/anaconda3/envs/mmdetection3d_1/lib/python3.8/site-packages/torch/distributed/launch.py", line 195, in <module>
main()
File "/home/aolei/anaconda3/envs/mmdetection3d_1/lib/python3.8/site-packages/torch/distributed/launch.py", line 191, in main
launch(args)
File "/home/aolei/anaconda3/envs/mmdetection3d_1/lib/python3.8/site-packages/torch/distributed/launch.py", line 176, in launch
run(args)
File "/home/aolei/anaconda3/envs/mmdetection3d_1/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
config, cmd, cmd_args = config_from_args(args)
File "/home/aolei/anaconda3/envs/mmdetection3d_1/lib/python3.8/site-packages/torch/distributed/run.py", line 660, in config_from_args
nproc_per_node = determine_local_world_size(args.nproc_per_node)
File "/home/aolei/anaconda3/envs/mmdetection3d_1/lib/python3.8/site-packages/torch/distributed/run.py", line 625, in determine_local_world_size
raise ValueError(f"Unsupported nproc_per_node value: {nproc_per_node}")
ValueError: Unsupported nproc_per_node value: GPUS=2
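The traceback shows the value given for the process count being passed to int() and, when that fails, re-raised as "Unsupported nproc_per_node value". The sketch below reproduces just that behaviour in isolation. It is a simplified stand-in with a hypothetical name (`parse_nproc_per_node`), not the actual PyTorch function, which may accept other values that this sketch leaves out.

```python
def parse_nproc_per_node(value: str) -> int:
    """Simplified stand-in for the integer parsing seen in the traceback."""
    try:
        # A bare number such as "2" parses cleanly.
        return int(value)
    except ValueError as err:
        # A string such as "GPUS=2" is not an integer literal, so int() fails
        # and the error is re-raised with the message seen in the issue.
        raise ValueError(f"Unsupported nproc_per_node value: {value}") from err


print(parse_nproc_per_node("2"))

try:
    parse_nproc_per_node("GPUS=2")
except ValueError as err:
    print(f"ValueError: {err}")
```

This is why the literal text GPUS=2 on the command line fails while a plain 2 is accepted.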
Additional information
No response