Open xiyangyang99 opened 11 months ago
Greetings! The current application uses over 30 GB of memory at batchsize=1, so we suggest considering graphics cards with more memory.
Thank you for your reply. I am using 8 × NVIDIA RTX 3090 and the machine has 188 GB of system memory. There was no log output during training, and the graphics cards showed no activity either.
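For anyone comparing their hardware against the figure quoted above, here is a minimal sketch of a pre-flight check. It assumes the "over 30G" requirement can be read as roughly 30 GiB per GPU; that threshold, and the helper names `has_enough_memory` and `report_visible_gpus`, are made up for illustration and are not part of SAM-Adapter-PyTorch. An RTX 3090 has 24 GB, so it falls below the threshold.

```python
# Rough pre-flight check: does each visible GPU meet a per-device memory requirement?
# The 30 GiB default is an assumption taken from the "over 30G" figure in this thread.
GIB = 1024 ** 3


def has_enough_memory(total_bytes: int, required_gib: float = 30.0) -> bool:
    """Return True if a device with total_bytes of memory meets the requirement."""
    return total_bytes >= required_gib * GIB


def report_visible_gpus(required_gib: float = 30.0) -> None:
    """Print a verdict for every GPU that PyTorch can see."""
    import torch  # imported lazily so has_enough_memory works without PyTorch

    for index in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(index)
        verdict = "ok" if has_enough_memory(props.total_memory, required_gib) else "too small"
        print(f"GPU {index}: {props.name}, {props.total_memory / GIB:.1f} GiB -> {verdict}")


if __name__ == "__main__":
    try:
        report_visible_gpus()
    except ImportError:
        print("PyTorch is not installed; skipping the GPU query.")
```

Running it under the same CUDA_VISIBLE_DEVICES setting as the training job reports only the cards that job would actually use.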
Training does not work on 8 × RTX 3090! This is my training script:
CUDA_VISIBLE_DEVICES=4,5,6,7 python3 -m torch.distributed.launch --master_port=12000 --nnodes 1 --nproc_per_node 4 train.py --config /home/quchunguang/003-large-model/SAM-Adapter-PyTorch/configs/cod-sam-vit-h.yaml --tag exp1
These are the training logs:
/home/quchunguang/anaconda3/envs/SAM-Adapter/lib/python3.8/site-packages/torch/distributed/launch.py:180: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects --local_rank argument to be set, please change it to read from os.environ['LOCAL_RANK'] instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions
warnings.warn(
WARNING:torch.distributed.run:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
/home/quchunguang/anaconda3/envs/SAM-Adapter/lib/python3.8/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
warnings.warn(
(The same MMCV warning is printed four times in total.)
After that it always stays like this: no further training output appears.
How can I deal with this problem?
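The FutureWarning in the log above says that scripts expecting --local_rank should read os.environ['LOCAL_RANK'] instead. A minimal sketch of that pattern follows, for reference. It is not taken from this repository's train.py, and whether train.py reads --local_rank at all is not confirmed here; the function name `resolve_local_rank` is hypothetical.

```python
import argparse
import os


def resolve_local_rank(argv=None, environ=None) -> int:
    """Prefer LOCAL_RANK from the environment; fall back to the CLI argument."""
    environ = os.environ if environ is None else environ

    parser = argparse.ArgumentParser()
    # Older launchers pass --local_rank; PyTorch 2.x spells it --local-rank.
    parser.add_argument("--local_rank", "--local-rank", type=int, default=0)
    # parse_known_args ignores the script's other options (e.g. --config, --tag).
    args, _unknown = parser.parse_known_args(argv)

    if "LOCAL_RANK" in environ:
        return int(environ["LOCAL_RANK"])
    return args.local_rank


if __name__ == "__main__":
    print(f"local rank: {resolve_local_rank()}")
```

Reading the rank this way keeps a script working under both torch.distributed.launch and torchrun. It only addresses the deprecation warning and may be unrelated to the missing training output.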