Closed hnjzbss closed 1 year ago
Hi @hnjzbss,
Thank you for showing interest in our work.
I believe the error is due to the use of multiple GPUs for a single training experiment. In the MaPLe architecture, we use `nn.ParameterList()` to store the prompt vectors, but using this module across multiple GPUs conflicts with `nn.DataParallel`; this is a known issue in PyTorch.

To resolve the issue, please refer to the following choices:

1. Restrict training to a single GPU, e.g. run `CUDA_VISIBLE_DEVICES=0 bash scripts/maple/base2new_train_maple.sh imagenet 1`; this will use GPU:0 for training.
2. Alternatively, modify the `nn.ParameterList()` usage at this line.

Please let us know in case there are any further queries. Thank you!
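As an illustration of the second option, one possible workaround is to register each prompt vector as an ordinary `nn.Parameter` attribute instead of storing them in an `nn.ParameterList`, since `nn.DataParallel` replicates plain parameter attributes onto each device correctly. This is a minimal sketch, not MaPLe's actual code; the class name, attribute names, and sizes below are hypothetical:

```python
import torch
import torch.nn as nn

class PromptLearnerSketch(nn.Module):
    """Sketch: avoid nn.ParameterList so DataParallel replicas see the prompts."""

    def __init__(self, depth=3, n_ctx=2, dim=8):
        super().__init__()
        self.depth = depth
        for i in range(depth):
            # Register each compound prompt as a named attribute; Module.__setattr__
            # records it as a parameter, and DataParallel copies it to every replica.
            setattr(self, f"compound_prompt_{i}",
                    nn.Parameter(torch.zeros(n_ctx, dim)))

    def prompts(self):
        # Gather the registered prompts back into a plain Python list for indexing.
        return [getattr(self, f"compound_prompt_{i}") for i in range(self.depth)]

m = PromptLearnerSketch()
print(len(m.prompts()))           # 3
print(len(list(m.parameters())))  # 3 registered parameters
```

Unlike a `nn.ParameterList`, which the DataParallel warning says "will appear empty for the models replicated on each GPU except the original one", these attributes survive replication, so indexing them inside `forward` does not raise the `IndexError` shown below.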
@muzairkhattak Thank you very much for your detailed answer. The problem has been solved. I wish you greater achievements in your future work!
Thank you @hnjzbss.
All the best!
Kind regards.
```
/home/user/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/container.py:527: UserWarning: nn.ParameterList is being used with DataParallel but this is not supported. This list will appear empty for the models replicated on each GPU except the original one.
  warnings.warn("nn.ParameterList is being used with DataParallel but this is not "
/home/user/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/container.py:472: UserWarning: Setting attributes on ParameterList is not supported.
  warnings.warn("Setting attributes on ParameterList is not supported.")
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/user/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/user/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/home/user/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/_utils.py", line 434, in reraise
    raise exception
IndexError: Caught IndexError in replica 0 on device 0.

Original Traceback (most recent call last):
  File "/home/user/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/home/user/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/cnn/work2022/MIMIC-project/multimodal-prompt-learning-main/trainers/maple.py", line 193, in forward
    prompts, shared_ctx, deep_compound_prompts_text, deep_compound_prompts_vision = self.prompt_learner()
  File "/home/user/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/cnn/work2022/MIMIC-project/multimodal-prompt-learning-main/trainers/maple.py", line 173, in forward
    visual_deep_prompts.append(layer(self.compound_prompts_text[index]))
  File "/home/user/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/container.py", line 462, in __getitem__
    idx = self._get_abs_string_index(idx)
  File "/home/user/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/container.py", line 445, in _get_abs_string_index
    raise IndexError('index {} is out of range'.format(idx))
IndexError: index 0 is out of range
```
Process finished with exit code 1