Closed hnjzbss closed 1 year ago
Hi @hnjzbss,
Thank you for showing interest in our work.
I believe the error is due to the use of multiple GPUs for a single training experiment. In the MaPLe architecture, we use `nn.ParameterList()` to store the prompt vectors, but using this module across multiple GPUs conflicts with `nn.DataParallel`; this is a known issue in PyTorch.

To resolve the issue, please refer to the following choices:

1. Restrict training to a single GPU, e.g. run `CUDA_VISIBLE_DEVICES=0 bash scripts/maple/base2new_train_maple.sh imagenet 1`; this will use GPU:0 for training.
2. Alternatively, modify the `nn.ParameterList()` usage at this line.

Please let us know in case there are any further queries. Thank you!
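As an illustration of the second option, one possible workaround is to register each prompt vector as an ordinary `nn.Parameter` attribute instead of storing them in an `nn.ParameterList`, since `nn.DataParallel` replicates plain parameter attributes onto each device correctly. This is a minimal sketch, not MaPLe's actual code; the class name, attribute names, and sizes below are hypothetical:

```python
import torch
import torch.nn as nn

class PromptLearnerSketch(nn.Module):
    """Sketch: avoid nn.ParameterList so DataParallel replicas see the prompts."""

    def __init__(self, depth=3, n_ctx=2, dim=8):
        super().__init__()
        self.depth = depth
        for i in range(depth):
            # Register each compound prompt as a named attribute; Module.__setattr__
            # records it as a parameter, and DataParallel copies it to every replica.
            setattr(self, f"compound_prompt_{i}",
                    nn.Parameter(torch.zeros(n_ctx, dim)))

    def prompts(self):
        # Gather the registered prompts back into a plain Python list for indexing.
        return [getattr(self, f"compound_prompt_{i}") for i in range(self.depth)]

m = PromptLearnerSketch()
print(len(m.prompts()))           # 3
print(len(list(m.parameters())))  # 3 registered parameters
```

Unlike a `nn.ParameterList`, which the DataParallel warning says "will appear empty for the models replicated on each GPU except the original one", these attributes survive replication, so indexing them inside `forward` does not raise the `IndexError` shown below.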
@muzairkhattak Thank you very much for your detailed answer. The problem has been solved. I wish you greater achievements in your future work!
Thank you @hnjzbss.
All the best!
Kind regards.
```
/home/user/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/container.py:527: UserWarning: nn.ParameterList is being used with DataParallel but this is not supported. This list will appear empty for the models replicated on each GPU except the original one.
  warnings.warn("nn.ParameterList is being used with DataParallel but this is not "
/home/user/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/container.py:472: UserWarning: Setting attributes on ParameterList is not supported.
  warnings.warn("Setting attributes on ParameterList is not supported.")
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/user/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/user/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/home/user/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/_utils.py", line 434, in reraise
    raise exception
IndexError: Caught IndexError in replica 0 on device 0.

Original Traceback (most recent call last):
  File "/home/user/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/home/user/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/cnn/work2022/MIMIC-project/multimodal-prompt-learning-main/trainers/maple.py", line 193, in forward
    prompts, shared_ctx, deep_compound_prompts_text, deep_compound_prompts_vision = self.prompt_learner()
  File "/home/user/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/cnn/work2022/MIMIC-project/multimodal-prompt-learning-main/trainers/maple.py", line 173, in forward
    visual_deep_prompts.append(layer(self.compound_prompts_text[index]))
  File "/home/user/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/container.py", line 462, in __getitem__
    idx = self._get_abs_string_index(idx)
  File "/home/user/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/container.py", line 445, in _get_abs_string_index
    raise IndexError('index {} is out of range'.format(idx))
IndexError: index 0 is out of range
```
Process finished with exit code 1