I hit the error below while fine-tuning the model. My environment was set up following the official environment configuration.
Loading checkpoint shards: 100%|██████████| 33/33 [02:49<00:00, 5.37s/it]
Loading checkpoint shards: 100%|██████████| 33/33 [02:49<00:00, 5.14s/it]
Some weights of MPLUGOwl2LlamaForCausalLM were not initialized from the model checkpoint at /data/oss_bucket_0/mplug_owl2 and are newly initialized: ['model.visual_abstractor.encoder.layers.0.crossattention.attention.q_pos_embed', 'model.visual_abstractor.encoder.layers.2.crossattention.attention.k_pos_embed', 'model.visual_abstractor.encoder.layers.0.crossattention.attention.k_pos_embed', 'model.visual_abstractor.encoder.layers.3.crossattention.attention.q_pos_embed', 'model.visual_abstractor.encoder.layers.1.crossattention.attention.q_pos_embed', 'model.visual_abstractor.encoder.layers.4.crossattention.attention.q_pos_embed', 'model.visual_abstractor.encoder.layers.5.crossattention.attention.q_pos_embed', 'model.visual_abstractor.encoder.layers.3.crossattention.attention.k_pos_embed', 'model.visual_abstractor.encoder.layers.1.crossattention.attention.k_pos_embed', 'model.visual_abstractor.encoder.layers.2.crossattention.attention.q_pos_embed', 'model.visual_abstractor.encoder.layers.5.crossattention.attention.k_pos_embed', 'model.visual_abstractor.encoder.layers.4.crossattention.attention.k_pos_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
ic| training_args.tune_visual_abstractor: True
ic| training_args.freeze_vision_model: True
ic| len(optimizer_grouped_parameters[0]['params']): 1040
len(optimizer_grouped_parameters[1]['params']): 91
Using :/usr/local/ninja as PyTorch extensions root...
Loading extension module utils...
Using :/usr/local/ninja as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
0%| | 0/1 [00:00<?, ?it/s]/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
Traceback (most recent call last):
File "/checkpoint/binary/train_package/mplug_owl2/train/train_mem.py", line 13, in <module>
train()
File "/checkpoint/binary/train_package/mplug_owl2/train/train.py", line 801, in train
trainer.train()
File "/root/.local/lib/python3.8/site-packages/transformers/trainer.py", line 1539, in train
return inner_training_loop(
File "/root/.local/lib/python3.8/site-packages/transformers/trainer.py", line 1809, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/root/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2654, in training_step
loss = self.compute_loss(model, inputs)
File "/root/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2679, in compute_loss
outputs = model(**inputs)
File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.local/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/root/.local/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1736, in forward
loss = self.module(*inputs, **kwargs)
File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/root/.local/lib/python3.8/site-packages/peft/peft_model.py", line 922, in forward
return self.base_model(
File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/checkpoint/binary/train_package/mplug_owl2/model/modeling_mplug_owl2.py", line 242, in forward
outputs = self.model(
File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/checkpoint/binary/train_package/mplug_owl2/model/modeling_llama2.py", line 337, in model_forward
layer_outputs = torch.utils.checkpoint.checkpoint(
File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/checkpoint/binary/train_package/mplug_owl2/model/modeling_llama2.py", line 333, in custom_forward
return module(*inputs, past_key_value, output_attentions)
File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/checkpoint/binary/train_package/mplug_owl2/model/modeling_llama2.py", line 222, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/checkpoint/binary/train_package/mplug_owl2/train/llama_flash_attn_monkey_patch.py", line 55, in forward
query_states, key_states = apply_rotary_pos_emb(
File "/root/.local/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 184, in apply_rotary_pos_emb
cos = cos[position_ids].unsqueeze(1) # [bs, 1, seq_len, dim]
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
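The final message suggests that `cos` (the rotary embedding cache) is on the CPU while `position_ids` is on a different device, so the indexing in `apply_rotary_pos_emb` fails. As a minimal sketch of that mismatch and one possible workaround, the snippet below aligns the two devices before indexing. The helper name `index_rotary_cache` and the tensor shapes are assumptions made for illustration; they are not part of mPLUG-Owl2 or transformers, and this does not explain why the cache ended up on the CPU in the first place.

```python
import torch


def index_rotary_cache(cache: torch.Tensor, position_ids: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: gather rows of `cache` at `position_ids`.

    The cache is first moved to the device of the indices, so the lookup
    cannot fail because of a CPU/GPU mismatch.
    """
    cache = cache.to(position_ids.device)
    return cache[position_ids].unsqueeze(1)  # [bs, 1, seq_len, dim]


seq_len, dim = 8, 4  # illustrative sizes only

# The cache is deliberately created on the CPU, as in the error message.
cos = torch.randn(seq_len, dim)

# The indices live on the GPU when one is available, otherwise on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
position_ids = torch.arange(seq_len, device=device).unsqueeze(0)  # [1, seq_len]

out = index_rotary_cache(cos, position_ids)
print(out.shape, out.device)
```

If the same pattern applies here, it may be worth checking which device the rotary embedding buffers are on just before the call at `llama_flash_attn_monkey_patch.py`, line 55, and whether the installed transformers version matches the one the training code expects.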