haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0

[Usage] Some weights of LlavaLlamaForCausalLM were not initialized from the model checkpoint when I try to pretrain the model #650

Open xiechengmude opened 11 months ago

xiechengmude commented 11 months ago

Describe the issue

Issue:

Command:

bash pretrain.sh on my fine-tuned Llama2 model.

Log:

You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Loading checkpoint shards: 100%|██████████| 3/3 [00:55<00:00, 18.40s/it]
Some weights of LlavaLlamaForCausalLM were not initialized from the model checkpoint at xDAN-AI/xDAN-L1-llama2-Think-0930-e35 and are newly initialized: ['model.layers.20.self_attn.rotary_emb.inv_freq', 'model.layers.24.self_attn.rotary_emb.inv_freq', 'model.layers.35.self_attn.rotary_emb.inv_freq', 'model.layers.25.self_attn.rotary_emb.inv_freq', 'model.layers.4.self_attn.rotary_emb.inv_freq', 'model.layers.0.self_attn.rotary_emb.inv_freq', 'model.layers.31.self_attn.rotary_emb.inv_freq', 'model.layers.33.self_attn.rotary_emb.inv_freq', 'model.layers.5.self_attn.rotary_emb.inv_freq', 'model.layers.18.self_attn.rotary_emb.inv_freq', 'model.layers.9.self_attn.rotary_emb.inv_freq', 'model.layers.16.self_attn.rotary_emb.inv_freq', 'model.layers.13.self_attn.rotary_emb.inv_freq', 'model.layers.34.self_attn.rotary_emb.inv_freq', 'model.layers.22.self_attn.rotary_emb.inv_freq', 'model.layers.19.self_attn.rotary_emb.inv_freq', 'model.layers.23.self_attn.rotary_emb.inv_freq', 'model.layers.12.self_attn.rotary_emb.inv_freq', 'model.layers.28.self_attn.rotary_emb.inv_freq', 'model.layers.39.self_attn.rotary_emb.inv_freq', 'model.layers.38.self_attn.rotary_emb.inv_freq', 'model.layers.37.self_attn.rotary_emb.inv_freq', 'model.layers.2.self_attn.rotary_emb.inv_freq', 'model.layers.15.self_attn.rotary_emb.inv_freq', 'model.layers.32.self_attn.rotary_emb.inv_freq', 'model.layers.1.self_attn.rotary_emb.inv_freq', 'model.layers.30.self_attn.rotary_emb.inv_freq', 'model.layers.36.self_attn.rotary_emb.inv_freq', 'model.layers.11.self_attn.rotary_emb.inv_freq', 'model.layers.10.self_attn.rotary_emb.inv_freq', 'model.layers.27.self_attn.rotary_emb.inv_freq', 'model.layers.29.self_attn.rotary_emb.inv_freq', 'model.layers.8.self_attn.rotary_emb.inv_freq', 'model.layers.3.self_attn.rotary_emb.inv_freq', 'model.layers.17.self_attn.rotary_emb.inv_freq', 'model.layers.6.self_attn.rotary_emb.inv_freq', 'model.layers.7.self_attn.rotary_emb.inv_freq', 'model.layers.14.self_attn.rotary_emb.inv_freq', 'model.layers.26.self_attn.rotary_emb.inv_freq', 'model.layers.21.self_attn.rotary_emb.inv_freq']

0%|          | 0/2181 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/workspace/LLaVA/llava/train/train_mem.py", line 13, in <module>
    train()
  File "/workspace/LLaVA/llava/train/train.py", line 930, in train
    trainer.train()
  File "/root/miniconda3/envs/llava/lib/python3.10/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/root/miniconda3/envs/llava/lib/python3.10/site-packages/transformers/trainer.py", line 1787, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "/root/miniconda3/envs/llava/lib/python3.10/site-packages/accelerate/data_loader.py", line 381, in __iter__
    dataloader_iter = super().__iter__()
  File "/root/miniconda3/envs/llava/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 441, in __iter__
    return self._get_iterator()
  File "/root/miniconda3/envs/llava/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/root/miniconda3/envs/llava/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1084, in __init__
    self._reset(loader, first_iter=True)
  File "/root/miniconda3/envs/llava/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1117, in _reset
    self._try_put_index()
  File "/root/miniconda3/envs/llava/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1351, in _try_put_index
    index = self._next_index()
  File "/root/miniconda3/envs/llava/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 623, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/root/miniconda3/envs/llava/lib/python3.10/site-packages/accelerate/data_loader.py", line 175, in _iter_with_no_split
    for idx, batch in enumerate(self.batch_sampler):
  File "/root/miniconda3/envs/llava/lib/python3.10/site-packages/torch/utils/data/sampler.py", line 254, in __iter__
    for idx in self.sampler:
  File "/workspace/LLaVA/llava/train/llava_trainer.py", line 126, in __iter__
    indices = get_modality_length_grouped_indices(self.lengths, self.batch_size, self.world_size, generator=self.generator)
  File "/workspace/LLaVA/llava/train/llava_trainer.py", line 59, in get_modality_length_grouped_indices
    lang_indices, lang_lengths = zip(*[(i, -l) for i, l in enumerate(lengths) if l < 0])


haotian-liu commented 11 months ago

It seems the rotary embedding parameters are simply not saved in the checkpoint, which should be fine. What error caused the StopIteration exception? The bottom part of the traceback may be important.
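
For context (not from the original reply): a minimal sketch of why the missing `inv_freq` entries are harmless. In the Llama implementation in transformers, `inv_freq` is a deterministic buffer computed from the model config, not a learned weight, so re-initializing it reproduces the same values regardless of the checkpoint. The dimensions below are assumptions for illustration.

```python
import torch

# Assumed values for illustration (a Llama-2 style model).
head_dim = 128        # hidden_size // num_attention_heads
rope_theta = 10000.0  # default RoPE base

# Roughly the same formula LlamaRotaryEmbedding uses to (re)build inv_freq:
inv_freq = 1.0 / (rope_theta ** (torch.arange(0, head_dim, 2).float() / head_dim))

# The result depends only on head_dim and rope_theta, never on trained weights,
# so "newly initialized" inv_freq buffers cannot change model outputs.
print(inv_freq.shape)  # torch.Size([64])
```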

qazimbhat1 commented 8 months ago

@haotian-liu I am facing a similar issue when loading a different model. Can you please explain why it should be fine even if the rotary embedding parameters are not loaded from the checkpoint? I aim to use the new model to pretrain and fine-tune LLaVA v1.5. Would it still be fine to do that even if the model cannot load the rotary embedding parameters?

ZizhenWang commented 8 months ago

For the missing weights, I think it may be caused by the transformers package version. I updated it from 4.31.0 to 4.33.2 and that solved it.
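
As a side note (not part of the original reply): newer transformers releases register `inv_freq` as a non-persistent buffer, so it is neither expected in nor loaded from the checkpoint and the warning goes away. A small, hypothetical sanity check before launching pretrain.sh might look like this:

```python
import transformers
from packaging import version

# Hypothetical helper: warn if the installed transformers is older than the
# version that reportedly resolves the inv_freq warning in this thread.
installed = version.parse(transformers.__version__)
if installed < version.parse("4.33.2"):
    print(f"transformers {installed} detected; consider: pip install transformers==4.33.2")
else:
    print(f"transformers {installed} should not emit the inv_freq warning")
```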

qazimbhat1 commented 8 months ago

@ZizhenWang Thanks. This solves the issue.

ShawnAn-WHU commented 5 months ago

> For the missing weights, I think it may be caused by the transformers package version. I updated it from 4.31.0 to 4.33.2 and that solved it.

@ZizhenWang I am facing a problem like the one in #1417, do you know how to solve it? Thanks in advance!