Open AvisP opened 1 month ago

Based on the example scripts, I wrote a small script to test whether training works properly on a new dataset, alpaca-cleaned. It works properly with facebook/opt-350m but not when I am using the google/gemma-2b-it model. Any help in figuring out what might be the issue would be appreciated. The error I am getting with the gemma-2b-it model is shown in the traceback below.
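For context, a minimal sketch of the kind of script described, assuming the TRL `SFTTrainer` API of that time; the dataset id `yahma/alpaca-cleaned`, the `dataset_text_field`, and the hyperparameters are illustrative, not taken from the original script:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model_id = "google/gemma-2b-it"  # swapping in "facebook/opt-350m" trains fine

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The call the thread later identifies as the culprit: it registers a new
# token without growing the model's embedding matrix to match
tokenizer.add_special_tokens({"pad_token": "[PAD]"})

dataset = load_dataset("yahma/alpaca-cleaned", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="output",  # illustrative; the real script formats prompts
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=1),
)
trainer.train()
```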
Hi @AvisP Hmm, this seems to indicate that there is an indexing error. Can you try to run a single training step on CPU and paste the error traceback here?
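A minimal sketch of such a debug run; `output_dir` is a placeholder and everything else follows the original script:

```python
from transformers import TrainingArguments

# On CPU an out-of-range index fails with a readable Python traceback,
# whereas on GPU it typically surfaces as an opaque device-side assert
debug_args = TrainingArguments(
    output_dir="debug-run",  # placeholder
    use_cpu=True,
    max_steps=1,  # a single training step is enough to reproduce
)
```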
@younesbelkada So I set `use_cpu=True` and `bf16=True` in `TrainingArguments`, and this is the error message I got:
Gemma's activation function should be approximate GeLU and not exact GeLU.
Changing the activation function to `gelu_pytorch_tanh`.if you want to use the legacy `gelu`, edit the `model.config` to set `hidden_activation=gelu` instead of `hidden_act`. See https://github.com/huggingface/transformers/pull/29402 for more details.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00, 1.32s/it]
0%| | 0/6470 [00:00<?, ?it/s]
Traceback (most recent call last):
File "C:\Projects\FineTune-Gemma2B\Gemma.py", line 86, in <module>
trainer.train()
File "C:\Projects\venvs\Lib\site-packages\trl\trainer\sft_trainer.py", line 361, in train
output = super().train(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Projects\venvs\Lib\site-packages\transformers\trainer.py", line 1859, in train
return inner_training_loop(
^^^^^^^^^^^^^^^^^^^^
File "C:\Projects\venvs\Lib\site-packages\transformers\trainer.py", line 2203, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Projects\venvs\Lib\site-packages\transformers\trainer.py", line 3138, in training_step
loss = self.compute_loss(model, inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Projects\venvs\Lib\site-packages\transformers\trainer.py", line 3161, in compute_loss
outputs = model(**inputs)
^^^^^^^^^^^^^^^
File "C:\Projects\venvs\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Projects\venvs\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Projects\venvs\Lib\site-packages\accelerate\utils\operations.py", line 817, in forward
return model_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Projects\venvs\Lib\site-packages\accelerate\utils\operations.py", line 805, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Projects\venvs\Lib\site-packages\torch\amp\autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Projects\venvs\Lib\site-packages\transformers\models\gemma\modeling_gemma.py", line 1121, in forward
outputs = self.model(
^^^^^^^^^^^
File "C:\Projects\venvs\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Projects\venvs\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Projects\venvs\Lib\site-packages\transformers\models\gemma\modeling_gemma.py", line 878, in forward
inputs_embeds = self.embed_tokens(input_ids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Projects\venvs\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Projects\venvs\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Projects\venvs\Lib\site-packages\torch\nn\modules\sparse.py", line 163, in forward
return F.embedding(
^^^^^^^^^^^^
File "C:\Projects\venvs\Lib\site-packages\torch\nn\functional.py", line 2237, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index out of range in self
0%| | 0/6470 [00:00<?, ?it/s]
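The failing frame is the embedding lookup, which raises this `IndexError` whenever an input id is at or beyond the number of rows in the embedding table. A minimal reproduction, assuming Gemma-2B's 256,000-token vocabulary and 2,048-dimensional embeddings:

```python
import torch
import torch.nn as nn

# Stand-in for Gemma-2B's embed_tokens: 256000 rows, valid ids 0..255999
embed_tokens = nn.Embedding(num_embeddings=256000, embedding_dim=2048)

# A token added on top of a full vocabulary gets id 256000, one past the
# last valid row, so the lookup fails exactly as in the traceback above
input_ids = torch.tensor([[256000]])
embed_tokens(input_ids)  # IndexError: index out of range in self
```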
Thanks @AvisP
I think what is happening is that the new token `[PAD]` you are adding is out of the vocabulary (you might have received a warning saying that you added special tokens to the tokenizer). You'll either need to extend the vocab size of the model or use the native pad token of that model. For Gemma-2B, the pad token already exists: https://huggingface.co/google/gemma-2b/blob/main/tokenizer_config.json#L6 so you don't need to call `tokenizer.add_special_tokens({'pad_token': '[PAD]'})`.
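A minimal sketch of the two options, assuming the setup from the thread:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")

# Option 1 (simplest for Gemma): use the native pad token, which the
# tokenizer already defines, so no add_special_tokens call is needed
print(tokenizer.pad_token)  # "<pad>"

# Option 2: if you do add a new token, resize the embedding matrix so the
# new id has a row to look up
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
model.resize_token_embeddings(len(tokenizer))
```

With option 2, `resize_token_embeddings` grows the input embeddings (and a tied output head) so newly added ids map to freshly initialized rows instead of falling off the end of the table.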
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread.