philschmid / llm-sagemaker-sample


Issue when continuing fine-tuning #22

Open MikeMpapa opened 4 days ago

MikeMpapa commented 4 days ago

Hi and thanks for the great resources.

I used "train-deploy-llama3.ipynb" and trained a similar Llama3 model as shown in the notebook. I pushed my model on hugging face and now I want to use that new model as starting point and fine-tune itl further with new data.

However when I try to do that the process crushes with the following error:

File "/opt/ml/code/run_fsdp_qlora.py", line 220, in <module> training_function(script_args, training_args) File "/opt/ml/code/run_fsdp_qlora.py", line 161, in training_function trainer = SFTTrainer( File "/opt/conda/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 228, in __init__ model = get_peft_model(model, peft_config) File "/opt/conda/lib/python3.10/site-packages/peft/mapping.py", line 136, in get_peft_model return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config, adapter_name=adapter_name) File "/opt/conda/lib/python3.10/site-packages/peft/peft_model.py", line 1094, in __init__ super().__init__(model, peft_config, adapter_name) File "/opt/conda/lib/python3.10/site-packages/peft/peft_model.py", line 129, in __init__ self.base_model = cls(model, {adapter_name: peft_config}, adapter_name) File "/opt/conda/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 136, in __init__ super().__init__(model, config, adapter_name) File "/opt/conda/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 148, in __init__ self.inject_adapter(self.model, adapter_name) File "/opt/conda/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 325, in inject_adapter self._create_and_replace(peft_config, adapter_name, target, target_name, parent, current_key=key) File "/opt/conda/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 220, in _create_and_replace new_module = self._create_new_module(lora_config, adapter_name, target, **kwargs) File "/opt/conda/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 301, in _create_new_module raise ValueError( ValueError : Target module Dropout(p=0.05, inplace=False) is not supported. Currently, only the following modules are supported:torch.nn.Linear,torch.nn.Embedding,torch.nn.Conv2d,transformers.pytorch_utils.Conv1D. Traceback (most recent call last):

I am using the exact same code that I used for training the original Llama3; the only change is that in the yaml file, instead of referencing model_id="meta-llama/Meta-Llama-3-70b", I now reference model_id="MyFinetuned_Llama3-70b".

Any thoughts would be very much appreciated!!

philschmid commented 4 days ago

Can you share the repository on Hugging Face? It looks like you only saved the adapters and did not merge them back into the base model, which needs to be done before you can use it as a base model for further fine-tuning.
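
For reference, a minimal sketch of how the adapters could be merged back into the base weights with peft's `AutoPeftModelForCausalLM` before reusing the model as a new base. The repository names are placeholders taken from the issue, and the dtype, output path, and push step are assumptions rather than code from the notebook:

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the adapter repo; peft pulls the base model referenced in adapter_config.json
# and attaches the fine-tuned LoRA weights on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(
    "MyFinetuned_Llama3-70b",          # placeholder: the adapter-only repo from the issue
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)

# Fold the LoRA weights into the base model so the result is a plain causal LM.
merged_model = model.merge_and_unload()

# Save (and optionally push) the merged weights plus the tokenizer.
merged_model.save_pretrained("merged-llama3", safe_serialization=True)
tokenizer = AutoTokenizer.from_pretrained("MyFinetuned_Llama3-70b")
tokenizer.save_pretrained("merged-llama3")
# merged_model.push_to_hub("MyFinetuned_Llama3-70b-merged")  # hypothetical target repo
```

After pushing the merged weights, the yaml config can point model_id at the merged repository, so the next fine-tuning run starts from a full model rather than an adapter-only repo.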