salesforce / CodeTF

CodeTF: One-stop Transformer Library for State-of-the-art Code LLM
Apache License 2.0

UnboundLocalError: local variable 'peft_config' referenced before assignment #30

Closed Paul-B98 closed 1 year ago

Paul-B98 commented 1 year ago

I tried to run the demo example for fine-tuning the CodeT5+ model from the README, with the peft argument changed to "prefixtuning":

from codetf.trainer.codet5_trainer import CodeT5Seq2SeqTrainer
from codetf.data_utility.codexglue_dataset import CodeXGLUEDataset
from codetf.models import load_model_pipeline
from codetf.performance.evaluation_metric import EvaluationMetric
from codetf.data_utility.base_dataset import CustomDataset

model_class = load_model_pipeline(model_name="codet5", task="pretrained",
            model_type="plus-220M", is_eval=True)

dataset = CodeXGLUEDataset(tokenizer=model_class.get_tokenizer())
train, test, validation = dataset.load(subset="text-to-code")

train_dataset = CustomDataset(train[0], train[1])
test_dataset = CustomDataset(test[0], test[1])
val_dataset = CustomDataset(validation[0], validation[1])

evaluator = EvaluationMetric(metric="bleu", tokenizer=model_class.tokenizer)

# peft can be in ["lora", "prefixtuning"]
trainer = CodeT5Seq2SeqTrainer(train_dataset=train_dataset, 
                                validation_dataset=val_dataset, 
                                peft="prefixtuning",
                                pretrained_model_or_path=model_class.get_model(),
                                tokenizer=model_class.tokenizer)
trainer.train()

However, I got the following error:

---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
Cell In[1], line 20
     17 evaluator = EvaluationMetric(metric="bleu", tokenizer=model_class.tokenizer)
     19 # peft can be in ["lora", "prefixtuning"]
---> 20 trainer = CodeT5Seq2SeqTrainer(train_dataset=train_dataset, 
     21                                 validation_dataset=val_dataset, 
     22                                 peft="prefixtuning",
     23                                 pretrained_model_or_path=model_class.get_model(),
     24                                 tokenizer=model_class.tokenizer)
     25 trainer.train()

File ~/.conda/envs/codetf/lib/python3.8/site-packages/codetf/trainer/codet5_trainer.py:45, in CodeT5Seq2SeqTrainer.__init__(self, train_dataset, validation_dataset, tokenizer, checkpoints_path, pretrained_model_or_path, training_args, evaluator, evaluation_fn, peft)
     43     peft_config = self.get_default_lora_config_for_codet5()
     44 self.model.enable_input_require_grads()
---> 45 self.model = get_peft_model(self.model, peft_config)
     46 self.model.print_trainable_parameters()

UnboundLocalError: local variable 'peft_config' referenced before assignment

The logging and deps are the same as in #29.

Paul-B98 commented 1 year ago

https://github.com/salesforce/CodeTF/blob/b6515706fd5934f2dc0d6045978b918a6dd3a63f/codetf/trainer/codet5_trainer.py#L40-L46

If I understand it correctly, there should also be a peft_config for prefixtuning. When I have the time, I will try to fix this in my PR #31.
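A possible shape for that fix is to cover every supported peft value and fail loudly on anything else. This is only a control-flow sketch with placeholder dicts; in the real library the branches would construct peft.LoraConfig and peft.PrefixTuningConfig objects, and make_peft_config is a hypothetical helper name:

```python
def make_peft_config(peft):
    # Build a config for each supported PEFT method; the dicts stand in
    # for LoraConfig(...) / PrefixTuningConfig(...) from the peft library.
    if peft == "lora":
        return {"peft_type": "LORA"}
    elif peft == "prefixtuning":
        return {"peft_type": "PREFIX_TUNING"}
    # Reject unknown values instead of silently leaving the config unset,
    # which is what currently triggers the UnboundLocalError.
    raise ValueError(f"unsupported peft value: {peft!r}")
```

With an explicit else-branch like this, a typo in the peft argument surfaces as a clear ValueError at construction time rather than an UnboundLocalError deep inside __init__.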