voidful / TextRL

Implementation of ChatGPT RLHF (Reinforcement Learning from Human Feedback) on any generation model in Hugging Face's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
MIT License
539 stars 60 forks

Reward policy agent environment is not training with Finetuned model #23

Closed harshs21 closed 1 year ago

harshs21 commented 1 year ago

I am loading my Google Flan-T5 model, fine-tuned on question answering, from my Hugging Face account:

```python
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

peft_model_id = "harshs21/google-flan-t5-base"
config = PeftConfig.from_pretrained(peft_model_id)
model1 = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)  # load_in_8bit=True,
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the LoRA model
model1 = PeftModel.from_pretrained(model1, peft_model_id)
```

But when I construct the agent with the following code:

```python
env = MyRLEnv(model1, tokenizer, observation_input=observaton_list)  # , output_list=output_list
actor = TextRLActor(env, model1, tokenizer)
agent = actor.agent_ppo(update_interval=5, minibatch_size=256, epochs=20)
```

it gives me the following error:

```text
Traceback (most recent call last):
  in <cell line: 2>:2

  /usr/local/lib/python3.10/dist-packages/textrl/actor.py:62 in __init__

     59 │         elif 'encoder' in parents:  # t5
     60 │             transformers_model = model.encoder
     61 │         else:
  ❱  62 │             raise ValueError('model not supported')
     63 │
     64 │         if unfreeze_layer_from_past > 0:
     65 │             self.middle_model = HFModelListModule(list(transformers_model.children()))

ValueError: model not supported
```

The same thing happens when I load my fine-tuned eleutherai/pythia-1.3B model from my Hugging Face profile. Could someone tell me how to train a fine-tuned model with the RLHF policy?

voidful commented 1 year ago

It appears that the problem is that TextRL does not recognize your fine-tuned models as supported architectures.

This exception is raised when the code checks the model's architecture and doesn't find it in the pre-defined set of supported models.

https://github.com/voidful/TextRL/blob/3412399f8464a160ca5611fa526ebc207d6eedfe/textrl/actor.py#L59
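As a quick diagnostic (a sketch, assuming the `parents` list in actor.py is built from the model's top-level child-module names), you can print those names yourself. A `PeftModel` typically wraps everything under a single `base_model` child, so neither `transformer` nor `encoder` shows up:

```python
# Diagnostic sketch: list the top-level child-module names that the
# architecture check inspects. For a PEFT-wrapped model the only child is
# typically 'base_model', so the check in TextRLActor falls through to
# ValueError('model not supported').
parents = [name for name, _ in model1.named_children()]
print(parents)  # e.g. ['base_model'] for a PeftModel
```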

To make it work, you would have to extend the TextRLActor class to support the specific architecture of your models; one possible approach is sketched below.
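Here is a minimal sketch (not part of TextRL; it assumes `PeftModel.get_base_model()` returns the underlying transformers model with the LoRA layers still injected) that unwraps the PEFT wrapper before the architecture check runs:

```python
from peft import PeftModel
from textrl import TextRLActor

class PeftTextRLActor(TextRLActor):
    """Hypothetical subclass that unwraps a PeftModel so that
    TextRLActor's architecture check sees the underlying T5/GPT model."""

    def __init__(self, env, model, tokenizer, *args, **kwargs):
        if isinstance(model, PeftModel):
            # get_base_model() returns the wrapped transformers model,
            # which still contains the injected LoRA layers.
            model = model.get_base_model()
        super().__init__(env, model, tokenizer, *args, **kwargs)
```

With the names from the issue above, `actor = PeftTextRLActor(env, model1, tokenizer)` should then pass the check, since the unwrapped Flan-T5 model exposes an `encoder` child.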