unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

[Urgent] Llama3 NOT Working in PPO Trainer #884

Open yuan-xia opened 4 months ago

yuan-xia commented 4 months ago

Hi, I'm using a fine-tuned Llama3 model trained with Unsloth. I noticed the model needs for_inference() so that model.generate() runs without errors. However, in the PPO trainer the model is wrapped via AutoModelForCausalLMWithValueHead.from_pretrained(unsloth_model), and the PPO trainer calls model.generate() directly, so the error appears again. Is there any way to get rid of the issue or to avoid needing for_inference()? Much appreciated.

model = AutoModelForCausalLMWithValueHead.from_pretrained(unsloth_model)

response_tensors = ppo_trainer.generate(query_tensors, **generation_kwargs)

Traceback (most recent call last):
  File "/home/jovyan/work/ppo_llama3.py", line 156, in <module>
    response_tensors = ppo_trainer.generate(query_tensors, **generation_kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/trl/trainer/ppo_trainer.py", line 469, in generate
    response = self._generate_batched(
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/trl/trainer/ppo_trainer.py", line 556, in _generate_batched
    generations = unwrapped_model.generate(**padded_inputs, **generation_kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/trl/models/modeling_value_head.py", line 204, in generate
    return self.pretrained_model.generate(*args, **kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/peft/peft_model.py", line 1638, in generate
    outputs = self.base_model.generate(*args, **kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/transformers/generation/utils.py", line 1989, in generate
    result = self._sample(
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/transformers/generation/utils.py", line 2932, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/accelerate/hooks.py", line 169, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 857, in _CausalLM_fast_forward
    outputs = fast_forward_inference(
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 794, in LlamaModel_fast_forward_inference
    seq_len = past_key_values[0][0].shape[-2]
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/transformers/cache_utils.py", line 314, in __getitem__
    raise KeyError(f"Cache only has {len(self)} layers, attempted to access layer with index {layer_idx}")
KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'
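
Roughly, the setup looks like this. This is only a sketch of the two code paths, not my exact script: the checkpoint name and settings are placeholders.

from unsloth import FastLanguageModel
from trl import AutoModelForCausalLMWithValueHead

# Placeholder checkpoint and settings, just for illustration.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Normal inference path: for_inference() must be called before generate().
# FastLanguageModel.for_inference(model)
# model.generate(...)

# PPO path: the model goes into the value-head wrapper instead, and
# PPOTrainer later calls generate() on it without for_inference() ever
# being applied, which is where the KeyError above comes from.
value_model = AutoModelForCausalLMWithValueHead.from_pretrained(model)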

danielhanchen commented 3 months ago

Hmm I think someone did mention PPO does not work - I think the generation step isn't working properly

yuan-xia commented 3 months ago

Hmm I think someone did mention PPO does not work - I think the generation step isn't working properly

Hey, yes, the generation gets stuck forever if I just load the model directly using the official code. What I did to work around this is to load the merged GGUF fine-tuned Llama3 8B bnb-4bit model into AutoModelForCausalLMWithValueHead with load_in_4bit = True, which lets generation proceed. However, the generation output loses a lot of precision. Do you have any insight into why the loading still does not work? FYI, the merged model does not lose precision if I load it into AutoModelForCausalLM.
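
Roughly, that workaround looks like the sketch below; the checkpoint path is a placeholder and the exact kwargs may differ.

from trl import AutoModelForCausalLMWithValueHead

# Load the merged fine-tuned checkpoint directly into the value-head wrapper
# instead of passing the Unsloth model object (path is hypothetical).
value_model = AutoModelForCausalLMWithValueHead.from_pretrained(
    "path/to/merged-llama3-8b-bnb-4bit",
    load_in_4bit = True,
    device_map = "auto",
)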

YZHang2333 commented 3 months ago

import torch
import unsloth
from trl import PPOConfig, PPOTrainer

# model, ref_model and tokenizer come from the earlier setup
# (Unsloth model wrapped in AutoModelForCausalLMWithValueHead).
ppo_config = {"mini_batch_size": 1, "batch_size": 1}
config = PPOConfig(**ppo_config)
ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

query_txt = "This morning I went to the "
query_tensor = tokenizer.encode(query_txt, return_tensors="pt").to(model.pretrained_model.device)
query_tensor

generation_kwargs = {
    "min_length": -1,
    "top_k": 0.0,
    "top_p": 1.0,
    "do_sample": True,
    "pad_token_id": tokenizer.eos_token_id,
    "max_new_tokens": 20,
}

device_type = 'cuda'
dtype = model.config.torch_dtype

if type(dtype) is str:
    if   dtype ==  "float16": dtype = torch.float16
    elif dtype == "bfloat16": dtype = torch.bfloat16
pass

# Patch generate() the way FastLanguageModel.for_inference() does internally,
# without walking .model attributes (which fails on the value-head wrapper).
if model.generate.__name__ != "_fast_generate":
    model._unwrapped_old_generate = model.generate
    model.generate = unsloth.models.llama._wrap_fast_inference(model.generate, device_type, dtype, model)
pass

response_tensor = ppo_trainer.generate(list(query_tensor), return_prompt=False, **generation_kwargs)
response_txt = tokenizer.decode(response_tensor[0])
response_txt

I met the same problem. I replicated the example from TRL and solved it by extracting part of the code from FastLanguageModel.for_inference(model). The reason I did not call FastLanguageModel.for_inference explicitly is that it fails after calling

model = AutoModelForCausalLMWithValueHead.from_pretrained(model)

and this is the error; it fails when accessing internal_model.model:

AttributeError                            Traceback (most recent call last)
Cell In[20], line 23
     20     model.generate = unsloth.models.llama._wrap_fast_inference(model.generate, device_type, dtype, model)
     21 pass
---> 23 FastLanguageModel.for_inference(model)
     25 response_tensor = ppo_trainer.generate(list(query_tensor), return_prompt=False, **generation_kwargs)
     26 response_txt = tokenizer.decode(response_tensor[0])

File ~/miniconda3/lib/python3.10/site-packages/unsloth/models/llama.py:2214, in FastLlamaModel.for_inference(model)
   2212 internal_model = model
   2213 while not hasattr(internal_model, "lm_head"):
-> 2214     internal_model = internal_model.model
   2215 pass
   2216 lm_head = internal_model.lm_head.weight

File ~/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py:1709, in Module.__getattr__(self, name)
   1707     if name in modules:
   1708         return modules[name]
-> 1709 raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")

AttributeError: 'AutoModelForCausalLMWithValueHead' object has no attribute 'model'
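
So for_inference() keeps descending through .model looking for lm_head, but the value-head wrapper stores the underlying model under .pretrained_model, so the very first step fails. An untested alternative (just a sketch, not something I have verified end to end) would be to point for_inference() at the inner PEFT model instead:

from unsloth import FastLanguageModel

# Untested sketch: patch the wrapped PEFT model, which does reach lm_head
# through its attribute chain, rather than the value-head wrapper itself.
FastLanguageModel.for_inference(model.pretrained_model)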

A new problem appears when calling ppo_trainer.step; I get exactly the same error as the one I described in another issue:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 28, 4096]], which is output 0 of MulBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

and I have not yet found a way to solve this in ppo_trainer.
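
As a debugging step (not a fix), the hint in the error message can be followed; query_tensors, response_tensors and rewards below stand in for the usual lists of tensors from the TRL PPO example.

import torch

# Make the backward pass report the forward operation whose output
# was later modified in place, per the hint in the RuntimeError above.
torch.autograd.set_detect_anomaly(True)

stats = ppo_trainer.step(query_tensors, response_tensors, rewards)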

yuan-xia commented 3 months ago

Hi, have you solved this issue now? I think you can load the merged model into the value head to avoid the loading error, but it's still not training well on my side. It's really annoying for researchers who want to use RLHF... I hope it can be fixed soon.

danielhanchen commented 3 months ago

No, sorry, sadly I haven't had time to look at PPO / RLOO or the other trainers that do generation :( I'll try when I have time, but this will have to wait for now, sorry.

Ugadot commented 1 week ago

Hey @danielhanchen, any chance this error was fixed? I am encountering the same error.