Open yuan-xia opened 4 months ago
Hmm, I think someone did mention PPO does not work - the generation step isn't working properly.
Hey, yes, the generation is stuck forever if I just load the model directly using the official code. What I did to work around this is to load the merged GGUF fine-tuned Llama 3 8B bnb 4-bit model into AutoModelForCausalLMWithValueHead with load_in_4bit = True, which lets generation proceed. However, the generation output loses precision badly. Do you have any insight into why the direct loading still does not work? FYI, the merged model does not lose precision if I load it into AutoModelForCausalLM.
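For context, a minimal sketch of that loading path, assuming a local directory holding the merged fine-tuned checkpoint (the path below is a placeholder, not the exact one from my run):

from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead

merged_path = "path/to/merged-llama3-8b"  # placeholder for the merged fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(merged_path)

# Loading the merged model straight into the value-head wrapper in 4-bit lets
# ppo_trainer.generate proceed, but the outputs lose a lot of precision.
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    merged_path,
    load_in_4bit = True,
)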
import torch
import unsloth
from trl import PPOConfig, PPOTrainer

# Minimal PPO setup; model / ref_model / tokenizer come from the earlier
# AutoModelForCausalLMWithValueHead loading step.
ppo_config = {"mini_batch_size": 1, "batch_size": 1}
config = PPOConfig(**ppo_config)
ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

query_txt = "This morning I went to the "
query_tensor = tokenizer.encode(query_txt, return_tensors="pt").to(model.pretrained_model.device)

generation_kwargs = {
    "min_length": -1,
    "top_k": 0.0,
    "top_p": 1.0,
    "do_sample": True,
    "pad_token_id": tokenizer.eos_token_id,
    "max_new_tokens": 20,
}

# The part extracted from FastLanguageModel.for_inference: resolve the dtype,
# then wrap model.generate with unsloth's fast-inference path.
device_type = "cuda"
dtype = model.config.torch_dtype
if type(dtype) is str:
    if dtype == "float16":
        dtype = torch.float16
    elif dtype == "bfloat16":
        dtype = torch.bfloat16
pass

if model.generate.__name__ != "_fast_generate":
    model._unwrapped_old_generate = model.generate
    model.generate = unsloth.models.llama._wrap_fast_inference(model.generate, device_type, dtype, model)
pass

response_tensor = ppo_trainer.generate(list(query_tensor), return_prompt=False, **generation_kwargs)
response_txt = tokenizer.decode(response_tensor[0])
response_txt
I met the same problem. I replicated the example from trl and solved it by extracting part of the code from FastLanguageModel.for_inference(model), as shown above. The reason I do not call FastLanguageModel.for_inference explicitly is that it fails after calling
model = AutoModelForCausalLMWithValueHead.from_pretrained(model)
This is the error; it fails when accessing internal_model.model:
AttributeError Traceback (most recent call last)
Cell In[20], line 23
20 model.generate = unsloth.models.llama._wrap_fast_inference(model.generate, device_type, dtype, model)
21 pass
---> 23 FastLanguageModel.for_inference(model)
25 response_tensor = ppo_trainer.generate(list(query_tensor), return_prompt=False, **generation_kwargs)
26 response_txt = tokenizer.decode(response_tensor[0])
File ~/miniconda3/lib/python3.10/site-packages/unsloth/models/llama.py:2214, in FastLlamaModel.for_inference(model)
2212 internal_model = model
2213 while not hasattr(internal_model, "lm_head"):
-> 2214 internal_model = internal_model.model
2215 pass
2216 lm_head = internal_model.lm_head.weight
File ~/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py:1709, in Module.__getattr__(self, name)
1707 if name in modules:
1708 return modules[name]
-> 1709 raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'AutoModelForCausalLMWithValueHead' object has no attribute 'model'
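If I read the traceback correctly, for_inference walks internal_model.model until it finds lm_head, but the trl wrapper keeps the underlying model under pretrained_model rather than model, which is why the loop raises. A possible workaround (an untested sketch, not an official unsloth API path) is to apply for_inference to the inner model instead of the wrapper:

# Untested sketch: patch the wrapped unsloth model, which does expose the
# .model / .lm_head chain that for_inference expects.
from unsloth import FastLanguageModel

FastLanguageModel.for_inference(model.pretrained_model)
response_tensor = ppo_trainer.generate(list(query_tensor), return_prompt=False, **generation_kwargs)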
A new problem happens when calling ppo_trainer.step. I get exactly the same error as the one I described in another issue:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 28, 4096]], which is output 0 of MulBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
and I currently have not found a way to solve this in ppo_trainer.
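Following the hint in the error message, a small debugging sketch (only a way to localize the in-place operation, not a confirmed fix) would be to enable anomaly detection and restore the original generate before the training step:

import torch

# Surface the exact operation that breaks the backward pass.
torch.autograd.set_detect_anomaly(True)

# Undo the fast-inference patch before training, since it was only meant
# for the generation step (uses the attribute saved earlier in this thread).
if hasattr(model, "_unwrapped_old_generate"):
    model.generate = model._unwrapped_old_generate

# Placeholder reward; in a real run this would come from the reward model.
rewards = [torch.tensor(1.0, device=model.pretrained_model.device)]
stats = ppo_trainer.step([query_tensor[0]], [response_tensor[0]], rewards)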
Hi, have you solved this issue now? I think you can load the merged model into the value-head wrapper to avoid the loading error, but on my side it still does not train well. It's really frustrating for researchers who want to use RLHF... I hope it can be fixed soon.
No, sorry, sadly I did not have time to look at PPO / RLOO or the other trainers with a generation step :( I'll try if I have time, but sadly this will have to wait.
Hey @danielhanchen. Any chance this error was fixed? I am encountering the same error.
Hi, I'm using a fine-tuned Llama 3 model trained with unsloth. I noticed the model needs for_inference() to make sure model.generate() runs without error. However, in the PPO trainer the model is passed as AutoModelForCausalLMWithValueHead.from_pretrained(unsloth_model), and the PPO trainer calls model.generate() directly, so the error appears again. Is there any way to get rid of the issue or to avoid using for_inference()? Much appreciated.
model = AutoModelForCausalLMWithValueHead.from_pretrained(unsloth_model)
response_tensors = ppo_trainer.generate(query_tensors, **generation_kwargs)
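Putting the pieces from this thread together, one hedged sketch of the toggling idea: wrap the unsloth-trained model, patch only the inner pretrained_model for fast generation, and switch it back before the PPO update. Here FastLanguageModel.for_training is assumed to undo the inference patches, and rewards is a placeholder; I have not verified this end to end.

from unsloth import FastLanguageModel
from trl import AutoModelForCausalLMWithValueHead

# Wrap the unsloth-trained model for PPO; the base model lives under .pretrained_model.
model = AutoModelForCausalLMWithValueHead.from_pretrained(unsloth_model)

# Patch the inner model (not the wrapper) for fast generation ...
FastLanguageModel.for_inference(model.pretrained_model)
response_tensors = ppo_trainer.generate(query_tensors, **generation_kwargs)

# ... then restore training behaviour before the PPO update.
FastLanguageModel.for_training(model.pretrained_model)
stats = ppo_trainer.step(query_tensors, response_tensors, rewards)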