UbeCc opened 4 months ago
In the `generate_reward` function, `self_reward_model` is indeed referenced before assignment. However, changing `self_reward_model = self_reward_model.to(device)` to `self_reward_model = self.self_reward_model.to(device)` causes other errors:
```python
from typing import Optional

from einops import repeat

def generate_reward(
    self,
    prompt: str,
    response: str
) -> Optional[float]:
    """
    main contribution of the paper is the logic in this function
    in paper, they sample it 3 times and then average
    """
    device = next(self.model.parameters()).device

    template_fn = self.reward_config.template_fn
    parse_reward = self.reward_config.parse_reward

    reward_prompt_str = template_fn(prompt = prompt, response = response)
    reward_prompt = self.tokenizer_encode(reward_prompt_str).to(device)

    # tile the prompt into a batch of num_evals_to_average identical rows
    reward_prompt = repeat(reward_prompt, 'n -> b n', b = self.num_evals_to_average)
    reward_prompt = reward_prompt.to(device)

    # the line in question: `self_reward_model` is read on the right-hand side
    # before it has been assigned, which raises UnboundLocalError
    self_reward_model = self_reward_model.to(device)

    reward_responses = sample(
        self_reward_model,
        prompts = reward_prompt,
        seq_len = self.generate_reward_max_seq_len,
        temperature = self.eval_temperature,
        filter_fn = self.eval_filter_fn,
        filter_kwargs = self.eval_filter_kwargs
    )
    # ... (rest of the function omitted)
```
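The error on the highlighted line follows from Python's scoping rule: because the function assigns to `self_reward_model`, that name is treated as local for the entire function body, so the read on the right-hand side happens before any local value exists. The minimal sketch below reproduces this with hypothetical `FakeModel` and `RewardTrainer` classes standing in for the library's objects (they are not its real API). It also shows that reading the instance attribute first avoids the `UnboundLocalError`. It does not explain the endless loop that the attribute-based change reportedly triggers.

```python
# Hypothetical stand-ins, not the library's classes: they only illustrate the scoping rule.


class FakeModel:
    """Stand-in for a model: `.to(device)` records the device and returns self."""

    def __init__(self):
        self.device = "cpu"

    def to(self, device):
        self.device = device
        return self


class RewardTrainer:
    def __init__(self, model):
        self.self_reward_model = model

    def buggy(self, device):
        # The assignment makes `self_reward_model` a local name for the whole body,
        # so reading it on the right-hand side raises UnboundLocalError.
        self_reward_model = self_reward_model.to(device)
        return self_reward_model

    def fixed(self, device):
        # Read the instance attribute first, then bind the local name.
        self_reward_model = self.self_reward_model.to(device)
        return self_reward_model


trainer = RewardTrainer(FakeModel())

try:
    trainer.buggy("cuda")
except UnboundLocalError as err:
    print(type(err).__name__)  # UnboundLocalError

print(trainer.fixed("cuda").device)  # cuda
```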
Have you solved it?
Sorry, I haven't. When I change the code to `self_reward_model = self.self_reward_model.to(device)`, the program gets stuck in an endless loop...
I just copied the demo to test the package, but got `UnboundLocalError: local variable 'self_reward_model' referenced before assignment`.
How could this happen? Could somebody help me? Thanks!