[Open] binhmed2lab opened this issue 1 year ago
It took me a few days to figure out what was wrong when evaluating the trained LoRA.
```python
import torch
from transformers import GenerationConfig

def generate_response(prompt, model, temperature=0.1, num_beams=1, top_k=50, repetition_penalty=1):
    encoding = tokenizer(prompt, padding=True, truncation=True, return_tensors="pt", max_length=1024)
    input_ids = encoding["input_ids"].to(device)
    attention_mask = encoding["attention_mask"].to(device)  # I just added

    generation_config = GenerationConfig(
        temperature=temperature,
        top_p=1,
        do_sample=True,
        num_beams=num_beams,
        top_k=top_k,
        repetition_penalty=repetition_penalty,
    )
    with torch.inference_mode():
        return model.generate(
            input_ids=input_ids,
            attention_mask=attention_mask,  # pass the mask so pad tokens are ignored
            generation_config=generation_config,
            return_dict_in_generate=True,
            output_scores=True,
            max_new_tokens=512,
        )
```
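For completeness, here is roughly how I call it (this assumes `tokenizer`, `device`, and `model` are already defined globally, as in the snippet above; with `return_dict_in_generate=True` the result is an output object whose `sequences` field contains the generated token ids):

```python
output = generate_response("Explain what a LoRA adapter is.", model)
# output.sequences[0] includes the prompt tokens followed by the generated ones
print(tokenizer.decode(output.sequences[0], skip_special_tokens=True))
```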
The reason is that, when attention_mask is not provided, the padding tokens still contribute attention weight during generation, so padded inputs produce different (worse) outputs than unpadded ones.
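For anyone hitting the same thing, here is a minimal sketch of the failure mode (the gpt2 model name and the prompts are just placeholders, not from my setup). Because gpt2 has no dedicated pad token, reusing the eos token as pad means `generate` cannot reliably infer the mask on its own, so the unmasked call attends to pad positions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Batch with different lengths, so the shorter prompt gets padded
batch = tokenizer(
    ["short prompt", "a noticeably longer prompt here"],
    padding=True, return_tensors="pt",
)

# Without attention_mask, pad tokens are attended to like real tokens,
# so the shorter sequence's continuation can change:
bad = model.generate(input_ids=batch["input_ids"], max_new_tokens=20)

# With the mask, pad positions are ignored as intended:
good = model.generate(
    input_ids=batch["input_ids"],
    attention_mask=batch["attention_mask"],
    max_new_tokens=20,
)
```

(Side note: for batched generation with decoder-only models, setting `tokenizer.padding_side = "left"` is generally recommended as well, so the model's last input token is a real token rather than padding.)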