Whether generating or fine-tuning, as soon as I set `load_in_8bit=True` the model stops generating normally and outputs a string of question marks, as in the picture below:
I printed out its output vector, shown in the picture.
It looks like nothing was generated properly at all, yet when I set `load_in_8bit=False` both generation and fine-tuning work fine.
I have installed bitsandbytes and accelerate correctly, and no errors are reported during testing. I've been stuck on this problem for a week, so I'd appreciate any help. Thank you!
Below is my `generate.py` code:
```python
from peft import PeftModel
from transformers import LlamaTokenizer, LlamaForCausalLM, GenerationConfig

tokenizer = LlamaTokenizer.from_pretrained("llama1")
model = LlamaForCausalLM.from_pretrained(
    "llama1",
    load_in_8bit=True,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, "tloen/alpaca-lora")

def alpaca_talk(text):
    inputs = tokenizer(
        text,
        return_tensors="pt",
    )
    input_ids = inputs["input_ids"].cuda()
    generation_config = GenerationConfig(
        temperature=0.9,
        top_p=0.75,
    )
    print("Generating...")
    generation_output = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=256,
    )
    for s in generation_output.sequences:
        print(tokenizer.decode(s))

for input_text in [
    """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
What steps should I ....?

### Response:
"""
]:
    alpaca_talk(input_text)
```
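One check that might help narrow this down (a sketch, not part of my original script; `scores_look_sane` is a hypothetical helper): since the script already passes `output_scores=True`, the per-step logits in `generation_output.scores` can be inspected for NaN/inf values before decoding. Broken 8-bit matmuls typically show up as non-finite logits, which then decode to garbage tokens like the question marks above.

```python
import math

def scores_look_sane(scores):
    """Return True if a flat list of generation scores contains only
    finite values; NaN/inf logits usually decode to garbage tokens."""
    return all(math.isfinite(float(s)) for s in scores)

print(scores_look_sane([1.2, -3.4, 0.0]))     # True  -> logits are usable
print(scores_look_sane([float("nan"), 1.0]))  # False -> quantization likely broke
```

In the script above each entry of `generation_output.scores` is a tensor, so it would be flattened first (e.g. `s.flatten().tolist()`) before being passed to the helper.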