tloen / alpaca-lora

Instruct-tune LLaMA on consumer hardware

When I set load_in_8bit=True, some errors occurred #606

Open hychaochao opened 1 year ago

hychaochao commented 1 year ago

Whether in generate.py or finetune.py, once I set load_in_8bit=True, the model no longer generates normally: it outputs a string of question marks (see the first screenshot). I also printed out the output tensor (second screenshot), and it looks like nothing was generated properly at all. When I set load_in_8bit=False, both generation and fine-tuning work normally.
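In case it helps with debugging, here is a minimal sketch of how the raw logits can be checked for NaN/inf values (it assumes the `model` and `input_ids` objects built in the generate.py script below):

```python
# Minimal sketch: inspect the raw logits of one 8-bit forward pass
# for NaN/inf values. `model` and `input_ids` are the objects built
# in the generate.py script below.
import torch

with torch.no_grad():
    logits = model(input_ids=input_ids).logits  # [batch, seq_len, vocab]
print("NaN in logits:", torch.isnan(logits).any().item())
print("inf in logits:", torch.isinf(logits).any().item())
```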

I have installed bitsandbytes and accelerate correctly, and no errors are reported during testing. I've been stuck on this problem for a week, so I'd appreciate any help, thank you! Below is my generate.py code:

```python
from peft import PeftModel
from transformers import LlamaTokenizer, LlamaForCausalLM, GenerationConfig

tokenizer = LlamaTokenizer.from_pretrained("llama1")
# Loading the base model in 8-bit is what triggers the broken output;
# with load_in_8bit=False the same script works.
model = LlamaForCausalLM.from_pretrained(
    "llama1",
    load_in_8bit=True,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, "tloen/alpaca-lora")
def alpaca_talk(text):
    # Tokenize the prompt and move the input ids onto the GPU.
    inputs = tokenizer(text, return_tensors="pt")
    input_ids = inputs["input_ids"].cuda()
    generation_config = GenerationConfig(
        temperature=0.9,
        top_p=0.75,
    )
    print("Generating...")
    generation_output = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=256,
    )
    # Decode and print every returned sequence.
    for s in generation_output.sequences:
        print(tokenizer.decode(s))

for input_text in [
    """Below is an instruction that describes a task. Write a response that appropriately completes the request.
    ### Instruction:
    What steps should I ....?
    ### Response:
    """
]:
    alpaca_talk(input_text)
```
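For reference, the same 8-bit load can also be expressed through transformers' `BitsAndBytesConfig` object. This is only a sketch of something to experiment with, not a confirmed fix: the `llm_int8_threshold=0.0` value is an assumption (the default is 6.0) that changes how the int8 mixed-precision decomposition routes activations.

```python
# Sketch only: equivalent 8-bit load via BitsAndBytesConfig.
# llm_int8_threshold=0.0 is an experimental value (default 6.0),
# not a confirmed fix for the question-mark output.
import torch
from transformers import BitsAndBytesConfig, LlamaForCausalLM

bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=0.0,
)
model = LlamaForCausalLM.from_pretrained(
    "llama1",
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.float16,
)
```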