Uxito-Ada opened this issue 10 months ago
I can't reproduce this issue. Based on my test, the code below does call self.greedy_search:
import torch
import intel_extension_for_pytorch as ipex
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "..."  # path to the model checkpoint

# load
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             optimize_model=True,
                                             torch_dtype=torch.bfloat16,
                                             load_in_low_bit="bf16",
                                             trust_remote_code=True,
                                             use_cache=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

input_str = "tell me a story"
input_ids = tokenizer.encode(input_str, return_tensors="pt")

# inference
original_output = model.generate(input_ids=input_ids,
                                 use_cache=False,
                                 max_new_tokens=13,
                                 do_sample=False)
output_str = tokenizer.decode(original_output[0], skip_special_tokens=True)
print(original_output)
print(output_str)
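For context, transformers' generate() picks a decoding path from the sampling and beam settings; with do_sample=False and the default num_beams=1 it lands in greedy search. Below is a simplified, self-contained sketch of that dispatch rule (not the actual library code, which lives in GenerationMixin.generate()):

```python
def select_generation_mode(do_sample: bool = False,
                           num_beams: int = 1,
                           num_beam_groups: int = 1) -> str:
    """Simplified sketch of how transformers' generate() chooses a
    decoding path from its arguments."""
    if num_beam_groups > 1:
        return "group_beam_search"
    if do_sample:
        return "beam_sample" if num_beams > 1 else "sample"
    return "beam_search" if num_beams > 1 else "greedy_search"

# The snippet above passes do_sample=False and leaves num_beams at its
# default of 1, so greedy search is the expected path.
print(select_generation_mode(do_sample=False))  # greedy_search
```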
Is import intel_extension_for_pytorch as ipex necessary? The import does some init work. @rnwang04
It's not necessary; I used ipex here because I validated in a GPU conda env. I have double-checked in a CPU conda env and confirmed that it does use greedy search. I also found that our bf16 gives the same output as native bf16 in the CPU env.
our bf16
=================enter greedy search================
tensor([[83680, 1643, 1346, 3028, 1670, 1346, 1750, 1777, 1438, 1738,
33105, 72, 5, 13602, 5920, 1346, 1750]])
tell me a story about a time when you were scared.
Once upon a time
native bf16
=================enter greedy search================
tensor([[83680, 1643, 1346, 3028, 1670, 1346, 1750, 1777, 1438, 1738,
33105, 72, 5, 13602, 5920, 1346, 1750]])
tell me a story about a time when you were scared.
Once upon a time
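The agreement between the two runs can also be checked programmatically; a minimal sketch, with the token IDs copied verbatim from the two logs above:

```python
# Token IDs taken from the "our bf16" and "native bf16" logs above.
ours = [83680, 1643, 1346, 3028, 1670, 1346, 1750, 1777, 1438, 1738,
        33105, 72, 5, 13602, 5920, 1346, 1750]
native = [83680, 1643, 1346, 3028, 1670, 1346, 1750, 1777, 1438, 1738,
          33105, 72, 5, 13602, 5920, 1346, 1750]

# An exact token-level match implies identical decoded strings as well.
print(ours == native)  # True
```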
This is a bigdl-bf16 model, where model_path points to a Baichuan2-13B-Chat checkpoint. It is found that greedy_search is not called as expected when using the model.generate API. By contrast, bigdl-int4 calls greedy_search as expected with the same style of API call. This ultimately influences the outputs: bigdl-int4 and ipex-bf16 both use greedy_search and thus give closer answers, while bigdl-bf16 shows a difference. Hope bigdl-bf16's service owner can help to fix it, please.