intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

TypeError with chatglm2-6b-32k model #9101

Open Kailuo-Lai opened 1 year ago

Kailuo-Lai commented 1 year ago

Code:

from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer
model = AutoModel.from_pretrained("./checkpoints/chatglm2-6b-32k/",
                                  load_in_low_bit="sym_int4",
                                  trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("./checkpoints/chatglm2-6b-32k/",
                                          trust_remote_code=True)
prompt = "What is AI?"
CHATGLM2_PROMPT_TEMPLATE = "USER: {prompt}\nASSISTANT:"
model.chat(tokenizer, CHATGLM2_PROMPT_TEMPLATE.format(prompt=prompt), history=[])

Output:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[6], line 3
      1 prompt = "What is AI?"
      2 CHATGLM2_PROMPT_TEMPLATE = "USER: {prompt}\nASSISTANT:"
----> 3 model.chat(tokenizer, CHATGLM2_PROMPT_TEMPLATE.format(prompt=prompt), history=[])

File ~/anaconda3/envs/llm-tutorial/lib/python3.9/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~/.cache/huggingface/modules/transformers_modules/modeling_chatglm.py:1042, in ChatGLMForConditionalGeneration.chat(self, tokenizer, query, history, max_length, num_beams, do_sample, top_p, temperature, logits_processor, **kwargs)
   1039 gen_kwargs = {"max_length": max_length, "num_beams": num_beams, "do_sample": do_sample, "top_p": top_p,
   1040               "temperature": temperature, "logits_processor": logits_processor, **kwargs}
   1041 inputs = self.build_inputs(tokenizer, query, history=history)
-> 1042 outputs = self.generate(**inputs, **gen_kwargs)
   1043 outputs = outputs.tolist()[0][len(inputs["input_ids"][0]):]
   1044 response = tokenizer.decode(outputs)

File ~/anaconda3/envs/llm-tutorial/lib/python3.9/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
...
--> 655                 presents = torch.cat((presents, kv_cache), dim=0)
    657 if output_hidden_states:
    658     all_hidden_states = all_hidden_states + (hidden_states,)

TypeError: expected Tensor as element 0 in argument 0, but got tuple

Env:

torch 2.0.1
bigdl-llm 2.4.0b20231007
transformers 4.31.0
langchain 0.0.248

jason-dai commented 1 year ago

Can you try:

model = AutoModel.from_pretrained("./checkpoints/chatglm2-6b-32k/",
                                  load_in_low_bit="sym_int4",
                                  trust_remote_code=True,
                                  optimize_model=False)
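
For reference, a minimal end-to-end sketch of this workaround, reusing the checkpoint path and prompt template from the original report (unpacking chat()'s return value into response and history is an assumption based on the ChatGLM2 chat API):

# Workaround sketch: load with optimize_model=False, then rerun the original chat call.
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer

model_path = "./checkpoints/chatglm2-6b-32k/"
model = AutoModel.from_pretrained(model_path,
                                  load_in_low_bit="sym_int4",
                                  trust_remote_code=True,
                                  optimize_model=False)  # disable the extra optimizations
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

CHATGLM2_PROMPT_TEMPLATE = "USER: {prompt}\nASSISTANT:"
response, history = model.chat(tokenizer,
                               CHATGLM2_PROMPT_TEMPLATE.format(prompt="What is AI?"),
                               history=[])
print(response)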
Kailuo-Lai commented 1 year ago

@jason-dai Thanks, it works. And will this solution affect the efficiency of the LLM?

jason-dai commented 1 year ago

> @jason-dai Thanks, it works. And will this solution affect the efficiency of the LLM?

Yes - when optimize_model is True, we apply more aggressive model optimizations, but they are less stable; you can set it to False if you run into any issues. We'll take a look at how to enable it for chatglm2-6b-32k.
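
For completeness, a rough sketch (not from the thread) of applying this advice programmatically: try the optimized path first and fall back to optimize_model=False only if generation fails with the error above.

# Hypothetical fallback pattern: use the default optimized load first, and
# reload with optimize_model=False if chat() raises the TypeError reported here.
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer

model_path = "./checkpoints/chatglm2-6b-32k/"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

def load_and_chat(prompt, optimize_model=True):
    model = AutoModel.from_pretrained(model_path,
                                      load_in_low_bit="sym_int4",
                                      trust_remote_code=True,
                                      optimize_model=optimize_model)
    return model.chat(tokenizer, prompt, history=[])

prompt = "USER: What is AI?\nASSISTANT:"
try:
    response, _ = load_and_chat(prompt)                        # optimized path
except TypeError:
    response, _ = load_and_chat(prompt, optimize_model=False)  # stable fallback
print(response)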

Kailuo-Lai commented 1 year ago

> @jason-dai Thanks, it works. And will this solution affect the efficiency of the LLM?
>
> Yes - when optimize_model is True, we apply more aggressive model optimizations, but they are less stable; you can set it to False if you run into any issues. We'll take a look at how to enable it for chatglm2-6b-32k.

Ok, I see. Thank you!

plusbang commented 1 year ago

Hi @Kailuo-Lai, we have enabled further model optimizations for the chatglm2-6b-32k model now. Please wait for 2.4.0b20231016 (which will be released tomorrow) or a later version of bigdl-llm, then run the following code:

model = AutoModel.from_pretrained("THUDM/chatglm2-6b-32k",
                                  load_in_low_bit="sym_int4",
                                  trust_remote_code=True)
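
For anyone following along, a sketch of verifying the fix once that build is out; the upgrade command in the comment is an assumption based on BigDL-LLM's nightly install instructions, and the chat call mirrors the code earlier in this thread:

# Upgrade to a nightly build containing the fix (assumed install command), e.g.:
#   pip install --pre --upgrade bigdl-llm[all]
# Then load with the default optimize_model=True and rerun the original chat.
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer

model = AutoModel.from_pretrained("THUDM/chatglm2-6b-32k",
                                  load_in_low_bit="sym_int4",
                                  trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b-32k",
                                          trust_remote_code=True)

response, _ = model.chat(tokenizer, "USER: What is AI?\nASSISTANT:", history=[])
print(response)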
Kailuo-Lai commented 1 year ago

> Hi @Kailuo-Lai, we have enabled further model optimizations for the chatglm2-6b-32k model now. Please wait for 2.4.0b20231016 (which will be released tomorrow) or a later version of bigdl-llm, then run the following code:
>
> model = AutoModel.from_pretrained("THUDM/chatglm2-6b-32k",
>                                   load_in_low_bit="sym_int4",
>                                   trust_remote_code=True)

Thank you, I will try it in the future.