lyogavin / airllm

AirLLM 70B inference with single 4GB GPU
Apache License 2.0

ValueError: max() arg is an empty sequence #62

Open tutu329 opened 11 months ago

tutu329 commented 11 months ago

code:

```python
from airllm import AirLLMLlama2

MAX_LENGTH = 128

# could use hugging face model repo id:
model = AirLLMLlama2("D:/models/Qwen-72B-Chat")

input_text = [
    'What is the capital of United States?',
    'I like',
]

input_tokens = model.tokenizer(input_text,
                               return_tensors="pt",
                               return_attention_mask=False,
                               truncation=True,
                               max_length=MAX_LENGTH,
                               padding=True)

generation_output = model.generate(
    input_tokens['input_ids'].cuda(),
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True)

output = model.tokenizer.decode(generation_output.sequences[0])
print(output)
```

error:

```
File "C:\Users\tutu\anaconda3\envs\airllm\lib\site-packages\airllm\utils.py", line 259, in split_and_save_layers
    if max(shards) > shard:
ValueError: max() arg is an empty sequence
```

model:

https://huggingface.co/Qwen/Qwen-72B-Chat
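The exception itself is plain Python behavior: `max()` raises `ValueError` on an empty sequence. A plausible (unverified) reading of the traceback is that `split_and_save_layers` collected no shard indices for this checkpoint, presumably because `AirLLMLlama2` looks for Llama-style layer names that a Qwen checkpoint does not use. A minimal sketch of the failure mode; `pick_latest_shard` is a hypothetical stand-in, not actual airllm code:

```python
def pick_latest_shard(shards):
    """Return the highest shard index seen so far.

    Hypothetical stand-in for the max(shards) call in airllm's
    split_and_save_layers: if no layer names matched the expected
    pattern, shards is empty and a bare max() crashes exactly as
    in the reported traceback.
    """
    if not shards:
        # Guard that would surface the real problem instead of the crash.
        raise ValueError("no shards matched - wrong model class for this checkpoint?")
    return max(shards)


# Reproduces the reported error in isolation:
# max([]) raises ValueError: max() arg is an empty sequence
```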

lyogavin commented 11 months ago

can you try the following change:

```
- from airllm import AirLLMLlama2
+ from airllm import AirLLMQWen
```

(and load the model with `AirLLMQWen` instead of `AirLLMLlama2`)
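One way to encode that advice so the rest of the snippet stays unchanged is to pick the class from the checkpoint path. `airllm_class_name` below is a hypothetical helper, not part of airllm's API; it only captures the maintainer's point that Qwen checkpoints need `AirLLMQWen` rather than `AirLLMLlama2`:

```python
# Hypothetical helper (NOT part of airllm's API): map a checkpoint path to
# the AirLLM class name suggested in this thread.
def airllm_class_name(model_path: str) -> str:
    if "qwen" in model_path.lower():
        return "AirLLMQWen"
    return "AirLLMLlama2"


# The reporter's path selects the Qwen-specific class:
# airllm_class_name("D:/models/Qwen-72B-Chat") -> "AirLLMQWen"
```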