tutu329 opened 11 months ago
code:

```python
from airllm import AirLLMLlama2

MAX_LENGTH = 128

# could use hugging face model repo id:
model = AirLLMLlama2("D:/models/Qwen-72B-Chat")

input_text = [
    'What is the capital of United States?',
]

input_tokens = model.tokenizer(
    input_text,
    return_tensors="pt",
    return_attention_mask=False,
    truncation=True,
    max_length=MAX_LENGTH,
    padding=True
)

generation_output = model.generate(
    input_tokens['input_ids'].cuda(),
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True
)

output = model.tokenizer.decode(generation_output.sequences[0])
print(output)
```
error:

```
File "C:\Users\tutu\anaconda3\envs\airllm\lib\site-packages\airllm\utils.py", line 259, in split_and_save_layers
    if max(shards) > shard:
ValueError: max() arg is an empty sequence
```
model:
https://huggingface.co/Qwen/Qwen-72B-Chat
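For context, the `ValueError` is just Python's `max()` being called on an empty sequence: `split_and_save_layers` apparently found no existing layer shards for this checkpoint, so its `shards` list is empty. A minimal sketch of that failure mode (the `shards` name here only mirrors the traceback, not airllm's actual internals):

```python
# Minimal reproduction of the failure mode in the traceback:
# calling max() on an empty sequence raises ValueError.
shards = []  # no saved layer shards found, so the sequence is empty
try:
    max(shards)
except ValueError as e:
    print(e)  # -> max() arg is an empty sequence

# For comparison, max() accepts a `default` that is returned
# when the iterable is empty instead of raising.
print(max(shards, default=0))  # -> 0
```

So the error is a symptom of the loader not recognizing the model layout, not a problem with the call to `max()` itself.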
can you try the following change:

```python
from airllm import AirLLMLlama2  # <----- from airllm import AirLLMQWen
```

(and correspondingly construct the model with `AirLLMQWen(...)` instead of `AirLLMLlama2(...)`)