hyperonym / basaran

Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Transformers-based text generation models.

TypeError: __init__() got an unexpected keyword argument 'load_in_4bit' #227

Open · tanshuai opened 12 months ago

tanshuai commented 12 months ago
# MODEL_TRUST_REMOTE_CODE=True MODEL=huggyllama/llama-7b PORT=80 python -m basaran
Traceback (most recent call last):
  File "/root/anaconda3/envs/cuda_test2/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/anaconda3/envs/cuda_test2/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/anaconda3/envs/cuda_test2/lib/python3.9/site-packages/basaran/__main__.py", line 41, in <module>
    stream_model = load_model(
  File "/root/anaconda3/envs/cuda_test2/lib/python3.9/site-packages/basaran/model.py", line 334, in load_model
    model = AutoModelForCausalLM.from_pretrained(name_or_path, **kwargs)
  File "/root/anaconda3/envs/cuda_test2/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 467, in from_pretrained
    return model_class.from_pretrained(
  File "/root/anaconda3/envs/cuda_test2/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2611, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
TypeError: __init__() got an unexpected keyword argument 'load_in_4bit'
# MODEL_TRUST_REMOTE_CODE=True MODEL=openlm-research/open_llama_3b PORT=80 python -m basaran

Traceback (most recent call last):
  File "/root/anaconda3/envs/cuda_test2/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/anaconda3/envs/cuda_test2/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/anaconda3/envs/cuda_test2/lib/python3.9/site-packages/basaran/__main__.py", line 41, in <module>
    stream_model = load_model(
  File "/root/anaconda3/envs/cuda_test2/lib/python3.9/site-packages/basaran/model.py", line 334, in load_model
    model = AutoModelForCausalLM.from_pretrained(name_or_path, **kwargs)
  File "/root/anaconda3/envs/cuda_test2/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 467, in from_pretrained
    return model_class.from_pretrained(
  File "/root/anaconda3/envs/cuda_test2/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2611, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
TypeError: __init__() got an unexpected keyword argument 'load_in_4bit'
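
Both tracebacks show the same mechanism: Basaran's load_model passes load_in_4bit among the kwargs it hands to AutoModelForCausalLM.from_pretrained, and on transformers releases that predate 4-bit loading, from_pretrained does not consume that argument, so it falls through to the model's __init__ (the cls(config, *model_args, **model_kwargs) call above) and raises the TypeError. A minimal guard sketch, assuming 4-bit support arrived around transformers v4.30; the kwargs dict here is illustrative, not Basaran's actual loader code:

import transformers
from packaging import version

# Illustrative kwargs only; Basaran assembles its own kwargs inside load_model().
kwargs = {"device_map": "auto", "load_in_4bit": False}

# Releases without 4-bit support do not consume load_in_4bit in
# from_pretrained(), so it is forwarded to the model __init__ and raises
# the TypeError above; dropping the key sidesteps the crash.
if version.parse(transformers.__version__) < version.parse("4.30.0"):
    kwargs.pop("load_in_4bit", None)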

But this model works perfectly when loaded with transformers directly:

import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

## v2 models
#model_path = 'openlm-research/open_llama_7b_v2'

## v1 models
model_path = 'openlm-research/open_llama_3b'
# model_path = 'openlm-research/open_llama_7b'
# model_path = 'openlm-research/open_llama_13b'

tokenizer = LlamaTokenizer.from_pretrained(model_path)
# Load the model in fp16 and let accelerate place it across available devices.
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map='auto',
)

prompt = 'Q: What is China?\nA:'
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Greedy generation of up to 32 new tokens.
generation_output = model.generate(
    input_ids=input_ids, max_new_tokens=32
)
print(tokenizer.decode(generation_output[0]))
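
Note: this snippet never passes load_in_4bit, which is presumably why it succeeds where Basaran's loader fails; in the failing runs that kwarg comes from load_model, as both tracebacks show.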
peakji commented 12 months ago

Hi @tanshuai. Which version of transformers are you using?

Upgrading transformers to v4.30.2 or later should resolve the issue.
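
For instance, a quick check of the installed version (illustrative only; the v4.30.2 pin comes from the reply above):

import transformers

# Anything below 4.30.2 is expected to hit the TypeError above.
print(transformers.__version__)

# To upgrade, from the shell:
# pip install --upgrade "transformers>=4.30.2"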