microsoft / LLMLingua

To speed up LLM inference and enhance the LLM's perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License
4.27k stars 228 forks

run local error #72

Open songsh opened 6 months ago

songsh commented 6 months ago

I ran it locally and got an error during "Loading checkpoint shards":

(screenshot: Loading checkpoint shards error, 2024-01-21 4:48 PM)
### Tasks
- [ ] Support Chinese?
iofu728 commented 6 months ago

Hi @songsh,

It appears there was an interruption while loading the model. Could you please try to load the model normally using the following code?

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "NousResearch/Llama-2-7b-hf"
# Load the tokenizer and model; torch_dtype="auto" keeps the checkpoint's dtype.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="cuda",
    ignore_mismatched_sizes=True,
)
songsh commented 6 months ago

Can the model support Chinese? When I use Chinese input, the answer contains garbled text.

iofu728 commented 6 months ago

Hi @songsh, currently this issue does occur (see #4). We plan to fix it in the future. For now, we recommend using a small Chinese language model, such as Skyword.
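One plausible reason compression garbles Chinese output (an illustration, not LLMLingua's actual code path): byte-level BPE tokenizers can split a single multi-byte UTF-8 character across several tokens, and dropping only some of those tokens leaves bytes that no longer form valid UTF-8. A minimal stdlib sketch of the effect:

```python
# Illustration: dropping part of a multi-byte UTF-8 sequence yields mojibake.
text = "你好"                 # two Chinese characters, 3 UTF-8 bytes each
raw = text.encode("utf-8")    # 6 bytes total
pruned = raw[:4]              # "compression" cuts the 2nd character mid-sequence
garbled = pruned.decode("utf-8", errors="replace")
print(garbled)                # "你�" - the broken character decodes as U+FFFD
```

English text is mostly single-byte ASCII, so token dropping rarely breaks a character there, which is consistent with the issue showing up mainly for Chinese input.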