huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

cannot load model back due to [does not appear to have a file named config.json] #30513

Closed yananchen1989 closed 4 months ago

yananchen1989 commented 4 months ago

System Info

vllm version: 0.4.1

Who can help?

No response

Information

Tasks

Reproduction

I fine-tuned the mistral-7b-v0.2 model using the Hugging Face TRL trainer (https://huggingface.co/docs/trl/v0.8.6/trainer). The training worked well and it finally saved the model as below:

adapter_config.json
adapter_model.safetensors
checkpoint-16
checkpoint-24
checkpoint-8
README.md
special_tokens_map.json
tokenizer_config.json
tokenizer.json
tokenizer.model
training_args.bin
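(For reference, below is a minimal sketch of this kind of LoRA SFT run with trl's SFTTrainer; the base checkpoint, dataset, and hyperparameters are assumptions, not the author's actual script. The point is that when a peft_config is used, trainer.save_model() writes only the adapter files listed above, with no config.json for the full model.)

# Minimal sketch of a LoRA SFT run (assumed setup, not the author's script).
from datasets import load_dataset
from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer

train_dataset = load_dataset("imdb", split="train")  # placeholder dataset

peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model="mistralai/Mistral-7B-Instruct-v0.2",   # assumed base checkpoint
    train_dataset=train_dataset,
    dataset_text_field="text",
    peft_config=peft_config,
    args=TrainingArguments(output_dir="path_to_the_model", num_train_epochs=1),
)
trainer.train()

# Because training ran through a PEFT adapter, save_model() writes only
# adapter_config.json / adapter_model.safetensors (plus tokenizer files),
# not a full config.json for a merged model.
trainer.save_model()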

However, when I try to load it back via vLLM, it raises this error:

does not appear to have a file named config.json

from langchain_community.llms import VLLM

llm = VLLM(
    model="path_to_the_model",
    trust_remote_code=True,  # mandatory for hf models
    max_new_tokens=64,
    temperature=0,
    # tensor_parallel_size=...,  # for distributed inference
)
llm.invoke("what is the capital city of ontario ?")

However, when I load it via AutoModelForCausalLM.from_pretrained, everything is fine.

Any advice?

Expected behavior

It should load the model back.

yananchen1989 commented 4 months ago

SFT via LoRA.

yananchen1989 commented 4 months ago

Should I download https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/blob/main/config.json into the saved model path?
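(Copying the base model's config.json in by hand is probably not the intended fix: the directory only holds a LoRA adapter, so vLLM would still be missing the full model weights at that path. The adapter already records which base checkpoint it belongs to; a quick way to check, using the path from the snippet above and the field name from the PEFT adapter format:)

import json

# adapter_config.json is written by PEFT and records the base checkpoint
# the adapter was trained on.
with open("path_to_the_model/adapter_config.json") as f:
    adapter_cfg = json.load(f)

print(adapter_cfg["base_model_name_or_path"])
# e.g. "mistralai/Mistral-7B-Instruct-v0.2"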

yananchen1989 commented 4 months ago

I also tested the code from https://docs.vllm.ai/en/latest/models/lora.html:

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model=LOCAL_PATH, enable_lora=True)

where LOCAL_PATH stores:

adapter_config.json
adapter_model.safetensors
README.md
special_tokens_map.json
tokenizer_config.json
tokenizer.json
tokenizer.model
trainer_state.json
training_args.bin

Also, model = AutoModelForCausalLM.from_pretrained(LOCAL_PATH) works well.
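(For comparison, the vLLM LoRA docs pass the base model to LLM(model=...) and supply the adapter directory per request through LoRARequest, rather than pointing model= at the adapter itself. A hedged sketch, assuming the adapter was trained on mistralai/Mistral-7B-Instruct-v0.2 as linked earlier in the thread:)

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

LOCAL_PATH = "path_to_the_model"  # directory with adapter_config.json / adapter_model.safetensors

# model= points at the base checkpoint; the adapter is attached per request.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2", enable_lora=True)

outputs = llm.generate(
    ["what is the capital city of ontario ?"],
    SamplingParams(temperature=0, max_tokens=64),
    lora_request=LoRARequest("sft_adapter", 1, LOCAL_PATH),
)
print(outputs[0].outputs[0].text)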

vasqu commented 4 months ago

So did this snippet work? -->

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model=LOCAL_PATH, enable_lora=True)

Maybe pass the enable_lora=True kwarg to the LangChain alternative. Another option is to merge the LoRA weights to get a "base" model back, copy the base config, and reload it in LangChain. But tbh, this is more of a LangChain/vLLM issue than a transformers issue.
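(For the merge route, here is a minimal sketch using PEFT's merge_and_unload; the paths and torch_dtype are placeholders/assumptions. It folds the LoRA weights into the base model and saves a regular checkpoint, including a config.json, that vLLM or LangChain can load directly.)

import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

ADAPTER_PATH = "path_to_the_model"      # directory with the adapter files
MERGED_PATH = "path_to_merged_model"    # hypothetical output directory

# Load base model + adapter (the base is resolved from adapter_config.json),
# then fold the LoRA deltas into the base weights.
model = AutoPeftModelForCausalLM.from_pretrained(ADAPTER_PATH, torch_dtype=torch.bfloat16)
merged = model.merge_and_unload()
merged.save_pretrained(MERGED_PATH)     # writes config.json + full weights

# Keep the tokenizer next to the weights so vLLM finds everything in one place.
AutoTokenizer.from_pretrained(ADAPTER_PATH).save_pretrained(MERGED_PATH)

Pointing VLLM(model=MERGED_PATH, ...) at this merged directory should then satisfy the config.json check.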

yananchen1989 commented 4 months ago

That does not work either. Yes, it seems it is a vLLM issue.