unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

How to load a unsloth checkpoint such that it works with vllm? #1063

Open brando90 opened 6 days ago

brando90 commented 6 days ago

Current attempt:

def test_unsloth_vllm(
    max_length: int = 8192,   # unused in this repro
    use_4bit: bool = False,   # unused in this repro
):
    print('----> test_unsloth_vllm')
    import os
    from transformers import AutoModelForCausalLM, AutoTokenizer
    model_name = os.path.expanduser('~/data/runs/09192024_12h35m27s_run/train/checkpoint-820')
    # model_name: str = "Qwen/Qwen2-1.5B-Instruct"
    # model_name: str = "Qwen/Qwen2-1.5B"
    print(f'{model_name=}')
    # model = AutoModelForCausalLM.from_pretrained(
    #     model_name,
    #     torch_dtype="auto",
    #     device_map="auto"
    # )
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    prompt = "Continue the fibonnaci sequence for a 1 step only please: 1, 1, 2, 3, 5, 8,"
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]
    print('messages: ', messages)
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    print('text: ', text)
    # vLLM generation
    from vllm import LLM, SamplingParams
    # NOTE: the chat-template `text` built above is never passed to vLLM;
    # a hard-coded prompt is used instead.
    prompts = ["Hello, my name is"]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
    llm = LLM(model=model_name)  # this is the call that fails (see traceback below)
    outputs = llm.generate(prompts, sampling_params)
    # Print the outputs.
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

if __name__ == "__main__":
    import fire
    import time
    print('\n-- Start')
    start_time = time.time()
    fire.Fire(test_unsloth_vllm)
    # fire.Fire(test_unsloth_inference)
    # fire.Fire(test_unsloth_plus_hf_inference)
    # fire.Fire(test_unsloth_inference)
    print(f"Time taken: {time.time() - start_time:.2f} seconds, or {(time.time() - start_time) / 60:.2f} minutes, or {(time.time() - start_time) / 3600:.2f} hours.\a")

bug

(AI4Lean) root@miranebr-math-p4de-math-test-eval:~# python ~/AI4Lean/py_src/evals/chat_template_qwen2.py

-- Start
----> test_unsloth_vllm
model_name='/data/miranebr-sandbox/data/runs/09192024_12h35m27s_run/train/checkpoint-820'
messages:  [{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'Continue the fibonnaci sequence for a 1 step only please: 1, 1, 2, 3, 5, 8,'}]
text:  <|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Continue the fibonnaci sequence for a 1 step only please: 1, 1, 2, 3, 5, 8,<|im_end|>
<|im_start|>assistant

Traceback (most recent call last):
  File "/data/miranebr-sandbox/AI4Lean/py_src/evals/chat_template_qwen2.py", line 176, in <module>
    fire.Fire(test_unsloth_vllm)
  File "/data/miranebr-sandbox/.virtualenvs/AI4Lean/lib/python3.11/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/miranebr-sandbox/.virtualenvs/AI4Lean/lib/python3.11/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/data/miranebr-sandbox/.virtualenvs/AI4Lean/lib/python3.11/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/data/miranebr-sandbox/AI4Lean/py_src/evals/chat_template_qwen2.py", line 36, in test_unsloth_vllm
    llm = LLM(model=model_name)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/data/miranebr-sandbox/.virtualenvs/AI4Lean/lib/python3.11/site-packages/vllm/entrypoints/llm.py", line 118, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/miranebr-sandbox/.virtualenvs/AI4Lean/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 257, in from_engine_args
    engine_config = engine_args.create_engine_config()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/miranebr-sandbox/.virtualenvs/AI4Lean/lib/python3.11/site-packages/vllm/engine/arg_utils.py", line 464, in create_engine_config
    model_config = ModelConfig(
                   ^^^^^^^^^^^^
  File "/data/miranebr-sandbox/.virtualenvs/AI4Lean/lib/python3.11/site-packages/vllm/config.py", line 107, in __init__
    self.hf_config = get_config(self.model, trust_remote_code, revision,
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/miranebr-sandbox/.virtualenvs/AI4Lean/lib/python3.11/site-packages/vllm/transformers_utils/config.py", line 23, in get_config
    config = AutoConfig.from_pretrained(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/miranebr-sandbox/.virtualenvs/AI4Lean/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 972, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/miranebr-sandbox/.virtualenvs/AI4Lean/lib/python3.11/site-packages/transformers/configuration_utils.py", line 632, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/miranebr-sandbox/.virtualenvs/AI4Lean/lib/python3.11/site-packages/transformers/configuration_utils.py", line 689, in _get_config_dict
    resolved_config_file = cached_file(
                           ^^^^^^^^^^^^
  File "/data/miranebr-sandbox/.virtualenvs/AI4Lean/lib/python3.11/site-packages/transformers/utils/hub.py", line 373, in cached_file
    raise EnvironmentError(
OSError: /data/miranebr-sandbox/data/runs/09192024_12h35m27s_run/train/checkpoint-820 does not appear to have a file named config.json. Checkout 'https://huggingface.co//data/miranebr-sandbox/data/runs/09192024_12h35m27s_run/train/checkpoint-820/tree/None' for available files.

How do I fix this? Maybe I need to save the model in an HF-compatible way?

Current save code:

    # (fragment: relevant arguments of the training function)
    push_to_hub: Optional[bool] = False,
    model_save_name: Optional[str] = None,
    ...

    # Save the trained model
    if push_to_hub and model_save_name and hf_token:
        model.push_to_hub_merged(
            model_save_name,
            tokenizer=tokenizer,
            save_method="merged_16bit",
            token=hf_token,
        )
    elif model_save_name:
        model.save_pretrained_merged(model_save_name, tokenizer, save_method="merged_16bit")
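For reference, here is a minimal sketch of the save path I would expect to produce a vLLM-loadable directory: load the trainer checkpoint with Unsloth, merge the LoRA weights into the base model, and write a fresh HF-style directory. This is an assumption on my part, and the paths here are hypothetical.

import os
from unsloth import FastLanguageModel

# Hypothetical paths; adjust to the actual checkpoint and output locations.
checkpoint_dir = os.path.expanduser('~/data/runs/09192024_12h35m27s_run/train/checkpoint-820')
merged_dir = os.path.expanduser('~/data/runs/09192024_12h35m27s_run/train/merged_16bit')

# Load the LoRA checkpoint that the trainer saved.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=checkpoint_dir,
    max_seq_length=8192,
    load_in_4bit=False,
)

# Merge the adapter into the base weights and write a full HF-style directory
# (config.json, model safetensors, tokenizer files) that vLLM should be able to load.
model.save_pretrained_merged(merged_dir, tokenizer, save_method='merged_16bit')
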
brando90 commented 6 days ago

@danielhanchen ?

brando90 commented 5 days ago

related: https://github.com/unslothai/unsloth/issues/421

danielhanchen commented 5 days ago

Sorry, will have a look at this! Apologies for the delay!

brando90 commented 5 days ago

Sorry, will have a look at this! Apologies for the delay!

@danielhanchen I tried the config file name change you suggested here:

https://github.com/unslothai/unsloth/issues/421

but it doesn't work. What do you suggest I do?

brando90 commented 5 days ago

I thought the config file was the issue, so I followed up with most of the details here: https://github.com/unslothai/unsloth/issues/421. @danielhanchen, let me know if I can be of further help.

brando90 commented 5 days ago

ref: https://discord.com/channels/1179035537009545276/1289302027461070969

danielhanchen commented 1 day ago

Wait @brando90 does model.save_pretrained_merged(model_save_name, tokenizer, save_method="merged_16bit") not work for vLLM?

brando90 commented 1 day ago

Not for me. It doesn't work.


brando90 commented 1 day ago

If I recall correctly, the main issue is that the config.json file HF needs is not the same as the one Unsloth uses.
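A quick way to see the mismatch, assuming the checkpoint is a LoRA trainer checkpoint (path hypothetical): list what is actually in the directory and check for the file vLLM looks for.

import os

# Hypothetical path to the trainer checkpoint.
ckpt = os.path.expanduser('~/data/runs/09192024_12h35m27s_run/train/checkpoint-820')

# A LoRA trainer checkpoint typically contains adapter_config.json and
# adapter weights, while vLLM's config loader requires a full config.json.
for name in sorted(os.listdir(ckpt)):
    print(name)
print('has config.json:', os.path.exists(os.path.join(ckpt, 'config.json')))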


danielhanchen commented 20 hours ago

@brando90 Wait so you're saying all vLLM merges cannot be loaded up in vLLM?

brando90 commented 6 hours ago

Correct. I think I shared a copy-paste of the code AND the error; will look for it.


brando90 commented 4 hours ago

Wait @brando90 does model.save_pretrained_merged(model_save_name, tokenizer, save_method="merged_16bit") not work for vLLM?

Obvious sanity check: have you tested it, e.g., with Qwen/Qwen2-1.5B?
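i.e., a round trip along these lines. This is only a sketch, assuming get_peft_model's default target modules and a hypothetical output directory; it should fail the same way if the merged save is missing files.

from unsloth import FastLanguageModel
from vllm import LLM

# Load a small base model with Unsloth and attach a trivial LoRA adapter.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2-1.5B",
    max_seq_length=2048,
)
model = FastLanguageModel.get_peft_model(model, r=8)

# Save merged, then immediately try to reload the result with vLLM.
# (May need a fresh process for the vLLM step if GPU memory is tight.)
model.save_pretrained_merged("qwen2_merged_test", tokenizer, save_method="merged_16bit")
llm = LLM(model="qwen2_merged_test")
print(llm.generate(["Hello, my name is"])[0].outputs[0].text)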

brando90 commented 1 hour ago

If you are open to just using the LoRA adapter without merging, do this:

https://github.com/unslothai/unsloth/issues/1039

Note: if you are allowed to push to an HF repo, that should work too.
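For completeness, a minimal sketch of the unmerged path using vLLM's LoRA support, assuming a Qwen2 base model and the adapter directory from the trainer checkpoint (paths and adapter name hypothetical):

import os
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Hypothetical adapter directory produced by the trainer.
adapter_dir = os.path.expanduser('~/data/runs/09192024_12h35m27s_run/train/checkpoint-820')

# Load the base model with LoRA support enabled, then attach the adapter per request.
llm = LLM(model="Qwen/Qwen2-1.5B", enable_lora=True)
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
outputs = llm.generate(
    ["Hello, my name is"],
    sampling_params,
    lora_request=LoRARequest("my_adapter", 1, adapter_dir),
)
print(outputs[0].outputs[0].text)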