vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

how to create LLM() object given a model and a tokenizer? #2836

Closed · th789 closed this issue 5 months ago

th789 commented 7 months ago

Hello! I'm wondering if it's possible to load a model and a tokenizer and then pass the two of them to vllm.LLM() to create an object. The reason I am trying to create the object this way (instead of pointing vllm.LLM() at the model folder) is that my model is quantized with bitsandbytes, and it seems vLLM does not currently support bitsandbytes (i.e., when I run vllm.LLM(model="path_to_model_repo", tokenizer="path_to_model_repo"), I get the error: ValueError: Unknown quantization method: bitsandbytes. Must be one of ['awq', 'gptq', 'squeezellm'].)

So I'm trying to load the model and tokenizer myself and use them to create the vllm.LLM() object. I tried the following, but it raises an error.

import vllm
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# load model and tokenizer
model_repo = "path_to_model_repo"
model = AutoModelForCausalLM.from_pretrained(
    model_repo,
    use_safetensors=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# create vllm.LLM() object
client = vllm.LLM(model=model, tokenizer=tokenizer)

Error message: Please provide either the path to a local folder or the repo_id of a model on the Hub.

I'd really appreciate any insight on how to use a model and tokenizer to create a vllm.LLM() object, or how to work around bitsandbytes not currently being supported. Thank you very much!

robertgshaw2-neuralmagic commented 7 months ago

vLLM has its own internal definitions of models. You cannot pass a transformers model to vLLM.

BNB (bitsandbytes) is not supported in vLLM since its kernels are not optimized for inference. You can run in 4 bits with GPTQ or AWQ (and I'm currently working on integrating the latest SOTA Marlin kernels to accelerate INT4 GPTQ inference).
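
For reference, a minimal sketch of serving an already-quantized AWQ (or GPTQ) checkpoint with vLLM; the model path is a placeholder, and the quantization argument should match the method used to produce the checkpoint:

import vllm

# point vLLM at a checkpoint that was quantized ahead of time with AWQ or GPTQ
# ("path_to_awq_model" is a placeholder) and name the quantization method
llm = vllm.LLM(model="path_to_awq_model", quantization="awq", dtype="half")

outputs = llm.generate(
    ["Hello, my name is"],
    vllm.SamplingParams(temperature=0.8, max_tokens=64),
)
print(outputs[0].outputs[0].text)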

Let me know if you need any help quantizing a model with GPTQ.
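
For anyone following along, here is a minimal sketch of quantizing a model with the AutoGPTQ library so the result can be loaded by vLLM. The paths are placeholders, the single calibration example is for illustration only (real calibration should use a few hundred representative samples), and the exact API may differ across auto-gptq versions:

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_repo = "path_to_model_repo"   # placeholder: original (unquantized) model
quant_path = "path_to_gptq_model"   # placeholder: where the GPTQ model is saved

# 4-bit GPTQ configuration (group size 128 is a common default)
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

tokenizer = AutoTokenizer.from_pretrained(model_repo)
model = AutoGPTQForCausalLM.from_pretrained(model_repo, quantize_config)

# GPTQ needs tokenized calibration examples; one toy sample shown here
examples = [tokenizer("vLLM is a high-throughput inference engine for LLMs.", return_tensors="pt")]
model.quantize(examples)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

# the saved folder can then be loaded with vllm.LLM(model=quant_path, quantization="gptq")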