minuenergy opened 2 months ago
Perhaps you can try `nn.Module`'s forward hooks:
```python
import torch
from transformers import AutoProcessor, BitsAndBytesConfig, LlavaForConditionalGeneration


class Hook:
    """Keep the output of a module's most recent forward pass."""

    def __init__(self, module):
        self.hook = module.register_forward_hook(self.hook_fn)
        self.output = None

    def hook_fn(self, module, input, output):
        self.output = output

    def close(self):
        self.hook.remove()


def load_captioning_model(model_id):
    quantization_config = BitsAndBytesConfig(load_in_8bit=True)
    model = LlavaForConditionalGeneration.from_pretrained(
        model_id, low_cpu_mem_usage=True, quantization_config=quantization_config
    )
    processor = AutoProcessor.from_pretrained(model_id, pad_token="<pad>")
    return processor, model


model_id = "llava-hf/llava-1.5-7b-hf"
processor, model = load_captioning_model(model_id)

embed_last_hook = Hook(model.language_model.model.norm)  # to save the embedding
embed_last_hook.output  # (1, 4096) once a forward pass has run
```
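For reference, the hook above only holds something after a forward pass has actually run. A minimal sketch of how it gets populated on the HuggingFace side (the image path and `max_new_tokens` value are placeholders, not taken from the original report):

```python
import PIL.Image

image = PIL.Image.open("example.jpg")  # placeholder image path
prompt = "USER: <image>\nWhat is the content of this image?\nASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    model.generate(**inputs, max_new_tokens=32)

# With KV caching, every decoding step after the first feeds only the newest
# token through the stack, so hook_fn keeps overwriting self.output and ends up
# holding the activation of that single position, i.e. the (1, 4096) embedding
# reported in this issue.
print(embed_last_hook.output.shape)
```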
When I use this hook with the HuggingFace model above, I get a (1, 4096) embedding.
But with vLLM (code below), I get a different result: torch.Size([596, 4096]).
```python
from vllm import LLM, SamplingParams

llm = LLM(model="llava-hf/llava-1.5-7b-hf")

D_HOOK = Hook(llm.llm_engine.model_executor.driver_worker.model_runner.model.language_model.norm)
D_HOOK.output  # torch.Size([596, 4096])
```
I want to get the same features as with HuggingFace. What should I do, and what causes the difference between the two?
HuggingFace's model
vLLM's model
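My reading of the shape difference (an interpretation, not verified against vLLM internals): the HuggingFace hook is overwritten at every decoding step and ends up with a single position, while vLLM's model forward runs on a flattened token tensor with no batch dimension, so during prefill the hooked norm sees the whole prompt at once. For LLaVA-1.5 the `<image>` placeholder expands to 576 image-patch positions (24x24 patches), which together with the text tokens would account for the 596 rows. If you only want the last position, slicing the hook output should give a tensor comparable to the HuggingFace one:

```python
# Sketch only: assumes D_HOOK.output currently holds the flattened
# (num_tokens, hidden_size) activations from vLLM's prefill pass.
vllm_feats = D_HOOK.output            # torch.Size([596, 4096])
last_token_feat = vllm_feats[-1:, :]  # torch.Size([1, 4096]), last prompt position

# Caveat: the HuggingFace hook ends up holding the last *generated* token's
# activation, while this row is the last *prompt* token. To compare like with
# like, capture both right after the prompt (prefill) forward pass, or collect
# every step by appending to a list inside hook_fn instead of overwriting.
print(last_token_feat.shape)
```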
Your current environment
```python
import os
import PIL.Image
from vllm import LLM, SamplingParams

# Initialize the LLaVA-1.5 model
llm = LLM(model="llava-hf/llava-1.5-7b-hf")
print(llm)
embed_last_hook = Hook(model.language_model.model.norm)  # to save the embedding

# Define the prompts and images
base_p = '../../../data/detect/coco/train2017'
img_p1 = os.path.join(base_p, '000000265292.jpg')
img_p2 = os.path.join(base_p, '000000318124.jpg')
img_p3 = os.path.join(base_p, '000000370121.jpg')
prompts = [
    {"prompt": "USER: <image>\nWhat is the content of this image?\nASSISTANT:", "multi_modal_data": {"image": PIL.Image.open(img_p1)}},
    {"prompt": "USER: <image>\nWhat is the content of this image?\nASSISTANT:", "multi_modal_data": {"image": PIL.Image.open(img_p2)}},
    {"prompt": "USER: <image>\nWhat is the content of this image?\nASSISTANT:", "multi_modal_data": {"image": PIL.Image.open(img_p3)}},
]

# Define sampling parameters
sampling_params = SamplingParams(temperature=0.0, top_p=1.0)

# Generate outputs
outputs = llm.generate(prompts, sampling_params=sampling_params)
```
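For completeness, the generated captions can then be read from the returned `RequestOutput` objects in the usual way:

```python
# Each element of `outputs` is a vllm.RequestOutput; the text of the first
# completion lives in .outputs[0].text.
for request_output in outputs:
    print(request_output.outputs[0].text)
```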
I want to add a hook to capture features once the LLM's forward pass has finished. How can I get at the features inside?
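A sketch of one way to do that, reusing the `Hook` class and the internal attribute path from the snippets earlier in this issue; note that these attributes are vLLM internals rather than a public API, so the path can change between versions:

```python
# Reuses the Hook class and the script above. The attribute chain below is taken
# from the earlier snippet; it reaches into vLLM internals and may break on
# other vLLM versions.
inner_model = llm.llm_engine.model_executor.driver_worker.model_runner.model
norm_hook = Hook(inner_model.language_model.norm)

outputs = llm.generate(prompts, sampling_params=sampling_params)

# After generate() returns, norm_hook.output should hold whatever the hooked
# module produced on the last forward pass vLLM ran (flattened over tokens,
# with no batch dimension). To keep the features from every step instead,
# append to a list inside hook_fn rather than overwriting self.output.
print(norm_hook.output.shape)
```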
How would you like to use vllm
I want to run inference of a [specific model](put link here). I don't know how to integrate it with vllm.