zjysteven / lmms-finetune

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, qwen-vl, qwen2-vl, phi3-v etc.
Apache License 2.0

Inference giving errors after finetuning LLavaInterleave 0.5B #34

Closed · binarybeastt closed this issue 1 month ago

binarybeastt commented 2 months ago

I'm getting errors when trying to run inference on an interleave model I fine-tuned using LoRA. Here's the code:

import requests
from PIL import Image

import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration
old_model = "llava-hf/llava-interleave-qwen-0.5b-hf"

model_id = "lmms-finetune/checkpoints/llava-interleave-qwen-0.5b_lora-True_qlora-False"
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, 
    torch_dtype=torch.float16, 
    low_cpu_mem_usage=True, 
).to('cuda')

processor = AutoProcessor.from_pretrained(old_model)

# Define a chat history and use `apply_chat_template` to get correctly formatted prompt
# Each value in "content" has to be a list of dicts with types ("text", "image") 
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's happening in the image?"},
            {"type": "image"},
        ],
    },
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

image_file = "dataset/13-1-_jpg.rf.b79c411ea27ceba9c706391b027137f0.jpg"
raw_image = Image.open(image_file)
inputs = processor(images=raw_image, text=prompt, padding=True, return_tensors='pt').to('cuda')

output = model.generate(**inputs, max_new_tokens=200, do_sample=True)
print(processor.decode(output[0], skip_special_tokens=True))

Here's the error with the full traceback:

Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[6], line 36
     33 # inputs_video = processor(text=prompt, videos=clip, padding=True, return_tensors="pt").to(model.device)
     34 inputs = processor(images=raw_image, text=prompt, padding=True, return_tensors='pt').to('cuda')
---> 36 output = model.generate(**inputs, max_new_tokens=200, do_sample=True)
     37 print(processor.decode(output[0], skip_special_tokens=True))

File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    113 @functools.wraps(func)
    114 def decorate_context(*args, **kwargs):
    115     with ctx_factory():
--> 116         return func(*args, **kwargs)

File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/transformers/generation/utils.py:2015, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, **kwargs)
   2007     input_ids, model_kwargs = self._expand_inputs_for_generation(
   2008         input_ids=input_ids,
   2009         expand_size=generation_config.num_return_sequences,
   2010         is_encoder_decoder=self.config.is_encoder_decoder,
   2011         **model_kwargs,
   2012     )
   2014     # 12. run sample (it degenerates to greedy search when `generation_config.do_sample=False`)
-> 2015     result = self._sample(
   2016         input_ids,
   2017         logits_processor=prepared_logits_processor,
   2018         stopping_criteria=prepared_stopping_criteria,
   2019         generation_config=generation_config,
   2020         synced_gpus=synced_gpus,
   2021         streamer=streamer,
   2022         **model_kwargs,
   2023     )
   2025 elif generation_mode in (GenerationMode.BEAM_SAMPLE, GenerationMode.BEAM_SEARCH):
   2026     # 11. prepare beam search scorer
   2027     beam_scorer = BeamSearchScorer(
   2028         batch_size=batch_size,
   2029         num_beams=generation_config.num_beams,
   (...)
   2034         max_length=generation_config.max_length,
   2035     )

File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/transformers/generation/utils.py:2961, in GenerationMixin._sample(self, input_ids, logits_processor, stopping_criteria, generation_config, synced_gpus, streamer, **model_kwargs)
   2958 model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
   2960 # forward pass to get next token
-> 2961 outputs = self(**model_inputs, return_dict=True)
   2963 if synced_gpus and this_peer_finished:
   2964     continue  # don't waste resources running the code we don't need

File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py:1553, in Module._wrapped_call_impl(self, *args, **kwargs)
   1551     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1552 else:
-> 1553     return self._call_impl(*args, **kwargs)

File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py:1562, in Module._call_impl(self, *args, **kwargs)
   1557 # If we don't have any hooks, we want to skip the rest of the logic in
   1558 # this function, and just call forward.
   1559 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1560         or _global_backward_pre_hooks or _global_backward_hooks
   1561         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1562     return forward_call(*args, **kwargs)
   1564 try:
   1565     result = None

TypeError: LlavaForConditionalGeneration.forward() got an unexpected keyword argument 'num_logits_to_keep'
zjysteven commented 2 months ago

Could you try running inference with the "old model", i.e., load the model from old_model instead of model_id? If it raises the same error, then it's likely a versioning issue with the transformers library (e.g., see https://github.com/huggingface/transformers/issues/29426). If the old model works but the finetuned model doesn't, then something might be wrong with lmms-finetune and I'll take a look.
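
A minimal sketch of that check, reusing old_model, processor, and inputs from the script above (variable names assumed from that script):

# Load the base (non-finetuned) checkpoint and rerun the same inputs.
# If this raises the same TypeError, the cause is the installed transformers
# version rather than the finetuned checkpoint.
base_model = LlavaForConditionalGeneration.from_pretrained(
    old_model,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).to('cuda')

output = base_model.generate(**inputs, max_new_tokens=200, do_sample=True)
print(processor.decode(output[0], skip_special_tokens=True))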

binarybeastt commented 1 month ago

It was a versioning issue with the transformers library; upgrading it fixed the problem.
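
For anyone who hits the same traceback, a quick way to check the installed version before upgrading (the exact version that resolved the error isn't recorded in this thread):

import transformers

# Print the currently installed version. In this thread the fix was simply upgrading
# transformers so that generate() and LlavaForConditionalGeneration.forward() agree
# on the num_logits_to_keep argument.
print(transformers.__version__)

# Upgrade with, e.g.: pip install -U transformers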