penghao-wu / vstar

PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"
https://vstar-seal.github.io/
MIT License
497 stars, 32 forks

Expected output of the final answer? #3

Closed: kexul closed this issue 8 months ago

kexul commented 8 months ago

Hi, I asked "What's the color of the content in the cup?" with the following image: [image]

Here is the output: [image] [image]

What's the meaning of the values in the final answer?

kexul commented 8 months ago

I've tried several other images and always got confusing digits in the final answer.

penghao-wu commented 8 months ago

There was a small bug in how the model was loaded. Could you please update the repo and try again?

kexul commented 8 months ago

It throws an error after updating:

Traceback (most recent call last):
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/gradio/routes.py", line 437, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/gradio/blocks.py", line 1352, in process_api
    result = await self.call_function(
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/gradio/blocks.py", line 1077, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread
    return await future
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/data/vstar/app.py", line 146, in inference
    prediction = vqa_llm.free_form_inference(image, question, max_new_tokens=512)
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/vstar/vstar_bench_eval.py", line 91, in free_form_inference
    output_ids = self.model.generate(
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/transformers/generation/utils.py", line 1538, in generate
    return self.greedy_search(
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/transformers/generation/utils.py", line 2362, in greedy_search
    outputs = self(
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/data/vstar/LLaVA/llava/model/language_model/llava_search_llama.py", line 78, in forward
    input_ids, attention_mask, past_key_values, inputs_embeds, labels = self.prepare_inputs_labels_for_multimodal(input_ids, attention_mask, past_key_values, labels, images, object_features, images_long, objects_long)
  File "/data/vstar/LLaVA/llava/model/llava_search_arch.py", line 110, in prepare_inputs_labels_for_multimodal
    image_features_long, image_features_short = self.encode_images(images)
  File "/data/vstar/LLaVA/llava/model/llava_search_arch.py", line 86, in encode_images
    image_features_short = self.get_model().mm_projector_object(image_features)
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/torch/nn/modules/container.py", line 215, in forward
    input = module(input)
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/data/vstar/LLaVA/llava/model/multimodal_projector/perceiver.py", line 114, in forward
    latents = attn(x, latents) + latents
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/data/vstar/LLaVA/llava/model/multimodal_projector/perceiver.py", line 58, in forward
    q = self.to_q(latents)
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 441, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 563, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 400, in forward
    C32A, SA = F.transform(CA, "col32")
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/bitsandbytes/functional.py", line 2083, in transform
    if out is None: out, new_state = get_transform_buffer(state[0], A.dtype, A.device, to_order, state[1], transpose)
  File "/home/gymuser/miniforge3/envs/vstar/lib/python3.10/site-packages/bitsandbytes/functional.py", line 461, in get_transform_buffer
    return init_func((rows, cols), dtype=dtype, device=device), state
UnboundLocalError: local variable 'rows' referenced before assignment

It seems to be related to bitsandbytes, since I've enabled load_8bit.
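
For anyone reading the last frame: the message suggests that `rows` is only assigned on some code paths before it is used. The sketch below is a simplified, hypothetical stand-in (not the bitsandbytes source) that reproduces the same Python failure mode, assuming the trigger is an input whose number of dimensions the function has no branch for. The names `buffer_shape` and `try_shape` are made up for illustration.

```python
def buffer_shape(shape):
    """Hypothetical stand-in: compute a 2-D (rows, cols) buffer shape."""
    dims = len(shape)
    if dims == 2:
        rows = shape[0]
    elif dims == 3:
        rows = shape[0] * shape[1]
    # There is no branch for other ranks, so `rows` may never be bound.
    cols = 32 * ((shape[-1] + 31) // 32)  # pad columns up to a multiple of 32
    return rows, cols


def try_shape(shape):
    """Return the buffer shape, or the exception name if the call fails."""
    try:
        return buffer_shape(shape)
    except UnboundLocalError as exc:
        return type(exc).__name__


if __name__ == "__main__":
    for shape in [(4, 64), (2, 3, 40), (1, 2, 3, 40)]:
        print(shape, "->", try_shape(shape))
```

If that assumption holds, the error would come from the tensor shape reaching the 8-bit linear layer rather than from the question or the image itself.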

penghao-wu commented 8 months ago

Hi, could you please try adding this in the builder if you are using 8-bit loading with a single GPU?

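from transformers import BitsAndBytesConfig

# `kwargs` is the dict the builder already passes to from_pretrained.
# Keep mm_projector_object out of int8 quantization when loading in 8-bit.
kwargs["quantization_config"] = BitsAndBytesConfig(
    llm_int8_skip_modules=['mm_projector_object'],
    load_in_8bit=True,
)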
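
A minimal sketch of how this could be wrapped so the quantization config is only built when 8-bit loading is requested. The helper names `quantization_kwargs` and `make_quantization_config` are hypothetical and not part of the repo; only `BitsAndBytesConfig` and its `load_in_8bit` and `llm_int8_skip_modules` parameters come from transformers.

```python
def quantization_kwargs(load_8bit, skip_modules=("mm_projector_object",)):
    """Hypothetical helper: keyword arguments for BitsAndBytesConfig."""
    if not load_8bit:
        return {}
    return {
        "load_in_8bit": True,
        # Modules listed here are left unquantized.
        "llm_int8_skip_modules": list(skip_modules),
    }


def make_quantization_config(load_8bit):
    """Return a BitsAndBytesConfig for 8-bit loading, or None otherwise."""
    cfg = quantization_kwargs(load_8bit)
    if not cfg:
        return None
    # Imported lazily so the helper above works without transformers installed.
    from transformers import BitsAndBytesConfig
    return BitsAndBytesConfig(**cfg)


if __name__ == "__main__":
    print(quantization_kwargs(True))
```

In the builder, the result would be assigned to `kwargs["quantization_config"]` only when it is not None.
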
kexul commented 8 months ago

Thanks! Worked like a charm!