shikiw / OPERA

[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

CHAIR Reproduction Bugs #20

Closed · xing0047 closed 4 months ago

xing0047 commented 4 months ago

Thanks for your great work and for open-sourcing your code!

I am working on reproducing the CHAIR results for OPERA (presented in Tables 2 and 3). I get a size-mismatch RuntimeError when running python chair_eval.py on a single 4090 GPU. May I ask if there is something wrong with my evaluation script? For reference, the evaluation script and the reported error are shown below (I have tried both max_new_tokens=512 and max_new_tokens=64 and encounter the same issue).

evaluation script

python chair_eval.py \
    --model llava-1.5 \
    --data_path ./data/coco/val2014/ \
    --gpu-id 1 --beam 5 --scale_factor 50 \
    --threshold 15 --num_attn_candidates 5 \
    --penalty_weights 1

reported error (happens after a few steps)

Traceback (most recent call last):
  File "/home/xingy/OPERA/chair_eval.py", line 172, in <module>
    out = model.generate(
        {"image": norm(image), "prompt":qu},
        use_nucleus_sampling=args.sample,
        num_beams=args.beam,
        ...
  File "/home/xingy/anaconda3/envs/opera/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/xingy/OPERA/minigpt4/models/llava.py", line 211, in generate
    output_ids = self.llama_model.generate(
        input_ids=input_ids,
        use_cache=True,
        do_sample=use_nucleus_sampling,
        ...
  File "/home/xingy/anaconda3/envs/opera/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/xingy/OPERA/transformers-4.29.2/src/transformers/generation/utils.py", line 1649, in generate
    # 13. run opera beam search
    return self.opera_beam_search(
        input_ids,
        beam_scorer,
        logits_processor=logits_processor,
        ...
  File "/home/xingy/OPERA/transformers-4.29.2/src/transformers/generation/utils.py", line 3353, in opera_beam_search
    attn_previous = torch.cat(
        [attn_previous[beam_idx], outputs.attentions[-1].clone().max(1, keep...
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 645 but got size 667 for tensor number 1 in the list.

shikiw commented 4 months ago

Thanks for your appreciation!

Have you tried to reproduce our POPE results? Does this bug still occur there?

xing0047 commented 4 months ago

Thanks for your kind reply.

We can reproduce the POPE results; no "size mismatch" error is reported when reproducing POPE.

shikiw commented 4 months ago

Hi,

Could you please provide the image id that triggered this error? I have checked and rerun the code locally and haven't found any problems.

shikiw commented 4 months ago

This problem might be an out-of-memory error triggered at https://github.com/shikiw/OPERA/blob/b83bc6b264a2d7bd4a6163fc21f9bb806d443a0b/transformers-4.29.2/src/transformers/generation/utils.py#L3515-L3520, which makes execution jump out of the "try" block into "except" without running the code at https://github.com/shikiw/OPERA/blob/b83bc6b264a2d7bd4a6163fc21f9bb806d443a0b/transformers-4.29.2/src/transformers/generation/utils.py#L3534-L3552. As a result, model_kwargs["past_key_values"] cannot be rolled back successfully.
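
Roughly, the failure mode looks like this (a simplified sketch of the control flow, not the exact code in utils.py; the function names are placeholders):

def retrospection_step(model_kwargs, compute_attention_penalty, roll_back_kv_cache):
    # Hypothetical outline of the try/except around the retrospection logic.
    try:
        # Region around utils.py L3515-L3520: aggregating attention maps here
        # can allocate large tensors and hit CUDA out-of-memory on a smaller GPU.
        compute_attention_penalty()

        # Region around utils.py L3534-L3552: roll the KV cache back to the
        # retrospection point. This line is skipped if the step above raised.
        model_kwargs["past_key_values"] = roll_back_kv_cache(model_kwargs["past_key_values"])
    except RuntimeError:
        # CUDA OOM surfaces as a RuntimeError subclass, so execution lands here
        # before the rollback ran; past_key_values keeps its stale (longer)
        # sequence length, and on the next step the attention shapes no longer
        # match attn_previous, which later shows up as the torch.cat size mismatch.
        pass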

Solution 1: Try it on a GPU with more memory.
Solution 2: Try beam=2 and num_attn_candidates=2.
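
For example, applying Solution 2 to the evaluation script above gives:

python chair_eval.py \
    --model llava-1.5 \
    --data_path ./data/coco/val2014/ \
    --gpu-id 1 --beam 2 --scale_factor 50 \
    --threshold 15 --num_attn_candidates 2 \
    --penalty_weights 1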

xing0047 commented 4 months ago

Hi shikiw,

Thanks for replying to this issue.

It appears that the problem does not lie in the image inputs. As suggested, I will try these two solutions first and see whether the problem persists.

teamwong111 commented 1 month ago

Same bug here. Have you made any progress on solving it? @xing0047 It seems that attn_previous and outputs.attentions do not match in dimension 2 (the query dimension). Maybe the rollback causes the bug, but I am not sure.
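
For reference, a minimal standalone snippet (the shapes are made up, not taken from OPERA) that reproduces the same error message: torch.cat along dim=2 allows the tensors to differ only in dimension 2 itself, so any other mismatched dimension triggers exactly this RuntimeError.

import torch

# Attention-like tensors laid out as [batch*beams, heads, query_len, key_len].
attn_previous = torch.zeros(5, 1, 10, 645)  # cached attention, key length 645
attn_latest   = torch.zeros(5, 1, 1, 667)   # new step's attention, key length 667

# Concatenating along dim=2 (the query dimension) requires all other dimensions
# to agree, so the differing last dimension (645 vs 667) raises:
#   RuntimeError: Sizes of tensors must match except in dimension 2.
#   Expected size 645 but got size 667 for tensor number 1 in the list.
torch.cat([attn_previous, attn_latest], dim=2)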