yfzhang114 / LLaVA-Align

This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and a Visual Debias Decoding strategy.
Apache License 2.0
67 stars, 2 forks

The same model_kwargs and model_kwargs_cd. #6

Open Stevetich opened 2 months ago

Stevetich commented 2 months ago

Hi. I noticed that you adopted the VCD code as your code base. In that code, model_kwargs and model_kwargs_cd appear to be the same when generating tokens. This confuses me because past_key_values is stored in model_kwargs, which would mean the same past_key_values is used for both the original image and the distorted image as visual inputs. Is that correct?
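
To make the concern concrete, here is a minimal toy sketch in pure Python. It is not the repository's or VCD's actual code: toy_forward, update_kwargs, and decode are hypothetical names, and the "cache" is just a tuple of tags standing in for real key/value tensors. It only illustrates what would happen if both branches truly shared one kwargs dict, compared with each branch keeping its own cache.

```python
def toy_forward(token, image_tag, past_key_values):
    """Hypothetical stand-in for a model forward pass.

    The toy cache records which image each cached step was computed from;
    a real cache would hold key/value tensors instead.
    """
    past = past_key_values or ()
    return {"past_key_values": past + ((image_tag, token),)}


def update_kwargs(outputs, kwargs):
    """Return a new kwargs dict carrying the cache from this branch's outputs."""
    kwargs = dict(kwargs)
    kwargs["past_key_values"] = outputs["past_key_values"]
    return kwargs


def decode(tokens, share_kwargs):
    """Run the original and distorted branches for each token.

    share_kwargs=True models the worry in this issue: one dict (and so one
    cache) feeds both branches. share_kwargs=False gives each branch its own.
    """
    model_kwargs = {"past_key_values": None}
    # Assumption: the contrastive branch starts from a shallow copy.
    model_kwargs_cd = model_kwargs if share_kwargs else model_kwargs.copy()

    for tok in tokens:
        out = toy_forward(tok, "original", model_kwargs["past_key_values"])
        out_cd = toy_forward(tok, "distorted", model_kwargs_cd["past_key_values"])

        model_kwargs = update_kwargs(out, model_kwargs)
        if share_kwargs:
            # The distorted branch's own cache is discarded.
            model_kwargs_cd = model_kwargs
        else:
            model_kwargs_cd = update_kwargs(out_cd, model_kwargs_cd)

    return model_kwargs["past_key_values"], model_kwargs_cd["past_key_values"]


separate_orig, separate_cd = decode([1, 2], share_kwargs=False)
shared_orig, shared_cd = decode([1, 2], share_kwargs=True)
print("separate:", separate_orig, separate_cd)
print("shared:  ", shared_orig, shared_cd)
```

In the shared case, the distorted branch ends up reading a cache built only from the original image, which is the behavior being asked about. Whether the real code behaves this way depends on whether model_kwargs_cd is updated from its own outputs at each step.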

yfzhang114 commented 2 months ago

Thanks for pointing that out! I'm not entirely clear on the issue you're describing, though. Can you explain the problem a bit more? I'd really appreciate it if you could help me understand why this operation might be incorrect.