mbzuai-oryx / groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
https://grounding-anything.com

Potential Data Leakage? #74

Open joshmyersdean opened 2 days ago

joshmyersdean commented 2 days ago

Hi again @hanoonaR! (cc: @mmaaz60)

While going through the code base, I noticed that the caption for GCG gets added to the prompt (on this line) but is then never removed from the input IDs in the collate_fn. Could you please clarify what is happening in this scenario?
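To make the concern concrete, here is a minimal sketch of the pattern I would expect in a typical causal-LM fine-tuning setup. It is not the repository's collate_fn. The tokenizer, `build_example`, and `contains_subsequence` are hypothetical stand-ins. The answer stays in the input IDs for teacher forcing, the prompt positions are masked in the labels with the usual -100 ignore index, and the answer should not also appear inside the prompt portion.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

_vocab = {}  # hypothetical toy vocabulary, filled on the fly


def toy_tokenize(text):
    """Hypothetical stand-in tokenizer: one integer id per whitespace token."""
    return [_vocab.setdefault(tok, len(_vocab)) for tok in text.split()]


def build_example(prompt, answer):
    """Concatenate prompt and answer ids; mask the prompt part in the labels."""
    prompt_ids = toy_tokenize(prompt)
    answer_ids = toy_tokenize(answer)
    input_ids = prompt_ids + answer_ids
    # Loss is only computed on the answer tokens.
    labels = [IGNORE_INDEX] * len(prompt_ids) + answer_ids
    return input_ids, labels


def contains_subsequence(haystack, needle):
    """Return True if `needle` occurs as a contiguous run inside `haystack`."""
    n = len(needle)
    if n == 0:
        return False
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


prompt = "USER: describe the image ASSISTANT:"
answer = "a dog on grass"
input_ids, labels = build_example(prompt, answer)

# Leakage in the sense of this issue: the answer is visible in the prompt part.
leaked = contains_subsequence(toy_tokenize(prompt), toy_tokenize(answer))
```

Under these assumptions, `leaked` is the quantity I am asking about: if the caption were part of the human turn, it would be visible to the model before it has to predict it.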

Thank you! Josh

joshmyersdean commented 2 days ago

It also looks like nothing gets appended to the assistant part of the prompt in the demo. When I do append something, it gets repeated. I have also confirmed that the model repeats the answer part when generating text. However, this does not appear to be done during testing, which is good!
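For anyone reproducing this, a minimal sketch of the check I mean. `repeats_prefix` is a hypothetical helper, not part of the repo, and the strings are made-up examples. It tests whether the generated text starts by repeating whatever was pre-filled into the assistant turn.

```python
def repeats_prefix(prefix, generated):
    """Return True if `generated` begins by repeating the pre-filled `prefix`.

    Comparison ignores case and collapses runs of whitespace.
    """
    def normalize(s):
        return " ".join(s.split()).lower()

    p = normalize(prefix)
    return bool(p) and normalize(generated).startswith(p)


# Hypothetical example strings, not actual model outputs.
prefilled = "The image shows"
echoed = repeats_prefix(prefilled, "The image  shows a dog on the grass")
not_echoed = repeats_prefix(prefilled, "a dog on the grass")
```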