jshilong / GPT4RoI

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

where to use #44

Open bibibabibo26 opened 6 months ago

bibibabibo26 commented 6 months ago

Hello, in the inference.py you shared in #14, I can see that the multimodal input tokens for the LLM include a bbox token, but I can't find where the bbox token is replaced, or where the image features obtained from CLIP are used and interpolated. Could you explain this for me? Thank you.

bibibabibo26 commented 6 months ago

I found it: it is in SPILlavaLlamaModel.
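
For anyone landing here with the same question, the general idea of replacing a bbox placeholder token with a region feature can be sketched as below. This is only an illustration in plain Python, not the code from SPILlavaLlamaModel: the function name `splice_region_features`, the placeholder id `BBOX_TOKEN_ID`, and the list-based embeddings are all assumptions made for the example, and it assumes each bbox token position receives exactly one feature vector of the same width as the text embeddings.

```python
from typing import List, Sequence

# Hypothetical placeholder id for the bbox token; the real id depends on the tokenizer.
BBOX_TOKEN_ID = 32000


def splice_region_features(
    token_ids: Sequence[int],
    token_embeds: Sequence[Sequence[float]],
    region_feats: Sequence[Sequence[float]],
    bbox_token_id: int = BBOX_TOKEN_ID,
) -> List[List[float]]:
    """Return a copy of token_embeds where the embedding at each bbox
    placeholder position is replaced by the matching region feature."""
    if len(token_ids) != len(token_embeds):
        raise ValueError("token_ids and token_embeds must have the same length")

    # Positions of the placeholder tokens, in order of appearance.
    positions = [i for i, tok in enumerate(token_ids) if tok == bbox_token_id]
    if len(positions) != len(region_feats):
        raise ValueError("number of bbox tokens and region features must match")

    # Copy so the caller's embeddings are left untouched.
    spliced = [list(vec) for vec in token_embeds]
    for pos, feat in zip(positions, region_feats):
        if len(feat) != len(spliced[pos]):
            raise ValueError("region feature width must match the embedding width")
        spliced[pos] = list(feat)
    return spliced


# Toy example: four tokens, two of which are bbox placeholders.
token_ids = [1, BBOX_TOKEN_ID, 5, BBOX_TOKEN_ID]
token_embeds = [[0.0, 0.0] for _ in token_ids]
region_feats = [[1.0, 2.0], [3.0, 4.0]]
spliced = splice_region_features(token_ids, token_embeds, region_feats)
```

In a real model the embeddings would be tensors and the region features would come from the visual encoder (for example, pooled from the CLIP feature map for each box), but the splicing step follows the same pattern: locate the placeholder positions, then overwrite those rows before the sequence is passed to the LLM.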