Dear authors, hello! This is very interesting work! Out of curiosity, have you tried training only a linear layer to align the modalities instead of using LoRA, as MiniGPT-4 and DetGPT do? Does it still work? I suppose alignment may be harder that way, since there is no off-the-shelf Q-Former that works with ImageBind features. Thank you very much!

Hi @pipilurj, thank you for your question. We also tried training only a linear layer to align the features, as MiniGPT-4 does, and the model was still able to produce reasonable results. So we believe the off-the-shelf Q-Former may not be that important. If you have limited computational resources, I suggest just training a new linear layer and keeping the other parameters frozen.
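For anyone who wants to try this, here is a minimal PyTorch sketch of that setup: freeze the pretrained encoder and language model, and train only a single linear projection between them. The dimensions, the `LinearProjector` name, and the `nn.Identity()` placeholders for the frozen backbones are illustrative assumptions, not code from this repository.

```python
import torch
import torch.nn as nn

# Assumed dimensions for illustration: ImageBind-style visual features
# projected into the language model's embedding space.
VISUAL_DIM = 1024
LLM_DIM = 4096


class LinearProjector(nn.Module):
    """A single trainable linear layer mapping frozen visual features
    into the LLM embedding space, in the spirit of MiniGPT-4."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, visual_feats: torch.Tensor) -> torch.Tensor:
        return self.proj(visual_feats)


def freeze(module: nn.Module) -> None:
    """Disable gradients so the module stays fixed during alignment."""
    for p in module.parameters():
        p.requires_grad = False


# Placeholders for the pretrained backbones; swap in the real ImageBind
# encoder and LLM. nn.Identity() just keeps this snippet runnable.
visual_encoder = nn.Identity()
language_model = nn.Identity()
freeze(visual_encoder)
freeze(language_model)

projector = LinearProjector(VISUAL_DIM, LLM_DIM)

# Only the projector's weights reach the optimizer, so everything else
# stays frozen during alignment training.
optimizer = torch.optim.AdamW(projector.parameters(), lr=1e-4)

# Quick shape check with dummy features.
feats = visual_encoder(torch.randn(2, VISUAL_DIM))
tokens = projector(feats)
print(tokens.shape)  # torch.Size([2, 4096])
```

In practice the projected tokens would be prepended to the text embeddings before being fed to the frozen LLM, and only `projector` is updated during training.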
That's great, thanks!