microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

[Kosmos-2] Will depth information be incorporated in the future? #1251

Open quantingxie opened 1 year ago

quantingxie commented 1 year ago

Hi,

I wonder whether anyone has successfully combined Kosmos-2 with depth information. Will this be a future goal, to give the model more spatial awareness?

Model I am using: Kosmos-2

pengzhiliang commented 1 year ago

Hi @quantingxie, thanks for your interest! It sounds like a promising goal, but unfortunately we have not achieved it yet. If you need help, we're happy to provide it.

quantingxie commented 1 year ago

Hi Zhiliang, it would be great if you could provide a Kosmos-2 model that outputs depth information in addition to bounding boxes! Looking forward to it. If you make any progress, please let me know. Thanks!