mbzuai-oryx / groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
https://grounding-anything.com

3D implementation of GLaMM #46

Closed remvanthull closed 4 months ago

remvanthull commented 6 months ago

Hi!

I have been experimenting with your model for quite some time now, specifically on medical imaging data.

I am currently exploring ways to extend your architecture so that it can encode sequences of images and decode them accordingly to obtain 3D segmentations.

I was curious whether you have a take on how to tackle this. It would greatly help me, as I am doing my master's thesis on LMMs in medical imaging with your model as the main focus! :)

Thank you in advance, Rachel

mmaaz60 commented 6 months ago

Hi @remvanthull,

Thank you for your interest in our work. Your project looks interesting. A simple approach could be to replace the image encoder with something like the UNETR++ encoder, and the segmentation decoder with something like MedSAM.
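For the "encode sequences of images" idea from the original question, a minimal sketch of slice-wise volume encoding might look like the following. Note that `encode_slice` and `encode_volume` are hypothetical stand-ins for illustration only; they are not GLaMM's actual API, and a real implementation would use a learned 2D/3D encoder (e.g., UNETR++) rather than a random projection:

```python
# Hypothetical sketch: encode a 3D volume as a sequence of 2D slice features
# that a multimodal LM could attend over. All names here are illustrative.
import numpy as np

def encode_slice(slice_2d, dim=8):
    # Stand-in for a 2D image encoder: project the flattened slice to a
    # fixed-size feature vector with a deterministic random matrix.
    rng = np.random.default_rng(0)  # fixed seed so the projection is reproducible
    proj = rng.standard_normal((slice_2d.size, dim))
    return slice_2d.ravel() @ proj

def encode_volume(volume, dim=8):
    # Encode each axial slice independently, then stack the features into a
    # (depth, dim) sequence, analogous to a token sequence per slice.
    return np.stack([encode_slice(s, dim) for s in volume])

volume = np.zeros((16, 32, 32))  # toy CT-like volume: depth x height x width
feats = encode_volume(volume)
print(feats.shape)  # (16, 8)
```

A 3D decoder would then need to produce per-slice masks conditioned on this sequence, which is where a volumetric segmentation head in the spirit of MedSAM could come in.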

Please do share any updates on your project. Thank you, and good luck!

hanoonaR commented 4 months ago

@remvanthull,

You can check out 3D-GRAND; it might offer some inspiration on how to extend GLaMM to 3D data.