MaskCLIP - Githubissues

sushmanthreddy commented 1 year ago

Model description

MaskCLIP is an approach to open-vocabulary universal image segmentation. It builds on pre-trained CLIP models and requires no additional finetuning or distillation. Its core is the Transformer-based MaskCLIP Visual Encoder, which integrates mask tokens with a pre-trained ViT CLIP model to handle semantic and instance segmentation as well as class prediction. Because the encoder uses pre-trained dense and local CLIP features directly, it avoids the lengthy student-teacher training phase. The authors report that MaskCLIP outperforms previous methods on ADE20K and PASCAL for semantic, instance, and panoptic segmentation.
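To make the open-vocabulary part concrete, here is a rough sketch of how a mask region can be labelled by comparing its features with text embeddings. This is not the authors' MaskCLIP Visual Encoder (which feeds mask tokens through the ViT); it is a simpler stand-in that mask-pools dense patch features and picks the class with the highest cosine similarity. The function names and the toy 2-D features are hypothetical and only for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def mask_pool(patch_features, mask):
    """Average the patch features selected by a binary mask."""
    selected = [feat for feat, keep in zip(patch_features, mask) if keep]
    if not selected:
        raise ValueError("mask selects no patches")
    dim = len(selected[0])
    return [sum(feat[d] for feat in selected) / len(selected) for d in range(dim)]


def classify_masks(patch_features, masks, text_embeddings, class_names):
    """Label each mask with the class whose text embedding is most similar.

    Assumption: patch features and text embeddings live in the same space,
    as they would if both came from a CLIP-style model.
    """
    text_unit = [l2_normalize(t) for t in text_embeddings]
    labels = []
    for mask in masks:
        region = l2_normalize(mask_pool(patch_features, mask))
        # Cosine similarity reduces to a dot product on unit vectors.
        scores = [sum(a * b for a, b in zip(region, t)) for t in text_unit]
        best = max(range(len(scores)), key=scores.__getitem__)
        labels.append(class_names[best])
    return labels


# Toy data: four patches with 2-D features, two masks, two class prompts.
patch_features = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.8]]
masks = [[1, 1, 0, 0], [0, 0, 1, 1]]
text_embeddings = [[1.0, 0.0], [0.0, 1.0]]
class_names = ["cat", "grass"]

labels = classify_masks(patch_features, masks, text_embeddings, class_names)
print(labels)  # ['cat', 'grass']
```

In a real integration the patch features and text embeddings would come from a pre-trained CLIP checkpoint rather than hand-written lists, and the class list can be changed at inference time without retraining.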

Open source status

[X] The model implementation is available
[X] The model weights are available

Provide useful links for the implementation

Project page: https://maskclip.github.io
GitHub: https://github.com/mlpc-ucsd/MaskCLIP

sushmanthreddy commented 1 year ago

I would like to work on this issue.

eslambakr commented 2 months ago

Hi @sushmanthreddy, are there any updates on integrating it?

huggingface / transformers

MaskCLIP #26781
