huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

MaskCLIP #26781

Open sushmanthreddy opened 1 year ago

sushmanthreddy commented 1 year ago

Model description

MaskCLIP is a method for open-vocabulary universal image segmentation. It builds on pre-trained CLIP models and requires no additional finetuning or distillation. Its core component is the Transformer-based MaskCLIP Visual Encoder, which integrates mask tokens with a pre-trained ViT CLIP model to perform semantic and instance segmentation as well as class prediction. The encoder makes efficient use of pre-trained dense and local CLIP features, which avoids the lengthy student-teacher training process. MaskCLIP is reported to outperform previous methods on semantic, instance, and panoptic segmentation on the ADE20K and PASCAL datasets.
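To make the class-prediction idea concrete, here is a minimal sketch of open-vocabulary labeling of mask regions. It is not the MaskCLIP Visual Encoder: instead of mask tokens attending inside a ViT, it uses plain mask pooling (averaging patch features inside each mask) followed by cosine similarity against text embeddings. The function names, the toy 2-dimensional features, and the labels are all hypothetical stand-ins for real CLIP outputs.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def mask_pool(patch_features, mask):
    """Average the patch features selected by a binary mask (one flag per patch)."""
    selected = [feat for feat, keep in zip(patch_features, mask) if keep]
    if not selected:
        raise ValueError("mask selects no patches")
    dim = len(selected[0])
    return [sum(feat[d] for feat in selected) / len(selected) for d in range(dim)]


def classify_masks(patch_features, masks, text_embeddings, labels):
    """Assign each mask the label whose text embedding is most similar (cosine)."""
    text = [l2_normalize(emb) for emb in text_embeddings]
    predictions = []
    for mask in masks:
        pooled = l2_normalize(mask_pool(patch_features, mask))
        scores = [sum(p * t for p, t in zip(pooled, emb)) for emb in text]
        best = max(range(len(labels)), key=scores.__getitem__)
        predictions.append(labels[best])
    return predictions


# Toy data (assumed, for illustration only): 4 patches with 2-D features.
patch_features = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]]
masks = [[1, 1, 0, 0], [0, 0, 1, 1]]          # two candidate mask proposals
text_embeddings = [[1.0, 0.0], [0.0, 1.0]]    # stand-ins for CLIP text features
labels = ["cat", "grass"]

predictions = classify_masks(patch_features, masks, text_embeddings, labels)
print(predictions)  # ['cat', 'grass']
```

In a real integration the patch features and text embeddings would come from a pre-trained CLIP checkpoint, and the pooling step would be replaced by whatever mechanism the reference implementation uses.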

Open source status

Provide useful links for the implementation

Project page: https://maskclip.github.io
GitHub: https://github.com/mlpc-ucsd/MaskCLIP

sushmanthreddy commented 1 year ago

I would like to work on this issue.

eslambakr commented 2 months ago

Hi @sushmanthreddy, are there any updates on integrating it?