Open ChanBong opened 1 year ago
Hi @ChanBong, thanks for opening the issue!
You can expect to see an X-Decoder PR in the next two weeks :)
Hi @alaradirik, can we please collaborate in adding this model?
Hi @atharvakavitkar, the PR is almost done but won't include the referring image editing task, which requires integration with Stable Diffusion inpainting. Perhaps you could create a tutorial or demo for this task?
Hi @alaradirik, thank you for reaching out to me. I must admit that I have not yet added a model to HuggingFace. But I really want to learn how to do it. Would creating this tutorial be the right step? Or should I search for a simpler model to implement from scratch?
Model description
X-Decoder is a generalized decoding pipeline that can predict pixel-level segmentation and language tokens seamlessly. X-Decoder is the first work that provides a unified way to support all types of image segmentation and a variety of vision-language (VL) tasks.
The model exhibits strong transferability to a wide range of downstream tasks in both zero-shot and fine-tuning settings, achieving state-of-the-art results for open-vocabulary segmentation and referring segmentation on 10 settings of 7 datasets. It should be a valuable addition to the transformers library.
Open source status
Provide useful links for the implementation
Paper: https://arxiv.org/pdf/2212.11270.pdf
Code: https://github.com/microsoft/X-Decoder
Weights: https://huggingface.co/spaces/xdecoder/Demo/blob/main/xdecoder_focalt_last.pt
Author: @eltociear Cc: @NielsRogge @alaradirik