Open 345ishaan opened 2 years ago
Hey @345ishaan,
Thanks a lot for this new model description (adding a label now). Do you know if the authors released the weights by any chance?
I am not able to find the authors' implementation yet. The code and model for pix2seq, which is used as the pretrained model, are available though.
Model/Pipeline/Scheduler description
This work (https://arxiv.org/pdf/2210.06366.pdf) shows how advances in diffusion modelling can be applied to generate panoptic masks for images and videos, conditioned on any image input. In most work on diffusion modelling, the noise and output space are parametrized in continuous space. To solve the panoptic task, the authors bring in the concept of analog bits, which lets them keep the same continuous parameterization while still outputting discrete instance labels per pixel.
Also, the authors have built this work on their previous approach of modelling object detection as a token generation task (https://ai.googleblog.com/2022/04/pix2seq-new-language-interface-for.html).
All in all, a very cool work which I feel has potential: when grounded with other modalities, it could provide better few-shot performance on perception tasks. It would be nice to see if we can leverage interesting features from HF to reproduce this work and, if possible, present results on recent AV datasets like nuScenes or WOD.
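To make the analog bits idea a bit more concrete, here is a rough sketch of how I understand the encoding: each integer label is written as binary bits, the bits are mapped to real values in {-scale, +scale} so a continuous diffusion model can operate on them, and discrete labels are recovered by thresholding at zero. This is not the authors' code; the function names, `num_bits=8`, and `scale=1.0` are my own assumptions for illustration.

```python
import numpy as np

def int_to_analog_bits(labels, num_bits=8, scale=1.0):
    """Encode integer labels as real-valued bits in {-scale, +scale} (hypothetical helper)."""
    labels = np.asarray(labels, dtype=np.int64)
    shifts = np.arange(num_bits - 1, -1, -1)      # most significant bit first
    bits = (labels[..., None] >> shifts) & 1      # shape (..., num_bits), values 0/1
    return (bits.astype(np.float32) * 2.0 - 1.0) * scale

def analog_bits_to_int(analog):
    """Decode by thresholding at zero, then reading the bits back as an integer."""
    bits = (np.asarray(analog) > 0).astype(np.int64)
    num_bits = bits.shape[-1]
    weights = 1 << np.arange(num_bits - 1, -1, -1)
    return (bits * weights).sum(axis=-1)

# Toy 2x2 "panoptic mask" of instance ids (made-up values).
mask = np.array([[0, 3], [17, 255]])
analog = int_to_analog_bits(mask)                 # shape (2, 2, 8), values in {-1, +1}

# Bounded perturbation standing in for an imperfect model prediction.
rng = np.random.default_rng(0)
noisy = analog + rng.uniform(-0.5, 0.5, size=analog.shape)
recovered = analog_bits_to_int(noisy)             # thresholding undoes the perturbation
```

Since the perturbation here never crosses zero, `recovered` matches `mask` exactly; a real model's predictions would of course carry no such guarantee.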
Open source status
Provide useful links for the implementation
No response