Open 345ishaan opened 1 year ago
Model weights seem to be available as well no? https://github.com/ShoufaChen/DiffusionDet#models
@patrickvonplaten mind if I give this a try? This would be my first time contributing a model, so I might need a hand occasionally.
Sorry I forgot to assign the issue to myself while opening but was actually planning to look into this.
Go for it! 🤗
I am happy to collaborate if you want :) I have done it in the past and given I will be only working outside office-hours, things can move faster that way.
Ordinarily I would say yes, but I don't think I can dedicate any time towards it until Tuesday at the earliest. So probably best for you to make a start yourself and if you have anything you want to hand off to me, I can chip in a bit 😅
Also more than happy to help if needed :-)
@patrickvonplaten I plan to run through their code this weekend in inference mode. Let me know if you have a task checklist in mind that I should follow. Happy to split up the work if needed.
Sure, I think the following would make sense:
diffusers
a) First make sure weights can be correctly
b) Then check forward pass
Also happy to guide you through a PR :-)
I am able to run the original codebase here: https://colab.research.google.com/drive/1rA5SXuTx2pI6o7tWA6Ad5QRZn4a1ajMh#scrollTo=Sn5gWF3fhpf-
@patrickvonplaten This work has been tried with two types of encoder: CNN-based (ResNet style) and Transformer-based (Swin Transformer). Do you prefer the Transformer-based one? Also, is there an HF implementation of Swin Transformer that I should refer to?
Think leveraging an existing transformers implementation could make a lot of sense here (also cc @ShoufaChen as the author of diffusionDet :-) )
And maybe @NielsRogge FYI
Yes we do have Swin implemented here -> https://github.com/huggingface/transformers/blob/main/src/transformers/models/swin/modeling_swin.py. So you can do from transformers import SwinModel
Hello everyone,
Thanks for your efforts in integrating DiffusionDet into awesome diffusers.
We provided Swin-Base model here: https://github.com/ShoufaChen/DiffusionDet/releases/download/v0.1/diffdet_coco_swinbase.pth.
Hi, @345ishaan ,
May I ask about your progress on this integration?
I am glad to offer help.
@ShoufaChen Sorry for the delay here. I have been very busy at work because of EOY launches (hopefully over by tomorrow). Last weekend, I was able to run your demo successfully in a standalone colab. This Fri-Sun, I was planning to look into doing the following:
1) Create a diffusion pipeline for DiffusionDet.
2) Preload weights from the SwinTF encoder model into the one under huggingface.
3) Read and understand the detection decoder and try integrating it.
Happy to work alongside you here, as I guess we can do it much faster with you involved. Please let me know what suits you. I am definitely interested in pushing it to the finish line.
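For step 2 of the plan above, porting a checkpoint from one implementation to another often comes down to renaming parameter keys. Here is a minimal sketch of that idea in plain Python. The key prefixes and the `remap_keys` helper are hypothetical placeholders, not the actual DiffusionDet or transformers parameter names, and floats stand in for tensors.

```python
# Sketch only: the prefixes below are made-up placeholders, NOT the real
# DiffusionDet or transformers parameter names.
from typing import Any, Dict, List, Tuple

# (old_prefix, new_prefix) pairs, checked in order; the first match wins.
HYPOTHETICAL_RENAMES: List[Tuple[str, str]] = [
    ("backbone.patch_embed.", "embeddings.patch_embeddings."),
    ("backbone.layers.", "encoder.layers."),
]


def remap_keys(
    state_dict: Dict[str, Any], renames: List[Tuple[str, str]]
) -> Tuple[Dict[str, Any], List[str]]:
    """Return (remapped, skipped): renamed entries and keys matching no rule."""
    remapped: Dict[str, Any] = {}
    skipped: List[str] = []
    for key, value in state_dict.items():
        for old, new in renames:
            if key.startswith(old):
                # Swap the prefix and keep the rest of the key unchanged.
                remapped[new + key[len(old):]] = value
                break
        else:
            # No rule matched: report the key instead of dropping it silently.
            skipped.append(key)
    return remapped, skipped


# Toy checkpoint: plain floats stand in for weight tensors.
toy_checkpoint = {
    "backbone.patch_embed.proj.weight": 1.0,
    "backbone.layers.0.norm.weight": 2.0,
    "head.cls.weight": 3.0,
}
remapped, skipped = remap_keys(toy_checkpoint, HYPOTHETICAL_RENAMES)
```

Returning the skipped keys makes it easy to see which parts of the checkpoint (for example the detection head) still need their own handling. With real tensors, loading the remapped dict via PyTorch's `load_state_dict(..., strict=True)` would be one way to check that no key is missing or unexpected.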
Hi, @345ishaan ,
You can leave the most challenging part to me since I think I am more familiar with DiffusionDet (as the author of this work).
@patrickvonplaten @NielsRogge I am guessing transformers and diffusers are maintained as separate libraries, so I branched out the SwinTransformer implementation linked above into src/diffusers/models. Any suggestions on whether I should avoid this?
Hey @345ishaan,
We're trying to leverage transformers as much as possible. So for the image encoding, which is based on a SwinModel, please don't add the code to src/diffusers/models. Instead:
- If the model can be used out of the box from timm or transformers, feel free to just import it directly in the pipeline (e.g. like we do here: https://github.com/huggingface/diffusers/blob/847daf25c7e461795932099c5097eb8ac489645c/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L265 )
- If the model requires some hacks/tweaks, please add it as a file to the pipeline folder as done here: https://github.com/huggingface/diffusers/blob/847daf25c7e461795932099c5097eb8ac489645c/src/diffusers/pipelines/alt_diffusion/modeling_roberta_series.py#L59
We only put actual diffusion models into src/diffusers, i.e. models that are called over and over again in the denoising process.
Does this make sense?
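To illustrate the split described above, here is a toy sketch: the image encoder runs once per image, while the denoising model is called on every step. Everything in it (`CountingFn`, `ToyDetectionPipeline`, the arithmetic stand-ins) is hypothetical and only mirrors the call pattern. It is not the diffusers API or the DiffusionDet algorithm.

```python
# Toy illustration of the call pattern only; all names are hypothetical
# stand-ins and none of this is the diffusers or DiffusionDet API.
from typing import Callable, List


class CountingFn:
    """Wrap a callable and count how many times it is invoked."""

    def __init__(self, fn: Callable):
        self.fn = fn
        self.calls = 0

    def __call__(self, *args):
        self.calls += 1
        return self.fn(*args)


class ToyDetectionPipeline:
    def __init__(self, encoder: Callable, denoiser: Callable, num_steps: int):
        self.encoder = encoder    # would be imported from transformers/timm
        self.denoiser = denoiser  # the part called on every denoising step
        self.num_steps = num_steps

    def __call__(self, image: List[float], boxes: List[float]) -> List[float]:
        features = self.encoder(image)  # encode the image exactly once
        for step in reversed(range(self.num_steps)):
            # Refine the box estimates using the cached image features.
            boxes = self.denoiser(features, boxes, step)
        return boxes


# Stand-ins: the "encoder" sums pixels, the "denoiser" halves each value.
encoder = CountingFn(lambda image: sum(image))
denoiser = CountingFn(lambda feats, boxes, step: [b * 0.5 for b in boxes])

pipe = ToyDetectionPipeline(encoder, denoiser, num_steps=4)
result = pipe([0.1, 0.2, 0.3], [8.0, 16.0])
```

After the call, `encoder.calls` is 1 and `denoiser.calls` is 4, which is the distinction the guidance relies on: only the repeatedly called model belongs in src/diffusers, while the encoder can be imported in the pipeline.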
Thanks for the explanation @patrickvonplaten, will follow what you suggested.
Hi, I wanted to check in on the progress of integrating DiffusionDet into Diffusers. I’m very interested in seeing DiffusionDet become part of Hugging Face’s ecosystem. Is this work still ongoing? Is there any way I could contribute?
Hi @HichTala ,
Thanks for your interest. Happy to see you are interested in this work. As far as I know, it is not ongoing currently. I'm glad to offer help if you want to contribute to this.
Ok, I’ll give it a try, then! I’ll be sure to reach out if I need any help. Thank you!
Model/Pipeline/Scheduler description
Recent work that leverages diffusion models for the object detection task: https://arxiv.org/abs/2211.09788
Add the capability to run it through an HF diffusers pipeline and, if possible, also create benchmarks or comparisons on datasets like nuScenes.
Open source status
Provide useful links for the implementation
No response