FrsECM closed this issue 11 months ago.
@FrsECM Would you consider adding the pipeline to the community pipelines?
I can, but I'll certainly need some help and guidance to review and implement it!
From: Dhruv Nair @.> Sent: Monday, October 9, 2023 11:45:08 AM To: huggingface/diffusers @.> Cc: François Ponchon @.>; Mention @.> Subject: Re: [huggingface/diffusers] SemanticMasks to Image - Diffusion Model (Issue #5321)
@FrsECM Of course. Please open a PR and we can review.
@DN6, I created the pull request here: https://github.com/huggingface/diffusers/pull/5431
Feel free to review it. I especially need support on the following points:
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi everyone!
Is your feature request related to a problem? Please describe. I've read the documentation and saw that the following tasks are implemented: Unconditional Image Generation, Text-Guided Image Generation, Text-Guided Image-to-Image Translation, Text-Guided Image-Inpainting, and Text-Guided Depth-to-Image Translation.
To improve our semantic segmentation pipelines, I'd like to generate synthetic data from semantic masks in order to reduce the number of edge cases in our supervised datasets (e.g., moving foregrounds onto different backgrounds). We would work with a closed set of classes, not an open vocabulary.
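A minimal sketch of the mask-editing step described above, assuming each mask is stored as a 2-D integer label map with one class ID per pixel. The helper name paste_foreground is hypothetical; the edited mask would then be passed to a mask-to-image model to synthesize a matching image.

```python
import numpy as np


def paste_foreground(src_mask, dst_mask, class_id, dy, dx):
    """Copy the pixels of class_id from src_mask into dst_mask, shifted by (dy, dx).

    Both masks are 2-D integer label maps of the same shape. Pixels that
    would land outside the image after the shift are dropped.
    """
    out = dst_mask.copy()
    ys, xs = np.nonzero(src_mask == class_id)
    ys, xs = ys + dy, xs + dx
    h, w = out.shape
    keep = (ys >= 0) & (ys < h) & (xs >= 0) & (xs < w)
    out[ys[keep], xs[keep]] = class_id
    return out


# Toy example: a class-1 "foreground" on a class-0 background in src,
# pasted onto a class-2 background in dst and shifted one pixel to the right.
src = np.zeros((4, 4), dtype=np.int64)
src[1:3, 0:2] = 1
dst = np.full((4, 4), 2, dtype=np.int64)
new_mask = paste_foreground(src, dst, class_id=1, dy=0, dx=1)
```

Working directly on label maps keeps the edit exact: no interpolation is involved, so every pixel keeps a valid class ID.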
Describe the solution you'd like. To do that, I'd like to condition the generation on a semantic mask, as in this paper: https://arxiv.org/abs/2207.00050
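As we understand it, the linked paper feeds the semantic layout into the denoising network through spatially-adaptive (SPADE-style) normalization layers rather than through a text embedding. The sketch below shows that idea on a single (C, H, W) feature map. It is a simplification under stated assumptions: the per-pixel scale and shift come from 1x1 linear maps (w_gamma, w_beta, hypothetical names) instead of the small learned conv nets used in practice, and the x_norm * (1 + gamma) + beta form is one common variant.

```python
import numpy as np


def one_hot(mask, num_classes):
    """(H, W) integer label map -> (num_classes, H, W) float one-hot tensor."""
    return np.eye(num_classes, dtype=np.float32)[mask].transpose(2, 0, 1)


def spatially_adaptive_norm(features, mask, w_gamma, w_beta, eps=1e-5):
    """Simplified SPADE-style modulation of a (C, H, W) feature map.

    features: (C, H, W) activations inside the denoising network.
    mask: (H, W) integer label map already resized to the feature resolution.
    w_gamma, w_beta: (C, num_classes) weights acting as 1x1 convolutions
    (a simplification; in a real model they are learned).
    """
    # Parameter-free normalization: zero mean, unit variance per channel.
    mean = features.mean(axis=(1, 2), keepdims=True)
    var = features.var(axis=(1, 2), keepdims=True)
    normalized = (features - mean) / np.sqrt(var + eps)

    # Per-pixel scale and shift predicted from the semantic layout.
    seg = one_hot(mask, w_gamma.shape[1])            # (K, H, W)
    gamma = np.einsum("ck,khw->chw", w_gamma, seg)   # (C, H, W)
    beta = np.einsum("ck,khw->chw", w_beta, seg)     # (C, H, W)
    return normalized * (1.0 + gamma) + beta
```

Because the modulation is computed per pixel from the class map, each region of the generated image is steered by its own class, which is what lets the layout control the output without any text encoder.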
Describe alternatives you've considered. I tried open-vocabulary text-to-image generation, but there are two main issues:
Additional context. I'm new to Hugging Face, so I'd need some guidance to get started.
Has anybody already implemented something like this within the framework?
Intuitively, I'd say that I need to train a custom VAE to fit my own images (very specific 2-channel images), and that I need to modify the UNet conditioning so that it takes the semantic mask as input.
For the VAE: is there already a training pipeline for that?
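For reference, diffusers' AutoencoderKL takes in_channels and out_channels constructor arguments, so a 2-channel autoencoder can be configured by setting both to 2. The sketch below shows only the core training objective such a model would minimize, written in NumPy: a pixel reconstruction error plus a KL term pulling the latent posterior toward N(0, I), with the reparameterization trick used to sample latents. The kl_weight default is illustrative, and the Stable Diffusion autoencoder was additionally trained with perceptual and adversarial losses that are omitted here.

```python
import numpy as np


def vae_loss(x, x_recon, mu, logvar, kl_weight=1e-6):
    """Minimal VAE objective: reconstruction error plus a weighted KL term.

    x, x_recon: (B, 2, H, W) images with 2 channels, as in the use case above.
    mu, logvar: (B, C_lat, h, w) parameters of the diagonal Gaussian posterior.
    kl_weight: weight on the KL term (illustrative value, to be tuned).
    """
    recon = np.mean((x - x_recon) ** 2)
    # Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dims, averaged over the batch.
    kl = 0.5 * np.sum(mu ** 2 + np.exp(logvar) - 1.0 - logvar) / x.shape[0]
    return recon + kl_weight * kl


def sample_latent(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
```

A very small KL weight keeps the latent space close to Gaussian while letting reconstruction quality dominate, which matters when the latents are later modeled by a diffusion UNet.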
For the UNet: how can I condition it on an image instead of a text embedding?
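One simple option, different from the paper's normalization-based conditioning, is to concatenate a one-hot version of the mask with the noisy latents along the channel axis and widen the UNet's input accordingly. diffusers' UNet2DModel and UNet2DConditionModel both accept an in_channels argument, and the Stable Diffusion inpainting UNet already uses channel concatenation (9 input channels: 4 latent, 1 mask, 4 masked-image latent). The NumPy sketch below, per sample and without a batch dimension, shows only the tensor plumbing; the helper names are hypothetical and downsample=8 assumes the standard Stable Diffusion VAE factor.

```python
import numpy as np


def mask_to_latent_condition(mask, num_classes, downsample=8):
    """(H, W) label map -> (num_classes, H/downsample, W/downsample) one-hot map.

    Strided slicing picks one pixel per downsample x downsample block
    (nearest-neighbour style), so labels stay discrete.
    """
    small = mask[::downsample, ::downsample]
    return np.eye(num_classes, dtype=np.float32)[small].transpose(2, 0, 1)


def unet_input(noisy_latents, mask, num_classes, downsample=8):
    """Concatenate noisy latents (C_lat, h, w) with the mask condition along channels.

    The UNet would then need in_channels = C_lat + num_classes.
    """
    cond = mask_to_latent_condition(mask, num_classes, downsample)
    if cond.shape[1:] != noisy_latents.shape[1:]:
        raise ValueError("mask and latent grids must have the same spatial size")
    return np.concatenate([noisy_latents, cond], axis=0)
```

With no text at all, a UNet without cross-attention (such as UNet2DModel) would be enough for this variant, since the condition enters through the input channels rather than through encoder hidden states.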
For the pipeline: maybe implement a StableDiffusionSem2ImgSPipeline, because StableDiffusionImg2ImgPipeline relies on a text encoder/tokenizer that is not needed here.
Thanks for your insights!
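To illustrate what such a pipeline's core loop could look like without a tokenizer or text encoder, here is a NumPy sketch of deterministic DDIM sampling (eta = 0) where the only condition passed to the denoiser is the mask. predict_noise is a hypothetical stand-in for a mask-conditioned UNet; in diffusers, a scheduler such as DDIMScheduler performs this update via scheduler.step(model_output, timestep, sample), so a real pipeline would delegate to it rather than reimplement it.

```python
import numpy as np


def ddim_sample(predict_noise, mask, alphas_cumprod, timesteps, shape, rng):
    """Deterministic DDIM sampling (eta = 0) conditioned only on a semantic mask.

    predict_noise(x_t, mask, t) -> predicted noise with the same shape as x_t
    (hypothetical callable standing in for a mask-conditioned UNet).
    alphas_cumprod: cumulative products of (1 - beta_t) for the training schedule.
    timesteps: decreasing sequence of timesteps to visit, e.g. [999, 899, ..., 99].
    """
    x = rng.standard_normal(shape)
    for i, t in enumerate(timesteps):
        a_t = alphas_cumprod[t]
        # After the last step, treat the sample as clean (alpha_bar = 1).
        a_prev = alphas_cumprod[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        eps = predict_noise(x, mask, t)
        # Estimate the clean sample, then move it to the previous noise level.
        x0 = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        x = np.sqrt(a_prev) * x0 + np.sqrt(1.0 - a_prev) * eps
    return x  # in a latent model, this would still be decoded by the VAE
```

Compared with StableDiffusionImg2ImgPipeline, the structural change is only what gets passed to the denoiser at each step: the mask replaces the prompt embeddings, so the tokenizer and text encoder components can be dropped.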