SemanticMasks to Image - Diffusion Model

FrsECM commented 1 year ago

Hi everyone!

Is your feature request related to a problem? Please describe.

I've read the documentation and saw that the following tasks are implemented:

Unconditional Image Generation
Text-Guided Image Generation
Text-Guided Image-to-Image Translation
Text-Guided Image-Inpainting
Text-Guided Depth-to-Image Translation

To improve semantic segmentation pipelines, I'd like to generate synthetic data from semantic masks, in order to reduce the number of edge cases in our supervised datasets (for example, by moving foreground objects onto different backgrounds). We would not be working in an open-vocabulary setting.

Describe the solution you'd like

I'd like to condition the generation on a semantic mask, as in this paper: https://arxiv.org/abs/2207.00050

Describe alternatives you've considered

I tried open-vocabulary text-to-image generation, but there are two main issues:

The CLIP model is poorly suited to the very specific problems that private companies and industrial users face, so the generated images are rarely relevant.
We need to retrain the VAE, because the input image distribution is sometimes very far from the open-world one.

Additional context

I'm new to Hugging Face, so I'd need some pointers to get started.

Has anybody already implemented something like this within the framework?

Intuitively, I'd say that:

I need to train a custom VAE to fit my own images (very specific 2-channel images).
I need to modify the UNet conditioning so that it takes the semantic mask as input.

For the VAE: is there already a training pipeline for that?

For the UNet: how can it be conditioned on an image instead of a text embedding?
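
One simple option, offered only as a sketch, is to one-hot encode the semantic mask and concatenate it with the noisy image along the channel axis, so that the UNet sees both at every denoising step. This is plain channel concatenation and is not necessarily the mechanism used in the paper linked above. The helper names `one_hot_mask` and `build_unet_input`, the 3 classes, and the 8x8 size are assumptions made for illustration; in diffusers this would presumably correspond to setting `in_channels` of `UNet2DModel` to the image channels plus the number of classes.

```python
import numpy as np


def one_hot_mask(mask: np.ndarray, num_classes: int) -> np.ndarray:
    """Turn an (H, W) integer class map into a (num_classes, H, W) float array."""
    # Indexing the identity matrix with the mask yields (H, W, num_classes).
    return np.eye(num_classes, dtype=np.float32)[mask].transpose(2, 0, 1)


def build_unet_input(noisy_image: np.ndarray, mask: np.ndarray, num_classes: int) -> np.ndarray:
    """Stack the noisy image and the one-hot mask along the channel axis."""
    return np.concatenate([noisy_image, one_hot_mask(mask, num_classes)], axis=0)


rng = np.random.default_rng(0)
num_classes = 3  # assumed number of semantic classes
mask = rng.integers(0, num_classes, size=(8, 8))  # toy semantic mask
noisy_image = rng.standard_normal((2, 8, 8)).astype(np.float32)  # 2-channel image

unet_input = build_unet_input(noisy_image, mask, num_classes)
print(unet_input.shape)  # (5, 8, 8): 2 image channels + 3 mask channels
```

With this layout the network still predicts only the 2 image channels; the mask channels are an input-side condition and no text embedding is involved.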

For the pipeline: maybe implement a StableDiffusionSem2ImgSPipeline, because StableDiffusionImg2ImgPipeline requires a text encoder and tokenizer that are not necessary here.
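
To make the point about the text encoder concrete, here is a minimal sketch of a mask-only sampling loop: the only inputs are the one-hot mask and a noise-prediction function. It uses a standard DDPM ancestral update with a linear beta schedule, written in NumPy so it stays self-contained. `sample_from_mask` and `denoise_fn` are hypothetical names, and the dummy denoiser below merely stands in for a trained UNet; a real diffusers pipeline would more likely subclass `DiffusionPipeline` and delegate the update to a scheduler such as `DDPMScheduler`.

```python
import numpy as np


def sample_from_mask(denoise_fn, mask_one_hot, image_channels=2, num_steps=50, seed=0):
    """Ancestral DDPM sampling conditioned only on a one-hot semantic mask.

    denoise_fn(x_t, t, mask_one_hot) is a hypothetical stand-in for a trained
    UNet; it must return the predicted noise with the same shape as x_t.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, num_steps)  # assumed linear schedule
    alphas = 1.0 - betas
    alphas_cumprod = np.cumprod(alphas)

    _, height, width = mask_one_hot.shape
    x = rng.standard_normal((image_channels, height, width))  # start from pure noise
    for t in reversed(range(num_steps)):
        eps = denoise_fn(x, t, mask_one_hot)
        # Mean of x_{t-1} given the predicted noise.
        x = (x - betas[t] / np.sqrt(1.0 - alphas_cumprod[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Add fresh noise at every step except the last one.
            x = x + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    return x


def dummy_denoiser(x_t, t, mask_one_hot):
    """Placeholder that predicts zero noise; a trained model would go here."""
    return np.zeros_like(x_t)


# Toy mask where every pixel belongs to class 0, one-hot encoded to (3, 8, 8).
toy_mask = np.eye(3, dtype=np.float32)[np.zeros((8, 8), dtype=int)].transpose(2, 0, 1)
sample = sample_from_mask(dummy_denoiser, toy_mask)
print(sample.shape)  # (2, 8, 8)
```

Nothing in the loop touches a tokenizer or text encoder, which is the argument for a dedicated pipeline rather than reusing StableDiffusionImg2ImgPipeline.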

Thanks for your insights!

DN6 commented 1 year ago

@FrsECM Would you consider adding the pipeline to the community pipelines?

FrsECM commented 1 year ago

I can, but I'll certainly need some help and insights to review/implement it!



DN6 commented 1 year ago

@FrsECM Of course. Please open a PR and we can review.

FrsECM commented 1 year ago

@DN6, I created the pull request here: https://github.com/huggingface/diffusers/pull/5431

Feel free to review it. I especially need support here:

https://github.com/FrsECM/diffusers/blob/8bd3a3fe3c9b0679ec0fc6078befc024c636aad4/examples/semantic_image_synthesis/train_sis.py#L288-L303

github-actions[bot] commented 12 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
