alelordelo opened this issue 1 year ago
Great idea. We will try it, or a pull request from your side is also very welcome.
This would be wild!
Would totally contribute if I could, but I come from the Swift/iOS world and am super limited with Python : /
Hi again @gasvn!
I saw you added this: "Generate semantic labels for each SAM mask." (`python sam2semantic.py`)
Would it be possible to train the ControlNet with those segmentation labels as the prompt for each mask?
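Just to check my understanding, my rough mental model of "semantic labels for each SAM mask" is zero-shot labelling each mask crop with something like CLIP. The sketch below is only my assumption, not what sam2semantic.py actually does, and the candidate labels are made up:

```python
# Rough sketch (my assumption, NOT what sam2semantic.py does): zero-shot label each
# SAM mask crop with CLIP. `candidate_labels` is a made-up vocabulary.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
candidate_labels = ["a roof", "a window", "a door", "a wall", "sky", "grass"]

def label_mask(image: Image.Image, mask: dict) -> str:
    """`mask` is one entry from SAM's automatic mask generator, with 'bbox' = [x, y, w, h]."""
    x, y, w, h = mask["bbox"]
    crop = image.crop((x, y, x + w, y + h))
    inputs = processor(text=candidate_labels, images=crop, return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)
    return candidate_labels[probs.argmax().item()]
```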
That's possible. We are working on it. For now, you can try our new Gradio demo. It combines inpainting and EditAnything, so it can handle most edits to a selected part under the guidance of a text prompt. https://huggingface.co/spaces/shgao/EditAnything
Thanks @gasvn, just tested your demo, super cool! Looking forward to testing multi-prompt training/inference! ; )
Hi @gasvn, any news on the segmented training? : )
There is a concern about segmented training: I am afraid that a lack of training data would make the model collapse, so getting a text prompt for each segment is an important issue. For now, I am using BLIP2-generated text prompts, but I am not sure whether they are suitable for Stable Diffusion. Any suggestions? Thanks~
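Roughly, the idea is to caption each mask crop with BLIP2, e.g. along these lines (a simplified sketch; the checkpoint and the crop-to-bounding-box step are placeholders, not the exact pipeline):

```python
# Simplified sketch of a per-segment BLIP2 caption (checkpoint and cropping are
# placeholders, not the exact pipeline).
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

def caption_segment(image: Image.Image, bbox) -> str:
    """bbox = (x, y, w, h) of one SAM mask; returns a BLIP2 caption of that crop."""
    x, y, w, h = bbox
    crop = image.crop((x, y, x + w, y + h))
    inputs = processor(images=crop, return_tensors="pt").to("cuda", torch.float16)
    ids = model.generate(**inputs, max_new_tokens=20)
    return processor.batch_decode(ids, skip_special_tokens=True)[0].strip()
```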
I think the dataset would be:
- Input image
- Segmented masks
- A prompt for each segmented mask (either a manual label or automatically generated by OpenCLIP, BLIP, etc.)

Then you have text, image, and segmentation as conditioning.
I have a dataset like this that I could test with (rough loader sketch below). Do you see how I could train a model with this kind of setup?
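Roughly the layout I have in mind looks like this (the file names and the annotations.json schema are just my own convention, nothing standard):

```python
# Hypothetical on-disk layout + loader for (image, masks, per-mask prompts) triples.
# annotations.json schema is my own invention:
# [{"image": "0001.png",
#   "masks": ["0001_mask0.png", "0001_mask1.png"],
#   "prompts": ["a red roof", "a wooden front door"]}, ...]
import json
from pathlib import Path

import numpy as np
from PIL import Image
from torch.utils.data import Dataset

class SegmentPromptDataset(Dataset):
    """One sample = RGB image, a stack of binary masks, and one text prompt per mask."""

    def __init__(self, root: str):
        self.root = Path(root)
        self.items = json.loads((self.root / "annotations.json").read_text())

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        item = self.items[idx]
        image = np.array(Image.open(self.root / item["image"]).convert("RGB"))
        masks = np.stack(
            [np.array(Image.open(self.root / m).convert("L")) > 0 for m in item["masks"]]
        )
        return {"image": image, "masks": masks, "prompts": item["prompts"]}
```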
Hi @gasvn, I did some research on this...
Abut "segment with text prompt" we could do a test with JSON COCO dataset: https://cocodataset.org/#home
I want to give this. shot, but not sure if its currently possible to train with JSON -> image pairs?
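From what I can tell, building the pairs from the COCO JSON would look roughly like this (just a sketch using pycocotools; the annotation path is a placeholder):

```python
# Sketch: turning COCO instance annotations into (image, masks, per-mask prompt) triples.
# The annotation path is a placeholder; requires pycocotools.
from pycocotools.coco import COCO

coco = COCO("annotations/instances_train2017.json")

def coco_sample(img_id: int):
    info = coco.loadImgs(img_id)[0]
    anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))
    masks = [coco.annToMask(a) for a in anns]  # one binary mask per instance
    prompts = [coco.loadCats(a["category_id"])[0]["name"] for a in anns]  # category name as prompt
    return info["file_name"], masks, prompts
```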
Training with text JSON -> image pairs is possible. I think ControlNet needs to be changed slightly so that each segmented region has its own text prompt instead of just a single global text prompt.
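One common way to approximate this at inference time (not necessarily the change that will be made to ControlNet) is to run the denoiser once per regional prompt and blend the noise predictions with the masks, in the spirit of regional prompting / MultiDiffusion. A minimal sketch assuming a diffusers-style UNet:

```python
# Minimal sketch of regional prompting (assumes a diffusers-style UNet2DConditionModel).
# Not the planned ControlNet change -- just one way to give each masked region its own prompt.
import torch

def regional_noise_pred(unet, latents, t, region_text_embeds, region_masks):
    """
    latents:            (B, 4, H, W) noisy latents
    region_text_embeds: list of (B, 77, D) text-encoder outputs, one per region
    region_masks:       list of (1, H, W) binary masks in latent resolution, covering the image
    """
    blended = torch.zeros_like(latents)
    for text_embeds, mask in zip(region_text_embeds, region_masks):
        # Denoise as if the region's prompt were the global prompt...
        noise = unet(latents, t, encoder_hidden_states=text_embeds).sample
        # ...then keep that prediction only inside the region's mask.
        blended = blended + mask * noise
    return blended
```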
Hi @gasvn, any plans for multiple prompts per mask segment?
Hi again! : )
Is it possible to train with a prompt for each segmented mask region?
Ex: an input image of a house, its segmentation masks, and a prompt for each mask (example images omitted).
That would open up a lot of possibilities!