lllyasviel opened 1 year ago
I guess pose + a single source image as control would be useful, at least for anime. Although a custom-character DreamBooth model with https://github.com/lllyasviel/ControlNet/discussions/12 seems to work, single-image pose shifting is really attractive to me.
1) Depth + segmentation? For example, I would like to render a movie scene (see the channel-stacking sketch after this list).
2) t-1 rendered frame + t+1 keyframe? For when you want to render movies in anime style and need temporal stability in the output. When I try naive per-frame img2img, each output frame is slightly different and the result looks quite noisy. Take a look at my video made with InstructPix2Pix: https://www.reddit.com/r/StableDiffusion/comments/10x4fkr/pip2pix_marble_terminator/?utm_source=share&utm_medium=web2x&context=3
3) Novel view synthesis? Given one, two, or more images of an object, generate a new view of the same object. For example, I have generated a sneaker image and now want to generate new views so it can be manufactured. Example: https://thissneakerdoesnotexist.com/3d-info/
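For idea 1), one plausible wiring is to stack the depth map and the segmentation map as extra channels of a single hint image before it enters the ControlNet's conditioning encoder. A minimal PyTorch sketch of the idea, assuming both maps are already resized to the same resolution (all tensor names here are hypothetical, not ControlNet's actual code):

```python
import torch

# Hypothetical inputs: a 1-channel depth map and a 3-channel
# color-coded segmentation map, both normalized to [0, 1].
depth = torch.rand(1, 1, 512, 512)  # (batch, channels, H, W)
seg = torch.rand(1, 3, 512, 512)

# Concatenate along the channel axis to form a 4-channel hint.
# The ControlNet hint encoder's first conv would then need to
# expect 4 input channels instead of the usual 3, which means
# retraining (or at least re-initializing) that layer.
hint = torch.cat([depth, seg], dim=1)
print(hint.shape)  # torch.Size([1, 4, 512, 512])
```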
Is this simply concatenating additional input channels onto the hint image, or actually combining two separately trained control networks?
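On the "two separately trained networks" reading, one common recipe is to run each ControlNet on its own hint and sum the residuals they inject into the frozen U-Net, optionally with per-network weights (this is roughly what later multi-ControlNet pipelines do). A rough sketch under that assumption; the call signature `net(x, t, hint)` is a placeholder, not the actual ControlNet API:

```python
def combined_control_residuals(controlnets, hints, x, t, weights=None):
    """Sum per-block residuals from several independently trained ControlNets.

    controlnets: list of trained ControlNet modules; each is assumed to
        return a list of residual tensors, one per U-Net block.
    hints: list of hint images, one per ControlNet.
    x, t: noisy latent and timestep, shared by all networks.
    weights: optional per-network scaling factors.
    """
    weights = weights or [1.0] * len(controlnets)
    combined = None
    for net, hint, w in zip(controlnets, hints, weights):
        residuals = net(x, t, hint)  # placeholder call signature
        if combined is None:
            combined = [w * r for r in residuals]
        else:
            combined = [c + w * r for c, r in zip(combined, residuals)]
    return combined  # added onto the frozen U-Net's skip connections
```

The per-network weights would let you trade off the two controls against each other without retraining either one.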
I would see both as extremely useful.
Potentially a naive question, but I'm wondering about using vector inputs like FaceNet/CLIP/etc. embeddings as a second control, rather than spatial inputs like depth/edges/etc.?
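Since a FaceNet/CLIP embedding has no spatial layout, it can't go through the hint encoder directly; one plausible route is to project the vector into the U-Net's timestep-embedding space and add it there, so it modulates every block globally. A hedged sketch of that projection idea only (the dimensions and class name are assumptions, not ControlNet's actual code):

```python
import torch
import torch.nn as nn

class VectorControlProjector(nn.Module):
    """Map a global embedding (e.g., a 512-d CLIP image embedding)
    into the U-Net's timestep-embedding space so it can act as a
    non-spatial control signal."""

    def __init__(self, embed_dim=512, time_embed_dim=1280):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(embed_dim, time_embed_dim),
            nn.SiLU(),
            nn.Linear(time_embed_dim, time_embed_dim),
        )

    def forward(self, clip_embed, time_embed):
        # The projected vector is summed with the timestep embedding
        # before it is broadcast to every residual block.
        return time_embed + self.proj(clip_embed)
```

Feeding the embedding through cross-attention (the way text conditioning works) would be the other obvious option.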
Discussed in https://github.com/lllyasviel/ControlNet/discussions/30
This is a re-post. Please go to the discussion thread for further discussion.