KaruroChori opened this issue 1 year ago
Edit: this was a reply to a post later removed.
In theory it would be easy to automatically generate images for training. Blender has good Python API coverage, and after some initial setup every step can be automated. We could prepare a small but diverse set of scenes (the Blender Foundation makes many available). For each scene we set up a list of positions and orientations for the camera to be moved and aligned to, enable the two passes we are interested in plus the normal full render, and profit.
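A minimal sketch of what that automation could look like, assuming a .blend file where the passes we want are already enabled on the active view layer; the camera positions, target point and output path are placeholders:

```python
import bpy
from mathutils import Vector

scene = bpy.context.scene
scene.render.resolution_x = 512
scene.render.resolution_y = 512

# Multilayer EXR keeps the combined image plus every enabled pass
# (depth, cryptomatte, ...) together in a single file per render.
scene.render.image_settings.file_format = 'OPEN_EXR_MULTILAYER'

cam = scene.camera
target = Vector((0.0, 0.0, 0.0))  # point of interest in the scene (placeholder)

# Placeholder camera positions; in practice these would be sampled per scene.
camera_positions = [
    Vector((6.0, -6.0, 3.0)),
    Vector((-5.0, 4.0, 2.5)),
    Vector((0.0, -8.0, 1.5)),
]

for i, pos in enumerate(camera_positions):
    cam.location = pos
    # Aim the camera at the target point.
    direction = target - cam.location
    cam.rotation_euler = direction.to_track_quat('-Z', 'Y').to_euler()

    scene.render.filepath = f"//renders/view_{i:04d}.exr"
    bpy.ops.render.render(write_still=True)
```

Run headless with something like `blender scene.blend --background --python render_views.py` to batch it over many scenes.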
All we need is 200k of these examples.
512x512? It is feasible, even more so with a few cards that support OptiX. Actually, Eevee got support for Cryptomatte a few years ago, so we could avoid Cycles and speed up the rendering process quite a bit.
The main concern would be tagging the final images.
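For reference, a sketch of the engine and pass settings that would go with the script above (property names as in recent Blender versions; Eevee's engine identifier has changed across releases, so treat it as an assumption):

```python
import bpy

scene = bpy.context.scene
view_layer = bpy.context.view_layer

# Eevee instead of Cycles for much faster renders.
scene.render.engine = 'BLENDER_EEVEE'

# Passes we want saved alongside the combined image.
view_layer.use_pass_z = True                     # depth
view_layer.use_pass_cryptomatte_object = True    # per-object mattes
view_layer.use_pass_cryptomatte_material = True  # per-material mattes
```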
Automated captioning by BLIP or whatever?
I don't have any experience with it, but it seems good from what I have seen.
Wouldn't it be possible to have a more detailed caption? "FF0000: gray car, 00FF00: glass, 0000FF: parking lot"
Basically a material list exported from Blender with at least the albedo and the material label? I do not have access to my main workstation at the moment, but next week I would like to see what is feasible in this respect. We also need to cope with the limitations of the text model used by Stable Diffusion, and I am not sure that will be easy.
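A rough sketch of how such a list could be pulled out of a scene with bpy: it reads each material's base colour (from the Principled BSDF when the material uses nodes) and emits one "HEXCOLOR: material name" entry. Note that these hex codes come from the albedo, not from Cryptomatte's internal IDs, so how they line up with the matte colours is left open here:

```python
import bpy

def material_caption(scene):
    """Build a caption like 'FF0000: car paint, 00FF00: glass, ...'
    from the materials used in the scene."""
    entries = []
    seen = set()
    for obj in scene.objects:
        for slot in obj.material_slots:
            mat = slot.material
            if mat is None or mat.name in seen:
                continue
            seen.add(mat.name)

            # Default to the viewport colour, fall back to the
            # Principled BSDF base colour when the material uses nodes.
            rgba = list(mat.diffuse_color)
            if mat.use_nodes:
                for node in mat.node_tree.nodes:
                    if node.type == 'BSDF_PRINCIPLED':
                        rgba = list(node.inputs['Base Color'].default_value)
                        break

            hex_code = "".join(f"{int(round(c * 255)):02X}" for c in rgba[:3])
            entries.append(f"{hex_code}: {mat.name}")
    return ", ".join(entries)

print(material_caption(bpy.context.scene))
```

The material names would still need to be mapped to human-readable labels ("gray car", "glass", ...), and long lists will quickly hit the 77-token limit of the CLIP text encoder, which is the Stable Diffusion limitation mentioned above.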
Also see the "double control" discussion here: https://github.com/lllyasviel/ControlNet/discussions/30
It would be great if we could use Cryptomatte and depth passes generated by a rendering engine (e.g. Blender) and use their combined information to inform the final "rendering" via ControlNet. This would be somewhat similar to combining the depth and segmentation maps as they are currently implemented.
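For the "combined information" part, the multi-ControlNet support in Hugging Face diffusers (not this repo's code) already lets a depth map and a segmentation-style map condition the same generation; a sketch, assuming the Blender passes have been converted to plain image files (paths and prompt are placeholders):

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

# One ControlNet per conditioning signal.
controlnets = [
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-seg", torch_dtype=torch.float16),
]

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnets,
    torch_dtype=torch.float16,
).to("cuda")

# Depth and segmentation maps exported from Blender (placeholder paths).
depth_map = load_image("renders/view_0000_depth.png")
seg_map = load_image("renders/view_0000_crypto.png")

image = pipe(
    "a gray car in a parking lot",
    image=[depth_map, seg_map],
    controlnet_conditioning_scale=[1.0, 0.8],
    num_inference_steps=30,
).images[0]
image.save("controlled_render.png")
```

The off-the-shelf seg model expects ADE20K-style colour coding rather than arbitrary cryptomatte colours, so a ControlNet trained on the Blender-generated maps discussed above would still be the real goal; this only shows that combining the two conditions at inference time is straightforward.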