ttchengab / zest_code

[ECCV-2024] This is the official implementation of ZeST.
https://ttchengab.github.io/zest
MIT License

Will ComfyUI be supported? #2

Open kelisiya opened 7 months ago

drphero commented 7 months ago

@kelisiya I've put together a ComfyUI workflow with existing nodes, based on the techniques described in the paper (IPAdapter, depth ControlNet, grayscale subject). I am able to get some decent results, but not for everything. And it requires a positive prompt briefly describing the texture/material and object. I have no clue how the creators were able to do it without prompts.

Results using sample materials from the paper:

[image: comparison of results using the sample materials]

Workflow: textureTransfer.json

When using SDXL, I recommend using the Tencent depth model, since several others that I tried didn't work at all.
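For anyone not using ComfyUI, here is a minimal diffusers sketch of the same idea (not the exact workflow above): grayscale the masked subject, condition on its depth map through a depth ControlNet, and inject the material exemplar through IP-Adapter. The checkpoint/adapter ids, file names, and strengths below are assumptions to adjust, not the exact settings from the workflow.

```python
# Minimal sketch of the paper's recipe with diffusers (assumed SD1.5 setup):
# grayscale subject + depth ControlNet + IP-Adapter material exemplar.
import torch
from PIL import Image, ImageOps
from diffusers import StableDiffusionControlNetInpaintPipeline, ControlNetModel

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V5.1_noVAE",  # assumed SD1.5 checkpoint id
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(1.0)

# Placeholder file names: subject photo, material exemplar, subject mask,
# and a depth map precomputed with e.g. a DPT model.
subject = Image.open("subject.png").convert("RGB")
material = Image.open("material_exemplar.png").convert("RGB")
mask = Image.open("subject_mask.png").convert("L")       # white = repaint
depth = Image.open("subject_depth.png").convert("RGB")

# Grayscale the subject so its original color doesn't leak into the result.
gray_subject = ImageOps.grayscale(subject).convert("RGB")

result = pipe(
    prompt="best quality, high quality",
    image=gray_subject,
    mask_image=mask,
    control_image=depth,
    ip_adapter_image=material,
    controlnet_conditioning_scale=0.7,
    num_inference_steps=30,
).images[0]
result.save("textured.png")
```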

mp3pintyo commented 7 months ago

Thanks for sharing the workflow. The ComfyUI method gave me much better results than the zero-shot code. With the zero-shot solution, the edge of the object looks terrible.

[images: pumpkin glass cup (zero-shot) and pumpkin glass cup (ComfyUI)]

drphero commented 7 months ago

@kelisiya Well, the workflow basically implements what they described in the paper, and because it's in ComfyUI it allows for a lot more fine-tuning of the settings. It's interesting that you liked the result of the glass cup, because that is one example I wasn't able to get a good result on in Comfy. Your example does have good edges, but the coloring and texture don't really look like the glass cup.

@ttchengab Maybe you could weigh in on how to improve the ComfyUI workflow? For example, how to remove the need for prompting?

drphero commented 7 months ago

@kelisiya After playing around a little more, I was able to get this with the standard Realistic Vision V5.1 model.

[image: ComfyUI_temp_uzpzt_00063_ (glass pumpkin result)]

mp3pintyo commented 7 months ago

> @kelisiya After playing around a little more, I was able to get this with the standard Realistic Vision V5.1 model.

Pretty good! Is the ComfyUI workflow the same?

drphero commented 7 months ago

> Pretty good! Is the ComfyUI workflow the same?

Yeah, only the prompts were changed, I think. The workflow should be embedded in the image.

ttchengab commented 7 months ago

> @kelisiya Well, the workflow basically implements what they described in the paper, and because it's in ComfyUI it allows for a lot more fine-tuning of the settings. It's interesting that you liked the result of the glass cup, because that is one example I wasn't able to get a good result on in Comfy. Your example does have good edges, but the coloring and texture don't really look like the glass cup.
>
> @ttchengab Maybe you could weigh in on how to improve the ComfyUI workflow? For example, how to remove the need for prompting?

Hi @drphero, I am wondering what types of prompts you added (both positive and negative)? Also, what strengths did you set for the ControlNet/IP-Adapter? When I was experimenting with different prompts, they tended to have minimal effect (the IP-Adapter seems to already override them), so the only positive prompt that tended to be helpful was "best quality, high quality". As for the rougher edges from the code, I suspect they could be due to inaccuracy of DPT in predicting depths, though I am not entirely sure.

Also, a suggestion that may be helpful when trying other objects: the brightness of the grayscale foreground image can also be tuned for better results. Lift the brightness of the grayscale foreground for darker images and vice versa.
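A small PIL sketch of that brightness suggestion, assuming the grayscale foreground is saved as a standalone image; the file names and enhancement factors are arbitrary starting points:

```python
from PIL import Image, ImageEnhance, ImageOps

# Convert the subject to grayscale, then nudge the brightness up or down.
fg = ImageOps.grayscale(Image.open("subject.png"))    # placeholder file name
brighter = ImageEnhance.Brightness(fg).enhance(1.3)   # factor > 1.0 brightens
darker = ImageEnhance.Brightness(fg).enhance(0.8)     # factor < 1.0 darkens
brighter.save("subject_gray_bright.png")
darker.save("subject_gray_dark.png")
```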

drphero commented 7 months ago

> Hi @drphero, I am wondering what types of prompts you added (both positive and negative)? Also, what strengths did you set for the ControlNet/IP-Adapter? When I was experimenting with different prompts, they tended to have minimal effect (the IP-Adapter seems to already override them), so the only positive prompt that tended to be helpful was "best quality, high quality". As for the rougher edges from the code, I suspect they could be due to inaccuracy of DPT in predicting depths, though I am not entirely sure.
>
> Also, a suggestion that may be helpful when trying other objects: the brightness of the grayscale foreground image can also be tuned for better results. Lift the brightness of the grayscale foreground for darker images and vice versa.

For some materials, like the glazed mug, I found that I could get good results without special prompting, but for others, like the copper plate, the pumpkin would become warped without adding "brushed copper pumpkin" to "best quality, highly detailed, masterpiece, 4k". This happened with the SD1.5 models. For SDXL, the brushed aspect of the copper wasn't transferring without prompting for it.

In general, I got better results with SD1.5 models, especially Realistic Vision, both the inpainting and normal versions. For IP-Adapter, the best results seem to come from a weight of 1 and a weight type of "style transfer" for the materials I tried. ControlNet with a weight of 0.7 and an end percent of 0.5 usually works well for SD1.5. I had a harder time with SDXL because the ControlNet models available for it are quite lackluster in comparison; only the Tencent depth model really worked at all, and with a weight of 0.7 and an end percent of 1.0.
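Outside ComfyUI, those settings would roughly map onto diffusers call arguments as in the hedged sketch below (assuming a ControlNet inpaint pipeline and prepared inputs like the earlier sketch in this thread); "end percent" in ComfyUI corresponds to `control_guidance_end` here, and the "style transfer" weight type has no direct diffusers equivalent:

```python
# Hypothetical helper: map the ComfyUI-style strengths onto diffusers args.
# `pipe` is a StableDiffusionControlNetInpaintPipeline with IP-Adapter loaded;
# the images are the grayscale subject, mask, depth map, and material exemplar.
def apply_material(pipe, gray_subject, mask, depth, material, prompt):
    pipe.set_ip_adapter_scale(1.0)           # IP-Adapter weight 1
    return pipe(
        prompt=prompt,
        image=gray_subject,
        mask_image=mask,
        control_image=depth,
        ip_adapter_image=material,
        controlnet_conditioning_scale=0.7,    # ControlNet weight 0.7 (SD1.5)
        control_guidance_end=0.5,             # end percent 0.5 (SD1.5)
    ).images[0]
```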

To deal with the rougher edges, I used a GrowMaskWithBlur node to slightly enlarge the masked area and blur the edges a bit.
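Outside ComfyUI, a rough equivalent of that grow-and-blur step could look like the OpenCV sketch below; the kernel and blur sizes are guesses to tune per image:

```python
import cv2
import numpy as np

# Placeholder file: an 8-bit mask where white marks the subject.
mask = cv2.imread("subject_mask.png", cv2.IMREAD_GRAYSCALE)

# Dilate to slightly enlarge the masked area, then feather the edge.
grown = cv2.dilate(mask, np.ones((9, 9), np.uint8), iterations=1)
soft = cv2.GaussianBlur(grown, (21, 21), 0)
cv2.imwrite("subject_mask_soft.png", soft)
```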

If you save the glass pumpkin from above and drag it into ComfyUI, you can see all the settings that were used for it.