And is there any recommendation on the LoRA strength? Is a value of 1 appropriate?
Technically, the model isn't a LoRA but contains one:
It mainly contains two parts, corresponding to two keys in the loaded state dict:

1. `id_encoder` includes the finetuned OpenCLIP-ViT-H-14 and a few fuse layers.
2. `lora_weights` applies to all attention layers in the UNet, and the rank is set to 64.
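For reference, a minimal sketch of inspecting that structure (assuming the checkpoint is a `photomaker-v2.bin`-style file loadable with `torch.load`; the path is illustrative):

```python
import torch

# Load the combined checkpoint and confirm its two top-level keys.
state_dict = torch.load(
    "ComfyUI/models/photomaker/photomaker-v2.bin", map_location="cpu"
)
print(list(state_dict.keys()))  # expected: ['id_encoder', 'lora_weights']

# Peek at a few LoRA tensors; the rank-64 dimension should show up in the shapes.
for name, tensor in list(state_dict["lora_weights"].items())[:4]:
    print(name, tuple(tensor.shape))
```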
I made a hacky way so that the LoraLoader nodes can load the LoRA inside the model without you having to extract it manually beforehand. So it's as simple as downloading the model into the ComfyUI/models/photomaker folder, and that's it.
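For anyone curious, here is a sketch of what that manual extraction would otherwise look like (paths and the output filename are illustrative, and it assumes the safetensors package is installed):

```python
import torch
from safetensors.torch import save_file

# Pull the embedded LoRA out of the combined checkpoint and save it as a
# standalone file that a regular LoRA loader could pick up.
state_dict = torch.load(
    "ComfyUI/models/photomaker/photomaker-v2.bin", map_location="cpu"
)
lora = {k: v.contiguous() for k, v in state_dict["lora_weights"].items()}
save_file(lora, "ComfyUI/models/loras/photomaker_lora.safetensors")
```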
As for the LoRA, load one or the other. I think the LoRA strength should be kept >= 0.5. What's more important is the style strength (as seen in the example workflow).
Thank you for the example workflow! 😁
It's clear that the PhotoMaker model should be loaded twice: once in the LoraLoaderModelOnly node and once in the PhotoMaker Loader (plus) node.
I believe the "prep image for CLIP vision" and "style strength ratio" steps are no longer necessary in the V2 workflow. InsightFace can handle large input images, detect face positions, and extract facial keypoints. The "prep image for CLIP vision" node always scales the input image down to 224x224, which discards a significant amount of information. With high-resolution input images, V2 with a single KSampler (and without the style strength ratio) produces much better results than V1; the style strength ratio doesn't seem to offer any further improvement.
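As a quick illustration of the high-resolution point, here is a minimal sketch of face detection and keypoint extraction with the insightface package (the model pack name and image path are illustrative):

```python
import cv2
from insightface.app import FaceAnalysis

# Detection + landmark models; det_size is the detector's working
# resolution, not a cap on the input image size.
app = FaceAnalysis(name="buffalo_l")
app.prepare(ctx_id=0, det_size=(640, 640))

img = cv2.imread("reference.jpg")  # full-resolution reference image
for face in app.get(img):
    print(face.bbox, face.kps)     # face position and 5 facial keypoints
```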
A remaining issue is that the similarity of V2 results seems much lower compared to the official Hugging Face space demo.
I feel the same way; the official Hugging Face space demo seems to be better.
Right. So what's more important is the number of high quality reference images and the sdxl checkpoint. The style strength ratio helps with applying styles like line art. I agree that the PrepImage nodes can be omitted.
For example, I used 31 references and I got a good output. That’s probably too much, but the more, the better. I'll upload improved workflows later.
In this example, I used only the single reference image from the official Hugging Face space demo (with the default parameters), which I tried to replicate in ComfyUI (using the RealVisXL-V4-BakedVAE model), but the results were not good. The image above shows the reference image I used and the resulting output image. The clarity of this reference image isn't even very good.
For the images output by the official Hugging Face space demo, I used Face Embeds Distance for the calculation; the dist stays below 0.4.
Then here are the results of PhotoMaker V2 in ComfyUI: the similarity is diminished. I've run it many times and the output image's dist always hovers above 0.5.
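For context, a rough sketch of how such a distance check can be computed with insightface; I'm approximating "Face Embeds Distance" as cosine distance between normalized embeddings, which may differ from the exact metric that node uses:

```python
import cv2
import numpy as np
from insightface.app import FaceAnalysis

app = FaceAnalysis(name="buffalo_l")
app.prepare(ctx_id=0, det_size=(640, 640))

def face_embedding(path: str) -> np.ndarray:
    # Assumes a single face per image; takes the first detection.
    return app.get(cv2.imread(path))[0].normed_embedding

ref = face_embedding("reference.jpg")
out = face_embedding("output.png")
dist = 1.0 - float(np.dot(ref, out))  # cosine distance; lower = more similar
print(f"dist = {dist:.3f}")
```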
Thanks for the report. There does seem to be an issue. I'm not sure where, but I'll investigate.
EDIT: Using the RepeatImageBatch node on the reference image(s) seems to help.
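If it helps to see why that matters, here is roughly what RepeatImageBatch does, sketched in plain PyTorch (a sketch, not the node's actual source): the single reference image is tiled along the batch dimension so the ID encoder receives several copies.

```python
import torch

def repeat_image_batch(image: torch.Tensor, amount: int) -> torch.Tensor:
    # ComfyUI image tensors are [batch, height, width, channels].
    return image.repeat(amount, 1, 1, 1)

ref = torch.rand(1, 1024, 1024, 3)        # one reference image
print(repeat_image_batch(ref, 4).shape)   # torch.Size([4, 1024, 1024, 3])
```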
On top of using the RepeatImageBatch node, a sampler/scheduler combination like deis + beta seems to work better; the result is very similar to the official Hugging Face space demo.
It's a bit confusing how to load the PhotoMaker LoRA from the description:
I wonder what's the correct way to load it: 1, 2, or both 1 & 2?