shiimizu / ComfyUI-PhotoMaker-Plus

PhotoMaker for ComfyUI
GNU General Public License v3.0

What's the correct way to load the photomaker lora? #32

Closed deepfree2023 closed 3 months ago

deepfree2023 commented 3 months ago

It's a bit confusing how to load the photomaker lora in the description:

  1. Load the LoRA within the model using the LoraLoaderModelOnly node.
  2. Automatic PhotoMaker LoRA detection and loading in the LoraLoader nodes.

I wonder what's the correct way to load the photomaker lora, 1 or 2 or 1&2?

deepfree2023 commented 3 months ago

And is there any recommendation on the lora strength? Is a value of 1 fine?

shiimizu commented 3 months ago

Technically, the model isn't a LoRA but contains one:

It mainly contains two parts corresponding to two keys in the loaded state dict:

  1. id_encoder includes finetuned OpenCLIP-ViT-H-14 and a few fuse layers.
  2. lora_weights applies to all attention layers in the UNet, and the rank is set to 64.

I made a hacky way so that the LoraLoader nodes could load the lora inside the model without having you extract it manually beforehand. So it's as simple as downloading the model into the ComfyUI/models/photomaker folder, and that's it.
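The split described above can be sketched as a simple key-prefix filter over the loaded state dict. This is a minimal illustration, not the extension's actual loader; the toy dict and the exact key names below are assumptions standing in for the real checkpoint:

```python
# Sketch (assumption): separating a PhotoMaker-style checkpoint into its two
# parts, "id_encoder" and "lora_weights", by key prefix. A toy dict stands in
# for the real state dict of tensors.

def split_photomaker_state_dict(state_dict):
    """Split the checkpoint into id_encoder weights and embedded LoRA weights."""
    id_encoder = {}
    lora_weights = {}
    for key, value in state_dict.items():
        if key.startswith("id_encoder"):
            id_encoder[key] = value
        elif key.startswith("lora_weights"):
            lora_weights[key] = value
    return id_encoder, lora_weights

# Toy stand-in for the loaded checkpoint (hypothetical key names):
toy_sd = {
    "id_encoder.vision_model.embeddings.weight": [0.1],
    "id_encoder.fuse_module.mlp1.weight": [0.2],
    "lora_weights.unet.attn1.to_q.lora.down.weight": [0.3],
}
enc, lora = split_photomaker_state_dict(toy_sd)
print(len(enc), len(lora))  # 2 1
```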

As for the lora, use one approach or the other, not both. I think the lora strength should be kept at >= 0.5. What's more important is the style strength (as seen in the example workflow).
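For intuition on why style strength matters: in the official PhotoMaker demo, the style strength ratio is converted into the denoising step at which the ID embedding starts being merged, so earlier steps are driven by the text prompt alone. A minimal sketch of that mapping, assuming the percentage-to-step convention and the step cap used there:

```python
# Sketch (assumption): mapping a style strength ratio (percent) to the
# denoising step where ID merging begins, after the pattern in the official
# PhotoMaker demo. Steps before this one are style/prompt-only.

def start_merge_step(style_strength_ratio, num_steps, cap=30):
    step = int(style_strength_ratio / 100 * num_steps)
    return min(step, cap)

print(start_merge_step(20, 50))  # 10: the first 10 of 50 steps are style-only
```

A higher ratio delays identity injection, giving the style more room before the face is locked in.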

deepfree2023 commented 3 months ago

Thank you for the example workflow! 😁

It's clear that the PhotoMaker model should be loaded twice, both in the LoraLoaderModelOnly and PhotoMaker Loader (plus) nodes.

I believe the "prep image for CLIP vision" and "style strength ratio" are no longer necessary in the V2 workflow. InsightFace can support large input images, detect face positions, and extract facial key points. The "prep image for CLIP vision" always scales the input image to 224x224, which results in a significant loss of information. By using high-resolution input images, V2 with a single KSampler (without style strength ratio) can produce much better results than V1. The style strength ratio doesn't seem to offer any further improvement.
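The information loss from that fixed resize is easy to quantify with a crude pixel-count proxy. A minimal sketch (the resolutions are illustrative assumptions):

```python
# Sketch: how much raw pixel data survives a fixed 224x224 CLIP-vision
# resize. Pixel count is a crude proxy for retained facial detail.

def retained_pixel_fraction(src_w, src_h, dst=224):
    return (dst * dst) / (src_w * src_h)

frac = retained_pixel_fraction(1024, 1024)
print(f"{frac:.1%}")  # about 4.8% of the original pixels survive
```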

deepfree2023 commented 3 months ago

A remaining issue is that the similarity of V2 results seems much lower compared to the official Hugging Face space demo.

klossm commented 3 months ago

> A remaining issue is that the similarity of V2 results seems much lower compared to the official Hugging Face space demo.

I feel the same way; the official Hugging Face space demo seems to be better.

shiimizu commented 3 months ago

Right. So what's more important is the number of high quality reference images and the sdxl checkpoint. The style strength ratio helps with applying styles like line art. I agree that the PrepImage nodes can be omitted.

For example, I used 31 references and I got a good output. That’s probably too much, but the more, the better. I'll upload improved workflows later.

klossm commented 3 months ago

> Right. So what's more important is the number of high quality reference images and the sdxl checkpoint. The style strength ratio helps with applying styles like line art. I agree that the PrepImage nodes can be omitted.
>
> For example, I used 31 references and I got a good output. That’s probably too much, but the more, the better. I'll upload improved workflows later.

In this example, I only used the single reference image from the official Hugging Face space demo (with the default parameters), and tried to replicate it in ComfyUI (using the RealVisXL-V4-BakedVAE model), but the results were not good. The attached screenshot shows the reference image I used and the output. The clarity of this reference image isn't even very good.


For the images output from the official Hugging Face space demo, I used Face Embeds Distance for the calculation, and dist is below 0.4.


Then here are the results of PhotoMaker V2 in ComfyUI: diminished similarity. I've run it many times, and the output image dist always hovers above 0.5.
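The dist values above can be reproduced in spirit with a cosine distance between L2-normalized face embeddings, which is what face-recognition toolkits such as InsightFace commonly use. A minimal sketch with made-up toy embeddings (the real ones are 512-dimensional):

```python
# Sketch (assumption): a Face-Embeds-Distance-style metric, here cosine
# distance between two face embeddings. Lower is a closer identity match;
# the thread treats ~0.4 as good and >0.5 as noticeably worse.
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy 3-D stand-ins for real 512-D embeddings:
ref = [0.6, 0.8, 0.0]
out = [0.8, 0.6, 0.0]
print(round(cosine_distance(ref, out), 3))  # 0.04
```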


shiimizu commented 3 months ago

Thanks for the report. There does seem to be an issue. I'm not sure where, but I'll investigate.

EDIT: Using the RepeatImageBatch node on the reference image(s) seems to help.

klossm commented 3 months ago

On top of using the RepeatImageBatch node, using a sampler/scheduler combination like deis with beta seems to work better. The result is very similar to the official Hugging Face space demo.
