AndreiZoltan opened 2 months ago
I'm currently trying to use FineFacePipeline to generate images from a prompt and specific AU (Action Unit) values. However, there is no explicit parameter or example in the documentation showing where a reference image should go. Is there support for using a reference image in FineFacePipeline?

Because FineFacePipeline inherits from StableDiffusionPipeline, you may be able to switch in a superclass based on this: https://github.com/InstantID/InstantID
This is not supported yet in this code base, but I will be adding it later.
The paper uses ip-adapter-face for reference images, but as @johndpope mentioned, it can be integrated with InstantID or other similar approaches.
thank you
@tvaranka Hi, I am trying to integrate ip-adapter-face and FineFace manually. However, I found that the unet.config.cross_attention_dim of SD2.1 differs from that of SDXL/SD1.5, which keeps the IP-Adapter (IPA) from working properly. Do you have any suggestions?
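For context, the mismatch can be inspected directly on the loaded UNet. To my knowledge, SD1.5 uses a cross-attention dimension of 768, SD2.1 uses 1024, and SDXL 2048; a minimal check (not from the thread) could look like this:

```python
from diffusers import UNet2DConditionModel

# Load only the UNet to inspect its config.
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-2-1", subfolder="unet"
)
print(unet.config.cross_attention_dim)  # 1024 for SD2.1; SD1.5 uses 768, SDXL 2048
```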
The base model needs to be the same for the integration to work. I have the weights for SD1.5 fineface, which I used for the integration with ip-adapter-face. I will be releasing the whole pipeline for fineface + ip-adapter-face, but unfortunately I am quite busy right now, so it will take some time.
If you are still keen on trying it yourself, I can upload the weights of the SD1.5 fineface to Hugging Face, after which the combination is possible.
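Once the SD1.5 weights are available, the combination might look roughly like this. A minimal sketch assuming diffusers' standard IP-Adapter loader; the fineface weight-loading step is a placeholder, not the released API:

```python
import torch
from diffusers import StableDiffusionPipeline

# SD1.5 base, since the fineface weights discussed here are for SD1.5.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# ...load the SD1.5 fineface UNet / AU-encoder weights here (placeholder step)...

# Standard diffusers IP-Adapter face loading; these repo/file names
# exist in the h94/IP-Adapter repository.
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="models",
    weight_name="ip-adapter-full-face_sd15.bin",
)
pipe.set_ip_adapter_scale(0.8)
```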
Thank you for your prompt response. I look forward to experimenting with integrating the AU controller and the ID controller. Would you be able to upload the relevant weights?
I have now uploaded the weights at https://huggingface.co/Tvaranka/fineface/tree/main
This model is from an earlier stage of development and uses the following AU encoder:
```python
import torch

class AUEncoder(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Projects the 12 raw AU intensities to 756 dims so that,
        # concatenated with the 12-dim input, the result is 768-dim
        # (SD1.5's cross-attention dimension).
        self.au_encoder = torch.nn.Linear(12, 756)

    def forward(self, x):
        x = x.clone()
        x = torch.cat([x, self.au_encoder(x)], dim=1)  # (B, 12) -> (B, 768)
        x = x.reshape(-1, 1, 768)  # one conditioning token per sample
        return x
```
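For instance, a 12-dim AU intensity vector maps to a single 768-dim conditioning token (the usage below is illustrative; the AU index is chosen arbitrarily):

```python
encoder = AUEncoder()
aus = torch.zeros(1, 12)   # 12 AU intensities
aus[0, 3] = 5.0            # raise one AU (index is illustrative)
token = encoder(aus)
print(token.shape)         # torch.Size([1, 1, 768])
```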
The intensity of the AUs will likely have to be increased when used in conjunction with IP-Adapter, due to the additional conditioning signal.
Thank you so much, I can't wait to try it
Hi @tvaranka, I tried to combine the IPA and AU controls, but the results were not satisfactory. IPA alone worked fine, but adding AU control caused a significant drop in quality, and the expression control was not as good as expected (quality got worse as AU intensity increased). Do you have any suggestions?
Thanks a lot for testing it out. A few observations:
- Because the training data consisted mostly of frontal-facing facial images, the model struggles outside this setting. Try prompts such as "close-up" and "frontal facing". This applies to the case without ip-adapter as well.
- Since the model now takes two separate conditioning inputs at the cross-attention (identity and AU), performance degrades: the modules were trained independently and both try to modify the face. The AU results I obtained with ip-adapter were worse than without it.
- Due to the two separate conditioning inputs, the strength of the AUs may need to be increased. I found that an intensity of 5 without ip-adapter needs to be around 8-10 with ip-adapter.
- Both the ip-adapter and the AU branch needed relatively strong scales, around 0.8-1.0 each (see the sketch after this list).
- The model is still finicky, so you may need to try a few different seeds.
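Putting those numbers together, a tuning sketch might look like this, continuing the loading sketch above. Passing `aus=` as a FineFace conditioning argument is an assumption about the pipeline's API, and the reference image path is a placeholder:

```python
import torch
from PIL import Image

# `pipe` as set up in the earlier loading sketch, with the IP-Adapter loaded.
pipe.set_ip_adapter_scale(0.9)       # ip-adapter branch around 0.8-1.0
aus = torch.zeros(1, 12)
aus[0, 0] = 9.0                      # AU intensity raised from ~5 to ~8-10

face_image = Image.open("face.jpg")  # placeholder identity reference image
image = pipe(
    prompt="close-up, frontal facing portrait photo of a person",
    ip_adapter_image=face_image,     # identity reference for the IP-Adapter
    # aus=aus,                       # hypothetical FineFace AU argument
    generator=torch.Generator("cuda").manual_seed(0),  # try several seeds
).images[0]
```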
I will try to release my version soon, thanks again for testing it out
Thank you for your response. It is indeed challenging to make expression control work alongside the current frozen IPA controller. I will try the settings you suggested and look forward to the updated version.