whiterose199187 opened 1 year ago
Hello,
I see ip-adapter-full-face_sd15.bin has been recently released. Could you explain what is the difference between this and previously released version of IP-Adapter-Face? Also, is this just for SD 1.5 or can work with SDXL too?
Thanks
IP-Adapter should be universal, not limited to human faces; for example, it can be used for clothing: https://github.com/tencent-ailab/IP-Adapter/pull/135#issuecomment-1803437109
Can it be used with SDXL?
Not available.
Thanks for the quick response. Are there any plans to release an SDXL version in the future?
I replaced ip-adapter-plus-face_sd15.pth with ip-adapter-full-face_sd15.pth in the webui and it throws an error. Any idea why? (Previously, simply renaming the .bin file to .pth worked.)
It is currently not supported.
Regarding the differences from the previously released IP-Adapter-Face:

- data: we remove some small faces and do some crop augmentations.
- data preprocessing: we segment the face and remove the background.
- model: we use the full tokens (256 patch tokens + 1 cls token) and a simple MLP to get face features (see the token-extraction sketch below).
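For reference, here is a minimal sketch of what "full tokens" means in practice. It assumes the ViT-H/14 CLIP image encoder shipped with the other SD 1.5 IP-Adapter checkpoints (the `models/image_encoder` folder of `h94/IP-Adapter` on Hugging Face); the exact hidden-state layer used by the released full-face checkpoint is not stated here, so treat the layer choice as an assumption.

```python
# Hypothetical sketch: extracting the full token sequence (1 cls token + 256 patch
# tokens) from the CLIP vision encoder for a 224x224 face crop.
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection

# Assumed encoder: the ViT-H/14 image encoder used by the SD 1.5 IP-Adapter models.
image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter", subfolder="models/image_encoder"
)
processor = CLIPImageProcessor()

face = Image.open("face.png")  # a cropped, background-removed face image
pixels = processor(images=face, return_tensors="pt").pixel_values  # (1, 3, 224, 224)

with torch.no_grad():
    # ViT-H/14 at 224x224: 16x16 = 256 patch tokens + 1 cls token = 257 tokens,
    # each of width 1280, so the shape is (1, 257, 1280).
    tokens = image_encoder(pixels, output_hidden_states=True).last_hidden_state

# Note: the "plus" models take the penultimate hidden states (hidden_states[-2]);
# which layer the full-face model uses is an assumption, not confirmed above.
# These tokens are then fed through a simple MLP projection rather than being
# pooled into a single global embedding.
```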
@xiaohu2015 Hello, I have two questions about these modifications. Thanks!
@eezywu (1) No, we only remove the background. But I also trained a model conditioned only on the segmented face (no hair), and it also works well. (2) The new version always gets better results (we use face ID similarity to evaluate).
Hi, I saw that the generation setting of plus-face uses a non-square size, i.e., height 704 and width 512. Did you train the model with this output size, or still use 512x512?
I trained the model on SD 1.5 with a fixed 512x512 resolution. But if the base UNet model can generate non-square images, the model also works well. By the way, I also tried to fine-tune the model with multi-scale training, but it brought no improvement.
Got it, thanks.
Got it, thanks for your reply :)
@xiaohu2015 Can you share how "use full tokens and a simple MLP to get face features" was achieved? Is this part not open-sourced yet? My tests have found that full-face performs much better than plus-face, and I then tried training ip-adapter-face with my own data, cutting out the face and background, and it indeed works better than the general full-face approach. However, I would like to try modifying the model as you did to further improve the results. Thanks!
The training code is just the same as https://github.com/tencent-ailab/IP-Adapter/blob/main/tutorial_train_plus.py. Only two changes: (1) the conditioning image is a face image, and (2) the ImageProj module is switched to an MLP (a sketch follows below).
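As a point of reference, a minimal sketch of such an MLP projection module is shown below. It mirrors the `MLPProjModel(cross_attention_dim=..., clip_embeddings_dim=...)` call quoted later in this thread; the layer sizes and normalization are assumptions, not necessarily the released architecture.

```python
# Hypothetical MLPProjModel sketch: maps all 257 CLIP tokens (cls + patches) to the
# UNet cross-attention dimension with a simple MLP, instead of the Resampler used
# by the "plus" models. Layer choices here are assumptions.
import torch

class MLPProjModel(torch.nn.Module):
    def __init__(self, cross_attention_dim=768, clip_embeddings_dim=1280):
        super().__init__()
        self.proj = torch.nn.Sequential(
            torch.nn.Linear(clip_embeddings_dim, clip_embeddings_dim),
            torch.nn.GELU(),
            torch.nn.Linear(clip_embeddings_dim, cross_attention_dim),
            torch.nn.LayerNorm(cross_attention_dim),
        )

    def forward(self, image_embeds):
        # image_embeds: (batch, 257, clip_embeddings_dim) full CLIP token sequence
        # returns:      (batch, 257, cross_attention_dim) image tokens for the UNet
        return self.proj(image_embeds)
```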
I would also like to ask: for faces, how large an area do we need to crop, and which face detection model do you recommend? @xiaohu2015
I have a question about how you define the prompt when training the face model. Do you use a detailed prompt describing the target image, or just a simple prompt like "a person"? I found that generation quality at inference degrades when we use detailed prompts.
I use detailed prompts.
Is it possible to use the IP-Adapter face embeddings for checking similarity between two faces?
I attempted to replicate the training process of ip-adapter-full-face, but failed. For training data, I used 750,000 image-text pairs selected from LAION-Face. I performed facial cropping and alignment on these 750,000 face images, resulting in 224x224 face crops. I used the cropped face images as input for CLIP, with the corresponding text descriptions from the LAION-Face dataset as text conditions. The training goal was to reconstruct the original images.
I set the training steps to 1 million, saved checkpoints every 10,000 steps, used SD 1.5 as the base model, and trained on a machine with 8 V100 32GB GPUs. However, when testing with scale=1.0, I found that the trained model cannot effectively maintain identity. Additionally, there is a noticeable change in facial structure every 10,000 steps, and the generated faces differ significantly from the input. I wonder if I am missing some training details.
My training code is based on tutorial_train_plus.py with only the following modifications:

```python
# ip-adapter-plus (original)
clip_image = self.clip_image_processor(images=raw_image, return_tensors="pt").pixel_values

# ip-adapter-full (modified): condition on the cropped face image instead
face_image_file = item["face_image_file"]
face_image = Image.open(os.path.join(self.image_root_path, face_image_file))
clip_image = self.clip_image_processor(images=face_image, return_tensors="pt").pixel_values

# ip-adapter-plus (original)
image_proj_model = Resampler(
    dim=unet.config.cross_attention_dim,
    depth=4,
    dim_head=64,
    heads=12,
    num_queries=args.num_tokens,
    embedding_dim=image_encoder.config.hidden_size,
    output_dim=unet.config.cross_attention_dim,
    ff_mult=4,
)

# ip-adapter-full (modified): replace the Resampler with an MLP projection
image_proj_model = MLPProjModel(
    cross_attention_dim=unet.config.cross_attention_dim,
    clip_embeddings_dim=image_encoder.config.hidden_size,
)
```
Did you test my face demo? How does it compare with your model?
I tested the results, and the face identities generated by the model change every 10,000 steps; they are inconsistent. Therefore, I believe my training has failed. Additionally, when I set the text prompt to an empty string and input the cropped face images, some of the checkpoints fail to generate images containing faces.
I think you can make a comparison with my model https://github.com/tencent-ailab/IP-Adapter/blob/main/ip_adapter-full-face_demo.ipynb
Sorry, I just discovered a bug in my code where I applied the transform to the face images twice. By the way, I'd like to confirm whether my data processing flow and training details are correct.
Initially, I only used cropped facial images with an empty text prompt, attempting to reconstruct the cropped faces solely based on the image features extracted by CLIP. However, I found that as the training progressed, the reconstruction performance improved initially but then deteriorated. It was consistently challenging to fully preserve identity features, with only intermediate results exhibiting relatively high similarity. Have you conducted similar experiments before?
I think only reconstructing the cropped faces is not meaningful. To improve face consistency, you can also use the ID embedding from a face recognition model; I found it very helpful.
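For evaluation (and, as asked earlier in the thread, for checking similarity between two faces), a face recognition embedding is the usual tool. Below is a minimal sketch that assumes InsightFace as the ID-embedding model; this is an illustrative choice, not the specific model used for training, and any face recognition library with normalized embeddings would work the same way.

```python
# Hypothetical sketch: measuring face ID similarity with a face recognition model
# (InsightFace is used here as an assumed example of an ID-embedding model).
import cv2
import numpy as np
from insightface.app import FaceAnalysis

app = FaceAnalysis(name="buffalo_l")          # detector + ArcFace-style recognizer
app.prepare(ctx_id=0, det_size=(640, 640))

def id_embedding(path: str) -> np.ndarray:
    """Return the L2-normalized ID embedding of the largest detected face."""
    faces = app.get(cv2.imread(path))
    if not faces:
        raise ValueError(f"no face detected in {path}")
    face = max(faces, key=lambda f: (f.bbox[2] - f.bbox[0]) * (f.bbox[3] - f.bbox[1]))
    return face.normed_embedding

# Cosine similarity between a reference face and a generated face;
# higher means better identity preservation.
similarity = float(np.dot(id_embedding("reference.png"), id_embedding("generated.png")))
print(f"face ID similarity: {similarity:.3f}")
```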