Open gagbaghdas opened 9 months ago
you can try a higher strength
Great, @xiaohu2015, thank you. Setting strength to 1 solved the problem.
Now the result is about 95% close to the original image :D. Btw, you can still see slight differences. Is there anything I can do to get a 100% identical result?
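For readers wondering why a higher strength changes the result so much: the sketch below assumes the thread is about the `strength` argument of the diffusers img2img/inpaint pipelines, where strength decides how far the input image is noised and therefore how many denoising steps actually run. The function name `denoising_steps` is made up for illustration. At strength 1 the input is fully replaced by noise, so the output is driven by the prompts rather than by the initial image.

```python
def denoising_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps actually run for a given strength.

    Assumes the diffusers img2img/inpaint convention: the input image is
    noised to an intermediate timestep and only the remaining steps are run.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return min(int(num_inference_steps * strength), num_inference_steps)


for s in (0.25, 0.5, 1.0):
    # Higher strength -> more steps -> less of the input image survives.
    print(s, denoising_steps(50, s))
```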
Currently it cannot achieve 100%. Maybe you can train such an adapter for clothes.
Thanks for the info. Do you know approximately what kind of resources I will need to train it?
Hello, is there also a training script for this? Can I reuse the existing ones?
It's hard to say without doing experiments.
Thanks. Regarding the dataset, what do you think about the DeepFashion2 dataset?
I think it is OK
@xiaohu2015 Great, thank you. Then I'm going to try to train it on clothes. I will ping here in case of any questions or problems.
OK
@xiaohu2015 I've started the training, but at some point it seems to have been interrupted. Here is my last checkpoint.
From the logs I can see the last step:
Epoch 8, step 4988
So I need to continue the training, right? If so, how can I make it resume from the checkpoint? Should I just set the pretrained model path to the last checkpoint, or is there anything else I should do?
accelerator.print(f"Resuming from checkpoint {path}")
accelerator.load_state(os.path.join(args.output_dir, path))
@xiaohu2015 what about this approach ?
yes, it works
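To make the resume step above concrete, here is a minimal sketch of how `path` could be chosen. It assumes checkpoints are saved as `checkpoint-<step>` directories inside the output directory; that naming, and the helper `find_latest_checkpoint`, are assumptions for illustration, not something confirmed in this thread.

```python
import os
import re
from typing import Optional


def find_latest_checkpoint(output_dir: str) -> Optional[str]:
    """Return the name of the newest `checkpoint-<step>` directory, or None."""
    pattern = re.compile(r"^checkpoint-(\d+)$")
    best_step, best_name = -1, None
    for name in os.listdir(output_dir):
        match = pattern.match(name)
        # Only consider directories whose name matches the assumed scheme.
        if match and os.path.isdir(os.path.join(output_dir, name)):
            step = int(match.group(1))
            if step > best_step:
                best_step, best_name = step, name
    return best_name


# Possible use inside a training script (variable names are assumed):
# path = find_latest_checkpoint(args.output_dir)
# if path is not None:
#     accelerator.print(f"Resuming from checkpoint {path}")
#     accelerator.load_state(os.path.join(args.output_dir, path))
#     global_step = int(path.split("-")[1])
```

Note that `accelerator.load_state` restores the objects prepared by Accelerate (model, optimizer, and so on), but the script still has to restore its own step counter, as in the last commented line.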
@xiaohu2015 btw, the starting model path is --pretrained_model_name_or_path="stable-diffusion-v1-5/".
Is it OK to fine-tune the IP-Adapter on clothes this way, or should I use ip-adapter_sd15.bin as the pretrained model? I mean, maybe I'm doing something wrong and training it from scratch?
Hi, for clothes I think you should use the ip-adapter-full model (which uses all token features of CLIP or DINO).
@xiaohu2015 so you mean --pretrained_model_name_or_path should point to ip-adapter-full instead of stable-diffusion-v1-5/?
I mean this IP-Adapter: https://github.com/tencent-ailab/IP-Adapter/blob/main/ip_adapter/ip_adapter.py#L316
Sorry for the "stupid" questions, I'm new to this. So you mean I should use the full model in my training code, instead of this?
class IPAdapter(torch.nn.Module):
    """IP-Adapter"""

    def __init__(self, unet, image_proj_model, adapter_modules, ckpt_path=None):
        super().__init__()
        self.unet = unet
        self.image_proj_model = image_proj_model
        self.adapter_modules = adapter_modules

        if ckpt_path is not None:
            self.load_from_checkpoint(ckpt_path)

    def forward(self, noisy_latents, timesteps, encoder_hidden_states, image_embeds):
        ip_tokens = self.image_proj_model(image_embeds)
        encoder_hidden_states = torch.cat([encoder_hidden_states, ip_tokens], dim=1)
        # Predict the noise residual
        noise_pred = self.unet(noisy_latents, timesteps, encoder_hidden_states).sample
        return noise_pred

    def load_from_checkpoint(self, ckpt_path: str):
        # Calculate original checksums
        orig_ip_proj_sum = torch.sum(torch.stack([torch.sum(p) for p in self.image_proj_model.parameters()]))
        orig_adapter_sum = torch.sum(torch.stack([torch.sum(p) for p in self.adapter_modules.parameters()]))

        state_dict = torch.load(ckpt_path, map_location="cpu")

        # Load state dict for image_proj_model and adapter_modules
        self.image_proj_model.load_state_dict(state_dict["image_proj"], strict=True)
        self.adapter_modules.load_state_dict(state_dict["ip_adapter"], strict=True)

        # Calculate new checksums
        new_ip_proj_sum = torch.sum(torch.stack([torch.sum(p) for p in self.image_proj_model.parameters()]))
        new_adapter_sum = torch.sum(torch.stack([torch.sum(p) for p in self.adapter_modules.parameters()]))

        # Verify if the weights have changed
        assert orig_ip_proj_sum != new_ip_proj_sum, "Weights of image_proj_model did not change!"
        assert orig_adapter_sum != new_adapter_sum, "Weights of adapter_modules did not change!"

        print(f"Successfully loaded weights from checkpoint {ckpt_path}")
No, here "full" means using all token features of CLIP.
So you mean instead of this
# ip-adapter
image_proj_model = ImageProjModel(
    cross_attention_dim=unet.config.cross_attention_dim,
    clip_embeddings_dim=image_encoder.config.projection_dim,
    clip_extra_context_tokens=4,
)
ip_adapter = IPAdapter(unet, image_proj_model, adapter_modules, args.pretrained_ip_adapter_path)
I should use this one?
class IPAdapterFull(IPAdapterPlus):
    """IP-Adapter with full features"""

    def init_proj(self):
        image_proj_model = MLPProjModel(
            cross_attention_dim=self.pipe.unet.config.cross_attention_dim,
            clip_embeddings_dim=self.image_encoder.config.hidden_size,
        ).to(self.device, dtype=torch.float32)
        return image_proj_model
Sorry again for these kinds of questions, I want to be 100% sure I'm not doing anything wrong :D
yes
Great, thanks. I'll get back here with results (hopefully) or questions :D
@xiaohu2015 I'm a bit stuck here. Using IPAdapterFull from ip-adapter requires an SD pipeline, but my current training code doesn't use one. Can I share my current training code with you, so you can give me feedback on what can be changed or improved?
Hi, I mean you can use this projection net (https://github.com/tencent-ailab/IP-Adapter/blob/main/ip_adapter/ip_adapter.py#L316) to project the features extracted by the CLIP model. The projected features are then used as the keys and values of the cross-attention layers of IP-Adapter.
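To illustrate the difference being discussed, here is a rough sketch of a per-token MLP projection. The class `TokenMLPProj` is a hypothetical stand-in loosely modeled on the idea behind MLPProjModel, not a copy of the repository code, and the tensor shapes (257 tokens of width 1280, cross-attention width 768) are illustrative assumptions. The point is that every CLIP image token is projected and appended to the text tokens, rather than mapping one pooled embedding to 4 extra tokens as ImageProjModel does with clip_extra_context_tokens=4.

```python
import torch


class TokenMLPProj(torch.nn.Module):
    """Hypothetical per-token MLP projection (illustration only)."""

    def __init__(self, cross_attention_dim: int = 768, clip_embeddings_dim: int = 1280):
        super().__init__()
        self.proj = torch.nn.Sequential(
            torch.nn.Linear(clip_embeddings_dim, clip_embeddings_dim),
            torch.nn.GELU(),
            torch.nn.Linear(clip_embeddings_dim, cross_attention_dim),
            torch.nn.LayerNorm(cross_attention_dim),
        )

    def forward(self, image_tokens: torch.Tensor) -> torch.Tensor:
        # image_tokens: (batch, num_tokens, clip_embeddings_dim)
        return self.proj(image_tokens)


# Random stand-ins for real encoder outputs (shapes are assumptions).
text_states = torch.randn(2, 77, 768)      # text encoder hidden states
image_tokens = torch.randn(2, 257, 1280)   # all CLIP vision tokens, not the pooled one

ip_tokens = TokenMLPProj()(image_tokens)   # one projected token per image token
encoder_hidden_states = torch.cat([text_states, ip_tokens], dim=1)
print(encoder_hidden_states.shape)
```

In a training script like the one quoted earlier, a module of this kind would take the place of image_proj_model, and the image encoder would need to return per-token hidden states instead of the pooled image embedding.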
@gagbaghdas Hi, I want to train with the VITON-HD dataset. I trained the model, but it gives an error when I test it. Have you tested it?
Thank you guys, I'm in, going to try it too.
@gagbaghdas Hi! I was trying to train my own IP-Adapter for clothes. How was your result? Was it successful? I would appreciate it if you could share your experiment :)
Do you use random noise at inference? If I want to replace the random noise with a noised photo, what should I do?
Hey, unfortunately no. There were some issues, especially with different body sizes, and I switched to something else.
Hello, I have a question: why can IPAdapterFull extract more detailed information about clothes? IPAdapterFull is simpler than the IPAdapterPlus (Resampler) model, since it only has an MLPProjModel. Thanks!
Hey guys. Does anyone have an idea what I'm doing wrong? Something is wrong with the colors here and I can't find the problem. Here are the initial image, prompt, mask, and result images. As you can see, the RED hoodie became gray in the result :D
Here is the relevant part of my code, the method inPaintingUsingIPAdapter.
Any help would be appreciated.
Thanks in advance.