tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images from an image prompt.
Apache License 2.0

train result faceid_plus, #298

Open · jeeveenn opened this issue 4 months ago

jeeveenn commented 4 months ago

Hello, thank you for your excellent work. When I was training faceid_plus, the generated images became worse as the training steps increased. I used about 1,500,000 images for training. What could be the reason? The training command:

accelerate launch --num_processes 4 --multi_gpu --mixed_precision "fp16" \
  tutorial_train_faceid_plus.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --image_encoder_path="laion/CLIP-ViT-H-14-laion2B-s32B-b79K" \
  --pretrained_ip_adapter_path="model/ip-adapter-faceid-plusv2_sd15.bin" \
  --data_json_file="*.json" \
  --data_root_path="" \
  --mixed_precision="fp16" \
  --resolution=512 \
  --train_batch_size=8 \
  --dataloader_num_workers=4 \
  --learning_rate=1e-04 \
  --weight_decay=0.01 \
  --output_dir="checkpoint" \
  --save_steps=1000

The generated images: [first image: 10,000 steps; second image: 270,000 steps]

xiaohu2015 commented 4 months ago

It is very strange. Have you also tested the training script at https://github.com/tencent-ailab/IP-Adapter/blob/main/tutorial_train_faceid.py?

jeeveenn commented 4 months ago

I'll give it a try

goshiaoki commented 4 months ago

Hello. Could you explain the JSON file?

--data_json_file="*.json" \

I want to train the model, but I'm not sure how to make the JSON file.

xiaohu2015 commented 4 months ago

A list of dicts: [{"image_file": "1.png", "id_embed_file": "faceid.bin"}]

I extract the ID embedding offline and save it to faceid.bin.
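
For reference, a minimal sketch of writing data.json in this format (the file names are illustrative, and the "text" caption field shown later in this thread is included as an assumption):

    # Sketch: build data.json as a list of dicts (names are illustrative).
    import json

    entries = [
        {"image_file": "1.png", "text": "a photo of a person", "id_embed_file": "faceid.bin"},
        {"image_file": "2.png", "text": "a photo of a person", "id_embed_file": "faceid_2.bin"},
    ]
    with open("data.json", "w") as f:
        json.dump(entries, f, indent=2)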

goshiaoki commented 4 months ago

Thank you. Let me try that.

t00350320 commented 3 months ago

A list of dicts: [{"image_file": "1.png", "id_embed_file": "faceid.bin"}]

I extract the ID embedding offline and save it to faceid.bin.

Will it be OK like this?

    # Imports needed to run this snippet
    import cv2
    import numpy as np
    from insightface.app import FaceAnalysis
    from diffusers.utils import load_image

    # Load face encoder
    app = FaceAnalysis(name='antelopev2', root='./', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
    app.prepare(ctx_id=0, det_size=(640, 640))

    face_image = load_image("./girl2.jpg")
    face_image = resize_img(face_image)  # resize_img: user-defined helper

    face_info = app.get(cv2.cvtColor(np.array(face_image), cv2.COLOR_RGB2BGR))
    face_info = sorted(face_info, key=lambda x: (x['bbox'][2] - x['bbox'][0]) * (x['bbox'][3] - x['bbox'][1]))[-1]  # only use the largest face
    face_emb = face_info['embedding']

1. How do I save face_emb to faceid.bin correctly? After writing it directly to a file, I get errors like this:

_pickle.UnpicklingError: Caught UnpicklingError in DataLoader worker process 0.
_pickle.UnpicklingError: invalid load key, '$'.

Could it be a file format error?

2. The data.json file:

  {
    "image_file": "faceimage.jpg",
    "text": "a hansome man",
    "id_embed_file": "0321faceid.bin"
  },

Does faceimage.jpg have to be the same size as the detected face_info box, or can it be another size? I mean, the original face box of an image is not always square like 256×256 or 512×512; if we resize it forcibly, the ground truth of the face will be destroyed.

xiaohu2015 commented 3 months ago

1) I use torch.save(face_info['embedding'], "faceid.bin"). 2) A normal way is to resize the short side to 512, then center crop. (You can also center crop with the help of the face bounding box.)
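
A minimal sketch of those two steps (illustrative only; face_info comes from the FaceAnalysis snippet above, and faceimage.jpg and the helper name are made up):

    # 1) Save the InsightFace embedding so it can be read back with torch.load.
    import torch
    torch.save(face_info['embedding'], "faceid.bin")

    # 2) Resize the short side to 512, then center crop to 512x512.
    from PIL import Image

    def resize_and_center_crop(img, size=512):
        w, h = img.size
        scale = size / min(w, h)                       # short side -> size
        img = img.resize((round(w * scale), round(h * scale)), Image.BICUBIC)
        w, h = img.size
        left, top = (w - size) // 2, (h - size) // 2   # centered crop box
        return img.crop((left, top, left + size, top + size))

    image = resize_and_center_crop(Image.open("faceimage.jpg").convert("RGB"))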

t00350320 commented 3 months ago
  1. I use torch.save(face_info['embedding'], "faceid.bin").
  2. A normal way is to resize the short side to 512, then center crop. (You can also center crop with the help of the face bounding box.)

Hi @xiaohu2015, I ran into some new issues:

1. trained with "tutorial_train_faceid.py";
2. converted "pytorch_model.bin" to adapter.bin with the parameters "ip_adapter,xxx";
3. ran inference with "ip_adapter-full-face_demo.ipynb", which fails at the line "ip_model = IPAdapterFull(pipe, image_encoder_path, ip_ckpt, device, num_tokens=257)":

--> 139 self.image_proj_model.load_state_dict(state_dict["image_proj"])
    140 ip_layers = torch.nn.ModuleList(self.pipe.unet.attn_processors.values())
    141 ip_layers.load_state_dict(state_dict["ip_adapter"])

File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:2041, in Module.load_state_dict(self, state_dict, strict)
   2036         error_msgs.insert(
   2037             0, 'Missing key(s) in state_dict: {}. '.format(
   2038                 ', '.join('"{}"'.format(k) for k in missing_keys)))
   2040 if len(error_msgs) > 0:
-> 2041     raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
   2042                        self.__class__.__name__, "\n\t".join(error_msgs)))
   2043 return _IncompatibleKeys(missing_keys, unexpected_keys)

RuntimeError: Error(s) in loading state_dict for MLPProjModel:
    Missing key(s) in state_dict: "proj.3.weight", "proj.3.bias". 
    Unexpected key(s) in state_dict: "norm.weight", "norm.bias". 
    size mismatch for proj.0.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
    size mismatch for proj.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]).
    size mismatch for proj.2.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([768, 1280]).
    size mismatch for proj.2.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([768]).

I guess there is some dim or shape mismatch between the training and inference stages? What is the difference between tutorial_train_faceid.py and tutorial_train_plus.py? Also, there is no dedicated "ip_adapter-face.ipynb" notebook corresponding to tutorial_train_faceid.py.
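
A note for readers hitting the same RuntimeError: the reported checkpoint shapes (a 512-dim input projected to 3072 = 4 × 768, plus a separate norm layer) suggest a FaceID-style projection model, while IPAdapterFull expects a projection trained on CLIP image embeddings, so the two state dicts cannot line up. A hedged sketch of loading a FaceID-style checkpoint with the matching wrapper instead (usage follows the repo README; the adapter.bin and faceid.bin paths and the prompt are illustrative, and the checkpoint is assumed to use the {"image_proj": ..., "ip_adapter": ...} layout):

    # Sketch: load a tutorial_train_faceid.py checkpoint with IPAdapterFaceID
    # rather than IPAdapterFull (paths here are illustrative).
    import torch
    from diffusers import StableDiffusionPipeline
    from ip_adapter.ip_adapter_faceid import IPAdapterFaceID

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    )
    ip_model = IPAdapterFaceID(pipe, "adapter.bin", "cuda")  # no image_encoder_path needed

    # The saved InsightFace embedding, reshaped to (1, 512).
    faceid_embeds = torch.as_tensor(torch.load("faceid.bin")).reshape(1, -1)
    images = ip_model.generate(prompt="a photo of a man", faceid_embeds=faceid_embeds, num_samples=1)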