johannakarras / DreamPose

Official implementation of "DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion"
MIT License
962 stars, 73 forks

Fine-tuned on multiple input images #36

Open rohitpaul23 opened 1 year ago

rohitpaul23 commented 1 year ago

What is the proper way of fine-tuning the model on multiple images?

I passed three different images (.png files) of the same person along with their corresponding poses (_densepose.npy).

It seems the model overfits to the poses of the input images. At inference time, the resulting frame is affected by the training image poses, even though the poses I pass at inference don't include any of them.

When I fine-tuned the model with num_trained_epochs = 500 using 3 images, I got checkpoints at epochs 166, 333, and 499. The results are better for epoch 166, but for epochs 333 and 499 there are multiple reflections of the person in a single frame, which roughly correspond to the images used for fine-tuning.

The frame I am sharing shows a double reflection of the person. The pose it shows is not present in the pose folder; it is one of the poses supplied at fine-tuning time. ![pred#63](https://user-images.githubusercontent.com/34906547/236816264-ab0952cb-db91-4643-9e53-8a5f66dfc648.png)

Thank You

nikkozzblu commented 1 year ago

I had a similar issue and came to the conclusion that there is a typo in the code. The line at https://github.com/johannakarras/DreamPose/blob/5bf30b7df70cf6f2e0bb25556c6ff2cbf0f2b1bf/finetune-unet.py#L191 should be: frame_j = [example["frame_j"] for example in examples]

Multi-frame fine-tuning works much better after that change.
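
To illustrate why a one-key typo in a batching function could produce this symptom, here is a minimal sketch. It is not the repository's code: `collate_fn`, the `frame_j_key` parameter, and the placeholder strings are hypothetical, and the buggy form (reading a different key for `frame_j`) is an assumption about what the linked line contained. Only the corrected line itself comes from the comment above.

```python
# Hypothetical, simplified batching function; NOT the actual finetune-unet.py code.
def collate_fn(examples, frame_j_key="frame_j"):
    """Gather per-example fields into per-batch lists."""
    frame_i = [example["frame_i"] for example in examples]
    # Corrected line from the comment above: read each example's own "frame_j".
    # frame_j_key exists only so this sketch can reproduce the assumed typo.
    frame_j = [example[frame_j_key] for example in examples]
    return {"frame_i": frame_i, "frame_j": frame_j}


# Placeholder strings stand in for image tensors.
examples = [
    {"frame_i": "img_0", "frame_j": "img_1"},
    {"frame_i": "img_1", "frame_j": "img_2"},
]

fixed = collate_fn(examples)
# Assumed form of the typo: the wrong key is read when building frame_j.
buggy = collate_fn(examples, frame_j_key="frame_i")

print(fixed["frame_j"])
print(buggy["frame_j"])
```

Under this assumption, the buggy batch carries the same frames in both fields, so the model never sees a genuinely different second frame during multi-image fine-tuning.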