svip-lab / impersonator

PyTorch implementation of our ICCV 2019 paper: Liquid Warping GAN: A Unified Framework for Human Motion Imitation, Appearance Transfer and Novel View Synthesis
https://svip-lab.github.io/project/impersonator

Additional training for Motion Imitation #21

Closed ryo12882 closed 4 years ago

ryo12882 commented 4 years ago

Hi, thank you for your awesome work.

By the way, I tried transferring my own appearance onto other target images. Basically it works, but my head doesn't look like me: my hair style and face are not reflected. I assume this is because of the pre-trained model. I looked at the datasets and found that most of the people have short black hair. What do you think? And if so, how can I train on more data?

I also tried it with the fashion model images you provided, and it works well! I'm wondering why there is such a difference.

Thanks in advance.

StevenLiuWen commented 4 years ago

@ryo12882 Hi, your guess is right. Our dataset (iPER, which focuses on video) is not very large (though we have tried to make it bigger): the training set contains around 20 different people wearing 82 outfits with different textures, and most of the people have short black hair. So a model trained only on our iPER dataset tends to be biased toward the training patterns (face, hair, and clothing style).

The Fashion dataset has more variety in clothing and hair styles than our iPER dataset. However, it only provides paired images of the same person from different views (not videos). So the best model is obtained by training on the two datasets combined.
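As a rough illustration of what "combining the two datasets" can mean in practice, here is a generic PyTorch sketch; the dataset classes below are hypothetical placeholders, not the repository's actual data pipeline:

```python
import torch
from torch.utils.data import Dataset, ConcatDataset, DataLoader

# Hypothetical placeholder datasets; the real project defines its own
# dataset classes for iPER videos and the paired Fashion images.
class IPERVideoFrames(Dataset):
    def __init__(self, root):
        self.samples = []  # e.g. (source frame, target frame) pairs from one video
    def __len__(self):
        return len(self.samples)
    def __getitem__(self, idx):
        return self.samples[idx]

class FashionPairs(Dataset):
    def __init__(self, root):
        self.samples = []  # paired images of the same person from different views
    def __len__(self):
        return len(self.samples)
    def __getitem__(self, idx):
        return self.samples[idx]

# Training on the union of both sources exposes the model to the motion
# diversity of iPER and the clothing/hair diversity of the Fashion data.
combined = ConcatDataset([IPERVideoFrames('data/iPER'), FashionPairs('data/fashion')])
loader = DataLoader(combined, batch_size=4, shuffle=True, num_workers=4)
```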

There are two ways to improve the results:

  1. Improve the generalization of the model. First, collect as many videos of people dancing as possible, from the Internet or elsewhere, and do some data cleaning. Then train on your newly collected large dataset. (The code for training a model on your own dataset is being cleaned up; we will release it as soon as possible.)

  2. Make the model focus on a specific person. If you just want the model to work on yourself or another specific person, you can prepare a short video like those in our iPER dataset, in which the person holds an A-pose and then performs some random motions. Then fine-tune the networks on your prepared video, starting from our pretrained model. This can produce higher-fidelity results; the obvious shortcoming is that everyone needs to fine-tune their own model. However, this requirement is reasonable, and the code is being tested and cleaned up. Once it is done, we will release all the code mentioned above. A rough sketch of such a personalization loop is shown after this list.
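The sketch below is only meant to show the "fine-tune from the pretrained checkpoint on one person's short video" idea in plain PyTorch. The `Generator` class, the `frames_of_my_video` dataset, the conditioning input, and the checkpoint paths are all hypothetical; this is not the repository's actual fine-tuning script, which is still being cleaned up:

```python
import torch
from torch.utils.data import DataLoader

# Hypothetical stand-ins: the real repo defines its own generator and
# data pipeline; this only illustrates personalization by fine-tuning.
from my_model import Generator          # hypothetical generator class
from my_data import frames_of_my_video  # hypothetical short personal-video dataset

device = 'cuda' if torch.cuda.is_available() else 'cpu'

netG = Generator().to(device)
netG.load_state_dict(torch.load('pretrained_generator.pth', map_location=device))

# A small learning rate and few epochs, since we only adapt to one person.
optimizer = torch.optim.Adam(netG.parameters(), lr=1e-5)
l1 = torch.nn.L1Loss()

loader = DataLoader(frames_of_my_video, batch_size=2, shuffle=True)
for epoch in range(5):
    for source_img, target_img, cond in loader:
        source_img, target_img, cond = (t.to(device) for t in (source_img, target_img, cond))
        fake = netG(source_img, cond)   # synthesize the source person in the target pose
        loss = l1(fake, target_img)     # reconstruction loss only, for brevity
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

torch.save(netG.state_dict(), 'generator_finetuned_me.pth')
```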

ryo12882 commented 4 years ago

Thank you for your quick response.

I understand everything you explained. I would like to try both ways, but especially fine-tuning!

I would also like to ask about the two approaches:

  1. When is your team going to release these pieces of code?
  2. How much data do you think is required to get reasonable results?
  3. How long a video do you think is needed for fine-tuning?

And finally, I would like to join this project and help with these tasks.

Thank you.

OldChi commented 4 years ago

Could you please explain how to fine-tune the networks when the results are of low quality?

leesky1c commented 3 years ago

Hi, would you mind telling us how to train the model with the Fashion dataset?