zyhbili / Pose2Img

A warping based image translation model focusing on upper body synthesis.

Replace new photos in the demo #3

Closed: liuweie closed this issue 1 year ago

liuweie commented 1 year ago

Thank you for your work. Can I use the pretrained model you provided to generate a video for an arbitrary photo? For example, could I use Oliver's model and a photo of Luo Xiang to generate a video of Luo Xiang?

I tried replacing Oliver's image with Luo Xiang's image in process_source() in inference.py:

    def process_source(self):
        kp_path = self.cfg.INFER.src_kp_path
        kp = np.load(kp_path)
        kp = pose137_to_pose122(kp).transpose(1, 0)
        path = kp_path.split("/")[-1]
        filename, _ = os.path.splitext(path)
        img_path = os.path.join(self.img_base, filename + self.img_extension)  # img_path now points to Luo Xiang's image

However, it still generates Oliver's video, which confuses me.

zyhbili commented 1 year ago

Sorry, our released model can only generate Oliver; it does not generalize to a one-shot image. To generate Luo Xiang, you need to prepare a dataset and train the model following the instructions.

ShenhanQian commented 1 year ago

This module consists of two parts: an image warping process and an image translation network.

When you change the input image from Oliver to Luo, the image warping results should change. But since the image translation network was only trained on Oliver's appearance, it will always turn the warping result into Oliver.

Therefore, if you want to work on another subject, you need to prepare data and re-train the image translation network, just as @zyhbili has suggested.

liuweie commented 1 year ago

Thanks ~ I got it.

liuweie commented 1 year ago

Hello @zyhbili @ShenhanQian, I have another question about building a custom dataset with OpenPose. The README says "The raw keypoints for each frame is of shape (3, 137)", but the Oliver keypoints you provided have shape (2, 122), which means the model's inputs are images and their corresponding (2, 122) .npy files. What additional processing do I need to turn my (3, 137) .npy files, which I have already generated, into (2, 122) ones?
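For context, OpenPose stores each keypoint as an (x, y, confidence) triple, and 137 presumably corresponds to the usual BODY_25 + face + both hands setup (25 + 70 + 21 + 21). A minimal sketch of the first step toward the (2, 122) format, assuming rows 0 and 1 of the (3, 137) array hold x and y and row 2 holds confidence, is to drop the confidence row. The 137-to-122 mapping itself is repository-specific and is not reproduced here; `drop_confidence` and the file name in the comment are hypothetical.

```python
import numpy as np


def drop_confidence(kp: np.ndarray) -> np.ndarray:
    """Turn a (3, N) OpenPose array of x/y/confidence rows into (2, N) x/y coordinates.

    Hypothetical helper: assumes rows 0 and 1 are x and y, row 2 is confidence.
    """
    if kp.ndim != 2 or kp.shape[0] != 3:
        raise ValueError(f"expected shape (3, N), got {kp.shape}")
    return kp[:2].copy()


# Dummy (3, 137) array standing in for np.load("frame_0000.npy") (hypothetical file)
dummy = np.random.rand(3, 137).astype(np.float32)
xy = drop_confidence(dummy)
print(xy.shape)  # (2, 137)
```

The result still has 137 points; it would then go through the repository's own 137-to-122 conversion.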

zyhbili commented 1 year ago

We feed keypoints of shape (2, 122) into Pose2Img, while OpenPose outputs shape (3, 137). For simplicity, we handle the conversion in the dataloader: during training, the dataloader checks the keypoint dimension, so it accepts either 122- or 137-point input. During inference, our Voice2Pose directly produces keypoints of shape (2, 121) (without the root node), so there is no need to map them back to (2, 137).
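The dimension check described in this reply could look roughly like the sketch below. It is not the repository's dataloader: `load_keypoints` and `placeholder_converter` are hypothetical, and the real (2, 137) to (2, 122) conversion function (not reproduced here) is passed in as a parameter. Dropping a third (confidence) row, if present, is also an assumption.

```python
from typing import Callable

import numpy as np


def load_keypoints(
    kp: np.ndarray,
    convert_137_to_122: Callable[[np.ndarray], np.ndarray],
) -> np.ndarray:
    """Normalize a keypoint array to shape (2, 122).

    Hypothetical sketch: accepts (2 or 3, 137) or (2 or 3, 122) arrays.
    The 137->122 mapping is supplied by the caller.
    """
    if kp.ndim != 2 or kp.shape[0] not in (2, 3):
        raise ValueError(f"unexpected keypoint shape {kp.shape}")
    kp = kp[:2]  # drop the OpenPose confidence row if present (assumption)
    if kp.shape[1] == 137:
        kp = convert_137_to_122(kp)
    if kp.shape != (2, 122):
        raise ValueError(f"expected (2, 122) after conversion, got {kp.shape}")
    return kp


def placeholder_converter(kp: np.ndarray) -> np.ndarray:
    # NOT the real mapping: keeps the first 122 columns for demonstration only
    return kp[:, :122]


print(load_keypoints(np.zeros((3, 137)), placeholder_converter).shape)  # (2, 122)
print(load_keypoints(np.zeros((2, 122)), placeholder_converter).shape)  # (2, 122)
```

Passing the converter in keeps the shape check independent of which joints the real mapping keeps.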
And here is the function that converts keypoints from (2, 137) to (2, 122).