harlanhong / CVPR2022-DaGAN

Official code for CVPR2022 paper: Depth-Aware Generative Adversarial Network for Talking Head Video Generation
https://harlanhong.github.io/publications/dagan.html

Missing steps to use command line demo #33

Closed yahskapar closed 2 years ago

yahskapar commented 2 years ago

I'm likely missing some key information that's common knowledge for running demos of projects like this, but I was hoping the author or anyone else who is knowledgeable could help me out here. I'm attempting to run the demo per the repo instructions, using my own source image and driving video. I'm trying to use the SPADE checkpoint provided as a download, as well as the other checkpoints (e.g., the depth and encoder ones) that seem to be required to run the demo code. This is all being attempted in a conda environment with the dependencies installed, on a MacBook Pro (so macOS, with no dedicated GPU). From what I understand, the demo should be runnable on such a machine, without a GPU and/or Linux.

I seem to be having issues loading the checkpoints themselves, as evidenced by ultimately running into an error such as:

RuntimeError: Error(s) in loading state_dict for ResnetEncoder:
    size mismatch for encoder.layer1.0.conv1.weight: copying a param with shape torch.Size([64, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
    size mismatch for encoder.layer1.1.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
    size mismatch for encoder.layer2.0.conv1.weight: copying a param with shape torch.Size([128, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 64, 3, 3]).
    size mismatch for encoder.layer2.0.downsample.0.weight: copying a param with shape torch.Size([512, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 64, 1, 1]).
    size mismatch for encoder.layer2.0.downsample.1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for encoder.layer2.0.downsample.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for encoder.layer2.0.downsample.1.running_mean: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for encoder.layer2.0.downsample.1.running_var: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for encoder.layer2.1.conv1.weight: copying a param with shape torch.Size([128, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]).
    size mismatch for encoder.layer3.0.conv1.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 3, 3]).
    size mismatch for encoder.layer3.0.downsample.0.weight: copying a param with shape torch.Size([1024, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 1, 1]).
    size mismatch for encoder.layer3.0.downsample.1.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for encoder.layer3.0.downsample.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for encoder.layer3.0.downsample.1.running_mean: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for encoder.layer3.0.downsample.1.running_var: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for encoder.layer3.1.conv1.weight: copying a param with shape torch.Size([256, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
    size mismatch for encoder.layer4.0.conv1.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 256, 3, 3]).
    size mismatch for encoder.layer4.0.downsample.0.weight: copying a param with shape torch.Size([2048, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 256, 1, 1]).
    size mismatch for encoder.layer4.0.downsample.1.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
    size mismatch for encoder.layer4.0.downsample.1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
    size mismatch for encoder.layer4.0.downsample.1.running_mean: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
    size mismatch for encoder.layer4.0.downsample.1.running_var: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
    size mismatch for encoder.layer4.1.conv1.weight: copying a param with shape torch.Size([512, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
    size mismatch for encoder.fc.weight: copying a param with shape torch.Size([1000, 2048]) from checkpoint, the shape in current model is torch.Size([1000, 512]).

Are there specific steps I should be taking, not listed in the repo, in order to run the demo code on a CPU? Is it even possible to run the demo code on a CPU? Any help would be appreciated. The command I'm using to run the demo is:

python demo.py --config config/vox-adv-256.yaml --driving_video driving.mp4 --source_image source.png --checkpoint download/SPADE_DaGAN_vox_adv_256.pth.tar --relative --adapt_scale --kp_num 15 --generator SPADEDepthAwareGenerator --find_best_frame
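In case it helps with diagnosing this kind of mismatch, listing the parameter shapes stored in a checkpoint file can be done with a short script along these lines (a rough sketch; the path is a placeholder for whichever .pth file is being loaded):

import torch

# Load the checkpoint on the CPU and print every parameter name with its shape.
# Comparing this output against the size-mismatch messages above makes it easier
# to tell whether the expected file is actually being loaded.
state = torch.load("path/to/checkpoint.pth", map_location="cpu")  # placeholder path
if isinstance(state, dict) and "state_dict" in state:
    state = state["state_dict"]  # some checkpoints nest the weights
for name, tensor in state.items():
    if torch.is_tensor(tensor):
        print(name, tuple(tensor.shape))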

yahskapar commented 2 years ago

I managed to figure out my mistakes after digging into the code a bit: I had the wrong depth.pth and encoder.pth files. For anyone who runs into similar errors, a couple of recommendations:

1) Make sure you pass the --cpu option in your command if you don't have a CUDA-enabled GPU on your machine. In my case, I simply wanted to get the demo up and running on a laptop with no dedicated GPU, which is probably an uncommon thing to attempt given the nature of the author's work.

2) Make sure you are using the correct depth.pth and encoder.pth files, and, if you use the SPADE checkpoint, make sure you use the matching generator (SPADEDepthAwareGenerator). Once I switched to the .pth files in the depth_face_model folder (found in the larger folder of checkpoint downloads linked in this repo's README), the remaining size mismatch errors and a few other obscure errors disappeared. The command I ended up with is shown below.
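For reference, the working command was essentially my original one with the --cpu flag appended (the paths are specific to my setup and assume the depth_face_model .pth files are in the location the code expects):

python demo.py --config config/vox-adv-256.yaml --driving_video driving.mp4 --source_image source.png --checkpoint download/SPADE_DaGAN_vox_adv_256.pth.tar --relative --adapt_scale --kp_num 15 --generator SPADEDepthAwareGenerator --find_best_frame --cpu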