neeek2303 / MegaPortraits

Supplementary materials for paper MegaPortraits [ACMM22]

About shape of latent expression descriptors and global descriptor #3

Open TuesdayT opened 2 months ago

TuesdayT commented 2 months ago

Hi, thanks for your awesome work.

I am replicating the architecture from this paper, but I am uncertain about the shapes of the latent expression descriptors and the global descriptor. Could you provide them?

Thanks.

xuzheyuan624 commented 2 months ago

I have the same question. In my view, the latent expression descriptor and the global descriptor are both 1-D vectors, but then the shape of the appearance features v_s and the shape of the warp w_s don't match. In Fig. 9, the appearance encoder downsamples 3 times, so for a 512x512 input you get a 4-D tensor of shape 96x16x64x64. But the warping generator only upsamples 4 times in height and width, so if the latent expression is a 1-D tensor, the warp w_s would come out with shape 3x16x16x16. The heights and widths don't match. I'm very confused about this; do you have any suggestions?
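To make the shape bookkeeping concrete, here is a small sketch of the arithmetic described above. The 96-channel / 16-depth numbers come from this comment's reading of Fig. 9, and the 1x1 starting grid for the warping generator is an assumption, not a confirmed value:

```python
# Shape arithmetic from the comment above (numbers are the commenter's
# reading of Fig. 9, not confirmed values from the authors).
H = W = 512                       # input resolution
vs_h = vs_w = H // 2**3           # appearance encoder downsamples 3x -> 64
vs_shape = (96, 16, vs_h, vs_w)   # appearance volume v_s: C x D x H x W
print("v_s:", vs_shape)           # (96, 16, 64, 64)

# If the warping generator starts from a 1-D latent reshaped to a 1x1 grid
# (an assumption) and upsamples H/W only 4 times:
ws_h = ws_w = 1 * 2**4            # 16
ws_shape = (3, 16, ws_h, ws_w)    # warp w_s: 3 x D x H x W
print("w_s:", ws_shape)           # (3, 16, 16, 16) -- H/W don't match v_s
```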

johndpope commented 2 months ago

does this code by @Kevinfringe help? https://github.com/Kevinfringe/MegaPortrait

https://github.com/Kevinfringe/MegaPortrait/blob/85ca3692a0abc3e906e91ed924dc311b4cad538b/model.py#L148

side note - I'm attempting to recreate VASA-1 paper here - https://github.com/johndpope/vasa-1-hack

johndpope commented 1 month ago

@xuzheyuan624 / @TuesdayT - I rebuilt the MegaPortrait repo from Kevin's code using Claude Opus.

https://github.com/johndpope/MegaPortrait-hack/ From my implementation, the expression net should output a 50-dimensional vector, which lines up with the ResNet-18 backbone.
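For reference, a minimal sketch of what such a 50-dimensional expression head could look like on top of a torchvision ResNet-18. The backbone choice and head size are my reading of this thread, not the paper's confirmed architecture:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ExpressionEncoder(nn.Module):
    """Hypothetical expression encoder: ResNet-18 with its final FC layer
    replaced by a 50-dim head, per the discussion above."""
    def __init__(self, dim: int = 50):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, dim)
        self.backbone = backbone

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, 3, H, W) -> z: (B, 50)
        return self.backbone(x)

z_s = ExpressionEncoder()(torch.randn(1, 3, 512, 512))
print(z_s.shape)  # torch.Size([1, 50])
```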

https://github.com/johndpope/MegaPortrait-hack/issues/11

Update: I have some progress, but without the exact dimensions I haven't been able to get it to work.

flyingshan commented 1 month ago

> I have the same question. In my view, the latent expression descriptor and the global descriptor are both 1-D vectors, but then the shape of the appearance features v_s and the shape of the warp w_s don't match. [...] The heights and widths don't match. Do you have any suggestions?

Found the same question here. I lean toward z_s/z_d being 1-D vectors; maybe the authors tile them spatially so they end up with a shape like (B, 512, 4, 4) before being sent to the W* generator, as sketched below.
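A minimal sketch of that tiling idea, assuming a 512-dim latent and a 4x4 starting grid (both unconfirmed):

```python
import torch

B = 1
z_s = torch.randn(B, 512)          # 1-D latent per sample (assumed size)
# Tile the vector over a 4x4 spatial grid before the W* generator.
z_map = z_s.view(B, 512, 1, 1).expand(B, 512, 4, 4)
print(z_map.shape)                 # torch.Size([1, 512, 4, 4])
```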

johndpope commented 1 month ago

UPDATE: thanks @flyingshan - I think it has to be 4x4 - 🀷 not certain. That way you get 16 spatial positions instead of 1.

https://github.com/johndpope/MegaPortrait-hack/blob/main/model.py#L15 When my code runs through the warp generator, it upscales to end up at 64x64, so that part is fine.

But then I can't add them together:

w_s2c = w_rt_s2c + w_em_s2c

w_em_s2c: torch.Size([1, 3, 16, 64, 64])  # 16 here for w_em... 🀷 not sure if it should be 64?
w_rt_s2c: torch.Size([1, 3, 64, 64, 64])

This is the output of my warp generator / warp field code:

warpfield

WarpField > zs sum.shape: torch.Size([1, 512, 4, 4])
conv1x1 > x.shape: torch.Size([1, 2048, 4, 4])
reshape_layer > x.shape: torch.Size([1, 512, 4, 4, 4])
πŸ’  ResBlock3D x.shape: torch.Size([1, 512, 4, 4, 4])
   conv1 > out.shape: torch.Size([1, 256, 4, 4, 4])
   norm1 > out.shape: torch.Size([1, 256, 4, 4, 4])
   F.relu(out) > out.shape: torch.Size([1, 256, 4, 4, 4])
   conv2 > out.shape: torch.Size([1, 256, 4, 4, 4])
   norm2 > out.shape: torch.Size([1, 256, 4, 4, 4])
   residual > residual.shape: torch.Size([1, 256, 4, 4, 4])
upsample1 > x.shape: torch.Size([1, 256, 8, 8, 8])
πŸ’  ResBlock3D x.shape: torch.Size([1, 256, 8, 8, 8])
   conv1 > out.shape: torch.Size([1, 128, 8, 8, 8])
   norm1 > out.shape: torch.Size([1, 128, 8, 8, 8])
   F.relu(out) > out.shape: torch.Size([1, 128, 8, 8, 8])
   conv2 > out.shape: torch.Size([1, 128, 8, 8, 8])
   norm2 > out.shape: torch.Size([1, 128, 8, 8, 8])
   residual > residual.shape: torch.Size([1, 128, 8, 8, 8])
upsample2 > x.shape: torch.Size([1, 128, 16, 16, 16])
πŸ’  ResBlock3D x.shape: torch.Size([1, 128, 16, 16, 16])
   conv1 > out.shape: torch.Size([1, 64, 16, 16, 16])
   norm1 > out.shape: torch.Size([1, 64, 16, 16, 16])
   F.relu(out) > out.shape: torch.Size([1, 64, 16, 16, 16])
   conv2 > out.shape: torch.Size([1, 64, 16, 16, 16])
   norm2 > out.shape: torch.Size([1, 64, 16, 16, 16])
   residual > residual.shape: torch.Size([1, 64, 16, 16, 16])
upsample3 > x.shape: torch.Size([1, 64, 16, 32, 32])
πŸ’  ResBlock3D x.shape: torch.Size([1, 64, 16, 32, 32])
   conv1 > out.shape: torch.Size([1, 32, 16, 32, 32])
   norm1 > out.shape: torch.Size([1, 32, 16, 32, 32])
   F.relu(out) > out.shape: torch.Size([1, 32, 16, 32, 32])
   conv2 > out.shape: torch.Size([1, 32, 16, 32, 32])
   norm2 > out.shape: torch.Size([1, 32, 16, 32, 32])
   residual > residual.shape: torch.Size([1, 32, 16, 32, 32])
upsample4 > x.shape: torch.Size([1, 32, 16, 64, 64])
conv3x3x3 > x.shape: torch.Size([1, 3, 16, 64, 64])
gn > x.shape: torch.Size([1, 3, 16, 64, 64])
F.relu > x.shape: torch.Size([1, 3, 16, 64, 64])
tanh > x.shape: torch.Size([1, 3, 16, 64, 64])
w_em_s2c: torch.Size([1, 3, 16, 64, 64])
w_rt_s2c: torch.Size([1, 3, 64, 64, 64])
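One workaround for the mismatch (my own guess, not what the paper prescribes - whether the depth should be 16 or 64 is exactly the open question) is to resample w_em_s2c along the depth axis before adding:

```python
import torch
import torch.nn.functional as F

w_em_s2c = torch.randn(1, 3, 16, 64, 64)
w_rt_s2c = torch.randn(1, 3, 64, 64, 64)

# Trilinear resampling so the depth dimensions line up (16 -> 64).
w_em_s2c = F.interpolate(w_em_s2c, size=w_rt_s2c.shape[2:],
                         mode='trilinear', align_corners=False)
w_s2c = w_rt_s2c + w_em_s2c
print(w_s2c.shape)  # torch.Size([1, 3, 64, 64, 64])
```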