johndpope / MegaPortrait-hack

Using Claude Opus to reverse engineer code from MegaPortraits: One-shot Megapixel Neural Head Avatars
https://arxiv.org/abs/2207.07621

change some model configs #22

Closed JackAILab closed 6 months ago

JackAILab commented 6 months ago

We have 3 TODOs:

TODO 1: Define COMPRESS_DIM = 512 # 🤷 maybe 256 or 512; 512 may be more reasonable for Emtn/Eapp compression

TODO 2: According to the description in the paper (page 11: "predict the head pose and expression vector"), zs should be a global descriptor, i.e. a vector; otherwise the existence of Emtn and Eapp is of little significance. Currently the output feature is a matrix, which means it is basically not compressed. This encoder could be completely replaced by a VAE.
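As a rough sketch of what TODO 1/2 could look like (the class name, pooling choice and layer sizes are assumptions for illustration, not code from this repo): pool the Eapp/Emtn feature map and project it down to a COMPRESS_DIM-dimensional global descriptor.

```python
import torch
import torch.nn as nn

COMPRESS_DIM = 512  # TODO 1: 256 or 512; 512 assumed here

class GlobalDescriptor(nn.Module):
    """Hypothetical head: turns a (B, C, H, W) feature map into a (B, COMPRESS_DIM) vector."""
    def __init__(self, in_channels, out_dim=COMPRESS_DIM):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)      # (B, C, H, W) -> (B, C, 1, 1)
        self.proj = nn.Linear(in_channels, out_dim)

    def forward(self, feat):
        z = self.pool(feat).flatten(1)           # (B, C)
        return self.proj(z)                      # (B, COMPRESS_DIM) global descriptor
```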

TODO 3: According to the description in the paper (page 11: "To generate adaptive parameters, we multiply the foregoing sums and additionally learned matrices for each pair of parameters."), adaptive_matrix_gamma should be retained. It is not used to change the shape; it generates learned parameters, which is more reasonable than just using the sum.
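A minimal sketch of what retaining adaptive_matrix_gamma could look like, assuming the learned matrices are implemented as nn.Linear layers applied to the summed descriptor (the module name, beta counterpart and dimensions are assumptions):

```python
import torch
import torch.nn as nn

class AdaptiveParams(nn.Module):
    """Hypothetical: maps the summed descriptor to (gamma, beta) via learned matrices."""
    def __init__(self, descriptor_dim=512, num_features=512):
        super().__init__()
        # "additionally learned matrices for each pair of parameters" (paper, page 11)
        self.adaptive_matrix_gamma = nn.Linear(descriptor_dim, num_features)
        self.adaptive_matrix_beta = nn.Linear(descriptor_dim, num_features)

    def forward(self, z_sum):
        # z_sum: (B, descriptor_dim), sum of source/driver descriptors
        gamma = self.adaptive_matrix_gamma(z_sum)  # (B, num_features)
        beta = self.adaptive_matrix_beta(z_sum)    # (B, num_features)
        return gamma, beta
```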

JackAILab commented 6 months ago

Updated the changes for zs_sum/zd_sum:

zd_sum = zd_sum.unsqueeze(-1).unsqueeze(-1)  ### TODO 3 add: reshape (B, C) -> (B, C, 1, 1)
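For context, a quick shape sketch of what the two unsqueeze calls buy you (the tensor sizes and the broadcasting use are assumptions for illustration): the descriptor becomes a 1x1 feature map that can be combined with spatial features or fed to conv layers.

```python
import torch

zd_sum = torch.randn(2, 512)                   # (B, COMPRESS_DIM) summed driver descriptor
feat = torch.randn(2, 512, 16, 16)             # (B, C, H, W) spatial features, sizes assumed

zd_map = zd_sum.unsqueeze(-1).unsqueeze(-1)    # (B, 512) -> (B, 512, 1, 1)
out = feat * zd_map                            # broadcasts over H and W
```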

johndpope commented 6 months ago

Thanks Jiehui - I'll clean up the notes later.

There are some training augmentation functions in this PR to align with the paper - maybe it helps: https://github.com/johndpope/MegaPortrait-hack/issues/14

The EmoPortraits paper should be dropping in the next 30-60 days, so we can check our assumptions. https://github.com/neeek2303/EMOPortraits

I do want to switch back to VASA (https://github.com/johndpope/vasa-1-hack); the keypoint detector in that codebase needs to be replaced with something like this warp generator.

johndpope commented 5 months ago

@JackAILab - I'm close to stabilising a PR which adds cyclic consistency loss. See the PR: I take some extra steps to prepare the data for cropping and warping. I'm at an early stage of training and I wonder whether normalising the images is necessary. Should the images include the original image somewhere? The paper says they use cropped images only for the loss, so I'm currently investigating saving these out - first pass cropped, second pass warped and cropped. I'm attempting to cache these tensors for faster loading.
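On the caching side, one way to do it is to keep the expensive crop/warp step out of the training loop: check for a cache file per video inside the Dataset and only preprocess on a miss. A minimal sketch, where the preprocess_fn, paths and file layout are all hypothetical:

```python
import os
import torch
from torch.utils.data import Dataset

class CachedVideoDataset(Dataset):
    """Hypothetical: caches preprocessed (cropped/warped) tensors per video."""
    def __init__(self, video_paths, cache_dir, preprocess_fn):
        self.video_paths = video_paths
        self.cache_dir = cache_dir
        self.preprocess_fn = preprocess_fn   # e.g. crop + warp pipeline
        os.makedirs(cache_dir, exist_ok=True)

    def __len__(self):
        return len(self.video_paths)

    def __getitem__(self, idx):
        path = self.video_paths[idx]
        cache_path = os.path.join(self.cache_dir, os.path.basename(path) + ".pt")
        if os.path.exists(cache_path):
            return torch.load(cache_path)     # cache hit: fast path
        tensors = self.preprocess_fn(path)    # slow path: crop/warp once
        torch.save(tensors, cache_path)
        return tensors
```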

I have 33,000 videos at 512 (see junk torrent file), but this preprocessing step is taking a minute or two per video.

Saving tensors with PyTorch has ridiculous memory requirements - a 1.5 GB .pt file for 1 video. Am I missing something here? I'm attempting to use the .npy file type now.

Fortunately we get the EmoPortraits code and video set next month. What video set did you guys use for validating things? How do you speed things up?

@kwentar

UPDATE - using the .npz format, the files come down dramatically in size.
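For reference, a minimal sketch of the .npz route (file names and shapes are illustrative): np.savez_compressed applies deflate compression, and storing frames as uint8 instead of float32 alone cuts the size by 4x.

```python
import numpy as np
import torch

frames = torch.rand(100, 3, 512, 512)   # illustrative: 100 frames at 512x512

# Store as uint8 (4x smaller than float32) and let savez_compressed deflate it
frames_u8 = (frames * 255).clamp(0, 255).byte().numpy()
np.savez_compressed("video_0001.npz", frames=frames_u8)

# Load back and convert to float in [0, 1] when needed
data = np.load("video_0001.npz")["frames"]
frames_restored = torch.from_numpy(data).float() / 255.0
```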

Still not sure about normalization - it makes the images all red, like the warp image on the homepage.
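On the red images: one common cause (an assumption here, not confirmed from the code) is saving or visualising tensors while they are still normalised with ImageNet-style mean/std, or with the BGR/RGB channel order swapped. A minimal un-normalise sketch to try before visualising, assuming those standard constants:

```python
import torch

# Assumed ImageNet normalisation constants; adjust to whatever the transform actually used
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

def denormalize(x: torch.Tensor) -> torch.Tensor:
    """(B, 3, H, W) normalised tensor -> values back in [0, 1] for display/saving."""
    return (x * STD + MEAN).clamp(0, 1)
```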