Current results of training - epoch 4

johndpope commented 3 weeks ago

I used another of the videos as the driving video, and it's (almost) obviously not rotating the head past the point where the original video went; see below.

[Image: Screenshot from 2024-06-04 22-45-21]

[Image: cross_reenacted_image_57]

[Image: pred_frame_191]

Tomorrow I'll plug in a bigger dataset.

When I normalize the images, I end up with this, which looks bad, so I added some code in train.py to un-normalize them. I'm happy with the current results.
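Saving a normalized image directly usually looks wrong because its values are no longer in [0, 1], so they get clipped or misread as colors. A minimal sketch of the un-normalize step, assuming ImageNet channel statistics and channel-first (C, H, W) arrays; the constants and the helpers `normalize`, `unnormalize` and `to_uint8_hwc` are hypothetical, not the actual code in train.py:

```python
import numpy as np

# Assumption: ImageNet channel statistics; use whatever mean/std the dataset
# transform in train.py actually applies.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize(img, mean=MEAN, std=STD):
    """(C, H, W) image in [0, 1] -> per-channel zero-mean / unit-std values."""
    return (img - mean[:, None, None]) / std[:, None, None]


def unnormalize(img, mean=MEAN, std=STD):
    """Invert normalize() and clip back into [0, 1] for saving or logging."""
    out = img * std[:, None, None] + mean[:, None, None]
    return np.clip(out, 0.0, 1.0)


def to_uint8_hwc(img):
    """(C, H, W) float image in [0, 1] -> (H, W, C) uint8 array, as image writers expect."""
    return (np.transpose(img, (1, 2, 0)) * 255.0).round().astype(np.uint8)
```

The key point is that un-normalizing must use the same mean/std as the forward transform; applying it only at save/log time keeps the network inputs normalized.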

FYI, this is the frame dump from the MP4 (head cropped, maybe with some warping):
[Image: Screenshot from 2024-06-04 23-11-08]

johndpope commented 2 weeks ago

Try the new main code: Jay (@hazard-10) spotted an error with CosFace in training, and Claude fixed it. That's on top of these fixes:

Save/restore checkpoints: specify the checkpoint in the config ./configs/training/stage10base.yaml to restore it
Auto-crop video frames to the sweet spot
TensorBoard losses
LPIPS added to the perceptual loss; its weight is currently 10x (this wasn't specified in the paper).
class PerceptualLoss(nn.Module): def __init__(self, device, weights={'vgg19': 20.0, 'vggface': 5.0, 'gaze': 4.0, 'lpips': 10.0}):
Gaze loss (not yet done)
Additional image pyramid from the one-shot view code for the loss (hopefully to sharpen the image); seems to be working
https://github.com/johndpope/MegaPortrait-hack/blob/main/model.py#L1070
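The two loss-side items above, the PerceptualLoss weighting and the extra image-pyramid term, can be sketched as follows. Only the weight values come from the signature quoted above; `weighted_total`, `downsample2x` and `pyramid_l1` are hypothetical helpers, 2x2 average pooling and four scales are assumptions, and a plain L1 distance stands in for the per-scale perceptual loss that the linked pyramid code feeds:

```python
import numpy as np

# Weight values copied from the PerceptualLoss signature quoted above.
PERCEPTUAL_WEIGHTS = {'vgg19': 20.0, 'vggface': 5.0, 'gaze': 4.0, 'lpips': 10.0}


def weighted_total(component_losses, weights=PERCEPTUAL_WEIGHTS):
    """Sum weight * loss over the components that were actually computed.

    Components that aren't computed yet (e.g. gaze) are simply absent from
    component_losses; an unknown name raises KeyError instead of being ignored.
    """
    return sum(weights[name] * value for name, value in component_losses.items())


def downsample2x(img):
    """2x2 average-pool a (C, H, W) array; an odd trailing row/column is dropped."""
    c, h, w = img.shape
    img = img[:, : h - h % 2, : w - w % 2]
    return img.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def pyramid_l1(pred, target, num_scales=4):
    """Mean L1 at full, 1/2, 1/4, ... resolution, averaged over num_scales levels."""
    losses = []
    for level in range(num_scales):
        if level > 0:
            # Downsample only when another level is needed, so tiny images don't collapse to empty arrays.
            pred, target = downsample2x(pred), downsample2x(target)
        losses.append(float(np.abs(pred - target).mean()))
    return sum(losses) / len(losses)
```

For example, `weighted_total({'vgg19': 1.0, 'lpips': 0.5})` gives 20 * 1.0 + 10 * 0.5 = 25.0. Comparing at coarser scales penalizes low-frequency structure errors that a full-resolution loss alone weights lightly.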
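On the CosFace error mentioned at the top of this comment: the thread doesn't say what the bug or the fix was, so this is not the repo's code. It is a minimal sketch of the textbook classification form of CosFace, the large-margin cosine loss (Wang et al., CVPR 2018), with `cosface_loss` as a hypothetical helper and illustrative defaults for the scale `s` and margin `m`; how the training code applies it (and to which embeddings) may differ:

```python
import numpy as np


def cosface_loss(embeddings, class_weights, labels, s=30.0, m=0.35):
    """Large-margin cosine (CosFace) loss, averaged over the batch.

    embeddings:    (N, D) feature vectors
    class_weights: (C, D) one learnable prototype per class
    labels:        (N,) integer class indices
    s, m:          scale and additive cosine margin (illustrative defaults)
    """
    # L2-normalize both sides so each dot product is cos(theta).
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = x @ w.T  # (N, C)

    rows = np.arange(len(labels))
    logits = cos.copy()
    logits[rows, labels] -= m  # the margin is subtracted from the target class only
    logits *= s

    # Numerically stable softmax cross-entropy on the margin-adjusted logits.
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[rows, labels].mean())
```

Two common places for bugs in this kind of loss are applying the margin to every class instead of only the target, and forgetting to normalize one side, which turns the dot product into something other than a cosine.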

For the discriminator, I've drafted code to turn it into a multiscale PatchGAN, which may also boost image quality: https://github.com/johndpope/MegaPortrait-hack/issues/46
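A multiscale PatchGAN runs several patch discriminators on progressively downsampled copies of the image; each one outputs a grid of logits, one per receptive-field patch, rather than a single real/fake score. The sketch below only shows how those per-patch, per-scale outputs can be reduced to a loss. The hinge objective, the (N, 1, h, w) layout and the helper names are assumptions, not necessarily what the draft in issue #46 uses, and the discriminator networks themselves are omitted:

```python
import numpy as np


def hinge_d_loss(real_logits, fake_logits):
    """Discriminator hinge loss averaged over every patch and every scale.

    real_logits / fake_logits: lists with one (N, 1, h, w) patch-logit map per
    discriminator scale; each spatial entry scores one image patch.
    """
    per_scale = [
        np.maximum(0.0, 1.0 - r).mean() + np.maximum(0.0, 1.0 + f).mean()
        for r, f in zip(real_logits, fake_logits)
    ]
    return float(np.mean(per_scale))


def hinge_g_loss(fake_logits):
    """Generator hinge loss: push fake patch logits up at every scale."""
    return float(np.mean([-f.mean() for f in fake_logits]))
```

Averaging per scale before averaging across scales keeps the coarse discriminator (fewer patches) from being drowned out by the fine one.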

As for the leakage I'm seeing with my overfitted videos, I think es is the source of the problems. When I worked on the Emote paper: https://github.com/johndpope/Emote-hack/blob/main/train_stage_1_referencenet.py

UPDATE: from re-reading the above, I understand that adding more losses may be counterproductive. That said, I put the DPE losses from the VASA paper (https://arxiv.org/pdf/2404.10667) into the training code, and they don't seem to be hurting: https://github.com/johndpope/MegaPortrait-hack/pull/51

johndpope commented 2 days ago

Dear CommitCrew,

I bring you a cleaner, faster, smarter way to disentangle images using three ResNet-50 backbones: https://arxiv.org/pdf/2405.07257

https://github.com/johndpope/speak-hack
I just started training 5 minutes ago, and so far it's not converging.

JZArray commented 1 day ago

@Kwentar @flyingshan how is your progress going?

johndpope commented 1 day ago

I had incorrectly configured it to overfit; it's updated now. https://github.com/johndpope/SPEAK-hack/issues/1
