johndpope / SPEAK-hack

Using Claude Sonnet to reverse-engineer the paper "Listen, Disentangle, and Control: Controllable Speech-Driven Talking Head Generation"
https://arxiv.org/pdf/2405.07257

Feat/stylegan #4

Closed. johndpope closed this 3 months ago

johndpope commented 3 months ago

[Image: debug_step_0_resolution_32400]

It won't train at resolutions above 64: it blows up with an OOM error. But it did disentangle the head pose, which I'm amazed by. This is after just half a day on a 3090.

johndpope commented 3 months ago

I've updated the code to use gradient checkpointing (from torch.utils.checkpoint import checkpoint); maybe it helps it climb to a higher resolution. It will cycle through resolutions = [64, 128, 256]. Going to run this overnight.
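
For anyone following along, here is a minimal sketch of what wrapping blocks in torch.utils.checkpoint and cycling through the resolutions could look like. It is not the code from this PR: TinyEncoder, train_one_step, the channel count, the L1 loss and the random input frames are all hypothetical stand-ins, chosen only so the example runs on CPU.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class TinyEncoder(nn.Module):
    """Hypothetical stand-in network; NOT the model used in this repo."""

    def __init__(self, channels: int = 8, use_checkpoint: bool = True):
        super().__init__()
        self.use_checkpoint = use_checkpoint
        self.blocks = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(3 if i == 0 else channels, channels, 3, padding=1),
                nn.ReLU(),
            )
            for i in range(3)
        ])
        self.head = nn.Conv2d(channels, 3, 1)

    def forward(self, x):
        for block in self.blocks:
            if self.use_checkpoint and self.training:
                # Activations inside `block` are recomputed during backward
                # instead of being stored: more compute, less memory.
                x = checkpoint(block, x, use_reentrant=False)
            else:
                x = block(x)
        return self.head(x)


def train_one_step(model, optimizer, resolution, batch_size=2):
    """One optimisation step at the given resolution (assumed reconstruction loss)."""
    # Random tensors stand in for real video frames.
    frames = torch.rand(batch_size, 3, resolution, resolution)
    optimizer.zero_grad()
    loss = nn.functional.l1_loss(model(frames), frames)
    loss.backward()
    optimizer.step()
    return loss.item()


resolutions = [64, 128, 256]
model = TinyEncoder().train()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Cycle through the resolutions, one step each, recording the loss per resolution.
losses = {res: train_one_step(model, optimizer, res) for res in resolutions}
```

Checkpointing only changes how much activation memory is held between the forward and backward passes; the outputs and gradients should match the non-checkpointed model. Whether that is enough headroom to get past 64 on a 3090 depends on the real model, so treat this as a starting point rather than a fix.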

johndpope commented 3 months ago

I introduced a GAN and it seems to be making progress. I forced it to focus on head pose, which it did: https://github.com/johndpope/SPEAK-hack/pull/4. I still need to confirm it can train at a higher resolution before I throw some cloud compute at it.