johndpope opened 5 months ago
@fenghe12 / @JaLnYn / @ChenyangWang95
this might actually work.
In MegaPortraits I use a custom ResNet-50; it's probably safer to switch that in here, because otherwise the model is just going to discard the updates. I'll check in the morning.
@johndpope is it just me, or is Sonnet 3.5's machine learning code output actually way more readable than Opus's? Feels like actual working code this time!
Something may not be quite right. I trained overnight and this is still epoch 0:
checkpoint-86500
I changed the code back to use 512x512, resumed training, and got this.
I'm seeing newer, clearer images advancing in epoch 1, even after just a few more cycles. I'll update here later; I think by epoch 4 it's probably going to be fairly decent.
I added some TensorBoard logging to surface the losses.
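Rough sketch of the TensorBoard logging I mean; the log dir and loss names here are placeholders, not the exact keys in the repo.

```python
# Minimal TensorBoard loss-logging sketch (log dir / loss names are assumptions).
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/speak-hack")

def log_losses(step, losses):
    # `losses` is a dict like {"recon": recon_loss.item(), "gan": gan_loss.item()}
    for name, value in losses.items():
        writer.add_scalar(f"loss/{name}", float(value), global_step=step)
```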
recon_step_126000.png
UPDATE - my bad, it was overfitting to one image. I just pushed an updated dataloader. New debug image:
Starting training again. I was seeing OOM errors - check your num_of_workers.
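For reference, this is roughly the knob I mean, on PyTorch's DataLoader; the dataset and batch size below are placeholders:

```python
# Sketch: too many DataLoader workers can exhaust host RAM / shared memory and OOM.
from torch.utils.data import DataLoader

train_loader = DataLoader(
    train_dataset,            # your dataset instance (placeholder)
    batch_size=4,             # lower this too if the GPU itself is OOMing
    shuffle=True,
    num_workers=2,            # dial this down if you see worker-related OOMs
    pin_memory=True,
    persistent_workers=True,  # requires num_workers > 0
)
```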
UPDATE - I restarted training and changed the generator to use resblocks; maybe that will help it recreate the image better.
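Roughly the kind of block I swapped in; the channel counts and norm choice here are assumptions, not the repo's exact layers:

```python
# Minimal residual block sketch for the generator (channels / norm are assumptions).
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        # identity skip: preserves detail and keeps gradients flowing
        return x + self.block(x)
```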
UPDATE - Sunday, so I rebuilt the code to do progressive training with resolution upscaling (64, 128, 256, 512) and added TensorBoard losses.
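Sketch of what I mean by the resolution schedule; the epochs-per-stage numbers are made up for illustration:

```python
# Progressive-resolution schedule sketch: 64 -> 128 -> 256 -> 512.
import torch.nn.functional as F

RESOLUTION_SCHEDULE = [(64, 5), (128, 5), (256, 10), (512, 20)]  # (image size, epochs at that size)

def current_resolution(epoch):
    seen = 0
    for size, n_epochs in RESOLUTION_SCHEDULE:
        seen += n_epochs
        if epoch < seen:
            return size
    return RESOLUTION_SCHEDULE[-1][0]

# inside the training loop, resize the batch to the current stage:
# res = current_resolution(epoch)
# images = F.interpolate(images, size=(res, res), mode="bilinear", align_corners=False)
```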
I'm giving up on training across CelebA; I'm overfitting to one pair of images....
training progress so far
UPDATE - Sunday night
So, I had some battles with gradient explosions.
I ended up having to add some accumulation steps, which helped stabilize things: https://github.com/johndpope/SPEAK-hack/pull/3
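Roughly what the accumulation change does; the step count, clip value, and helper names (compute_loss etc.) are illustrative, not the PR's exact code:

```python
# Gradient accumulation + clipping sketch to tame exploding gradients.
import torch

ACCUM_STEPS = 4        # effective batch = batch_size * ACCUM_STEPS
MAX_GRAD_NORM = 1.0    # clip threshold (illustrative)

optimizer.zero_grad()  # model / optimizer / train_loader come from the training script
for i, batch in enumerate(train_loader):
    loss = compute_loss(model, batch)   # placeholder for the real loss computation
    (loss / ACCUM_STEPS).backward()     # scale so the accumulated gradient magnitude is unchanged
    if (i + 1) % ACCUM_STEPS == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), MAX_GRAD_NORM)
        optimizer.step()
        optimizer.zero_grad()
```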
Looks like the learning rate is getting things into a minimum....
UPDATE - I switched to 256 because ResNet-50 can't return rich (2048, 7, 7) features for images smaller than 224x224 (see the sketch below).
I also had to rework the generator to use fewer layers and 64x64 image resizing.
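Quick check of the feature-map sizes that drove the 256 decision, using torchvision's stock ResNet-50 (overall stride 32, so a 64x64 input collapses to a 2x2 map):

```python
# Inspect ResNet-50 feature-map shapes at different input resolutions.
import torch
from torchvision.models import resnet50

backbone = torch.nn.Sequential(*list(resnet50(weights=None).children())[:-2])  # drop avgpool + fc
backbone.eval()

with torch.no_grad():
    for size in (64, 224, 256):
        feats = backbone(torch.randn(1, 3, size, size))
        print(size, tuple(feats.shape))
# 64  -> (1, 2048, 2, 2)
# 224 -> (1, 2048, 7, 7)
# 256 -> (1, 2048, 8, 8)
```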