Here's where we are right now:
Here are example videos maximizing layer3 and layer4 activations:
Learnings
1e-2
We then looked into regularization terms to get rid of the high-frequency patterns; the total variation regularization loss implemented right now results in very strong checkerboard patterns.
Next actions: figure out a better regularization term and look into prior art to see how other folks handle this.
Completely different approach now; below are results for maximizing layer2 activations with different learning rates and numbers of iterations (the second one is stronger) - I think this is the way to go :rocket:
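For reference, this is roughly what the gradient ascent loop looks like - a minimal sketch, assuming a torchvision `r2plus1d_18` as a stand-in for the actual model, with placeholder hyper-parameters and with input normalization left out:

```python
import torch
from torchvision.io import read_video
from torchvision.models.video import r2plus1d_18

# Sketch only: model, layer, learning rate, and iteration count are
# illustrative placeholders, not the exact values used for the clips above.
model = r2plus1d_18(pretrained=True).eval().cuda()

acts = {}

def hook(module, inputs, output):
    acts["layer2"] = output

model.layer2.register_forward_hook(hook)

# Decode the seed clip; read_video returns T x H x W x C uint8 frames.
# Resizing and mean/std normalization omitted for brevity.
frames, _, _ = read_video("clips.mp4", pts_unit="sec")
clip = frames.permute(3, 0, 1, 2).float().div(255).unsqueeze(0).cuda()
clip.requires_grad_(True)

lr, iters = 1e-2, 256

for _ in range(iters):
    model(clip)
    loss = acts["layer2"].norm()  # maximize all layer2 activations

    model.zero_grad()
    if clip.grad is not None:
        clip.grad.zero_()
    loss.backward()

    with torch.no_grad():
        clip += lr * clip.grad  # gradient ascent on the input clip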
Dayum 🔥
Here are a few examples where we - instead of optimizing all activations - select a specific channel. The activations are of shape NxCxTxHxW; below are a few examples for optimizing with a fixed `i` in `acts[:, i, :, :, :].norm()`.
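In code this just swaps the objective; a sketch using the same hypothetical names as in the loop above:

```python
# Maximize only channel i of the N x C x T x H x W activation tensor
# instead of the norm over all channels.
i = 6
loss = acts["layer2"][:, i, :, :, :].norm()
```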
Starting with a random tensor instead of a seed video and optimizing all activations:
Starting from a random tensor, optimizing specific channels (3 and 6 in this case):
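Code-wise that only changes the starting point; a sketch, with shape and value range as assumptions:

```python
# Uniform noise in [0, 1] as the N x C x T x H x W starting clip
# instead of a decoded seed video.
clip = torch.rand(1, 3, 16, 112, 112, device="cuda", requires_grad=True)
```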
Bringing back the 3d total variation loss term and scaling it seems to get rid of the high frequencies in the output, which makes it more pleasant to look at :hugs:
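For reference, a 3d total variation term over the N x C x T x H x W clip could look like this (the weight below is a placeholder, not the value actually used here):

```python
def total_variation_3d(clip):
    # Sum of absolute differences between neighboring values along T, H, and W;
    # penalizing this pushes the optimization towards smooth, low-frequency clips.
    dt = (clip[:, :, 1:, :, :] - clip[:, :, :-1, :, :]).abs().sum()
    dh = (clip[:, :, :, 1:, :] - clip[:, :, :, :-1, :]).abs().sum()
    dw = (clip[:, :, :, :, 1:] - clip[:, :, :, :, :-1]).abs().sum()
    return dt + dh + dw

tv_weight = 1e-4  # placeholder scale; needs tuning per layer and model
loss = acts["layer2"].norm() - tv_weight * total_variation_3d(clip)
```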
Direct comparison:
Changing how we normalize the gradients
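One way to do this (a sketch; not necessarily the exact normalization in this branch) is to divide the gradient by its mean absolute value, so the step size no longer depends on the raw gradient magnitude:

```python
with torch.no_grad():
    grad = clip.grad
    grad = grad / (grad.abs().mean() + 1e-8)  # scale-invariant step
    clip += lr * grad
```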
Latest version maximizing all channels in layer2
Latest version maximizing channel 6 in layer2
Here are results for
which barely fits on one of my GTX 1080 Tis:
Input clip from:

```bash
# Download the source video (youtube-dl format 18)
youtube-dl -f 18 yJbXdOdTaJc

# Cut the 23:48-23:53 segment, resample to 16 fps, and drop the audio track
ffmpeg -i yJbXdOdTaJc.mp4 -ss 23:48 -to 23:53 -crf 23 -r 16 -an clips.mp4
```
Merging this into master as-is right now. We can explore more advanced techniques such as dreaming at multiple scales and related ideas in separate pull requests in the future.
Work in progress; let's see if this works out :hugs: