moabitcoin / ig65m-pytorch

PyTorch 3D video classification models pre-trained on 65 million Instagram videos
MIT License
265 stars 30 forks source link

Do video models dream of electric sheep in motion #19

Closed daniel-j-h closed 4 years ago

daniel-j-h commented 4 years ago

Work in progress; let's see if this works out :hugs:

daniel-j-h commented 4 years ago

Here's where we are right now:

  1. Create a random video with 32 frames of shape CxTxHxW = 3x32x112x112
  2. Hook this random tensor up in the computational graph so we can optimize it
  3. Pass this random tensor through the trained video model up to the nth layer
  4. Create a loss maximizing the nth layer's activations by optimizing the input video
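
The four steps above can be sketched roughly like this; the `trunk` module is a hypothetical stand-in for the real network sliced up to the nth layer, and the learning rate is an illustrative guess:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the trained network up to the nth layer;
# the real code would slice the IG-65M video model instead.
trunk = nn.Conv3d(3, 8, kernel_size=3, padding=1)

# 1. Random video of shape CxTxHxW = 3x32x112x112, batched to NxCxTxHxW
video = torch.rand(1, 3, 32, 112, 112)

# 2. Hook the tensor up in the computational graph so we can optimize it
video.requires_grad_(True)
optimizer = torch.optim.Adam([video], lr=0.05)  # lr is an illustrative guess

for _ in range(2):  # just a couple of iterations for illustration
    optimizer.zero_grad()
    # 3. Forward pass up to the nth layer
    acts = trunk(video)
    # 4. Maximize the layer's activations by minimizing their negative norm
    loss = -acts.norm()
    loss.backward()
    optimizer.step()
```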

Here are example videos maximizing layer3 and layer4 activations:

ex02-layer3-upscale

ex01-layer4-highlr-upscale

Learnings

We then looked into regularization terms to get rid of the high-frequency patterns; the total variation regularization loss implemented right now results in very strong checkerboard patterns.
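
For reference, a minimal sketch of a 3D total variation term over an NxCxTxHxW clip; the weight is an illustrative guess, not the value used in the experiments:

```python
import torch

def tv3d(x, weight=1e-4):
    # Total variation across time, height and width of an NxCxTxHxW clip;
    # penalizes differences between neighbouring frames and pixels.
    dt = (x[:, :, 1:] - x[:, :, :-1]).abs().mean()
    dh = (x[:, :, :, 1:] - x[:, :, :, :-1]).abs().mean()
    dw = (x[:, :, :, :, 1:] - x[:, :, :, :, :-1]).abs().mean()
    return weight * (dt + dh + dw)
```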

ex02-layer3-regscale-upscale

ex03-layer1-noreg-up

Next actions: figure out a better regularization term and look into prior art to see how other folks handle this.

daniel-j-h commented 4 years ago

Completely different approach now; below are results for maximizing layer2 activations with different learning rates and numbers of iterations (the second one is stronger) - I think this is the way to go :rocket:

dream-y

dream-final

More examples from: stem, layer1, layer2, layer3, layer4

![dream-0](https://user-images.githubusercontent.com/527241/66867484-ae045680-ef9b-11e9-9cfd-9aa58dffae3d.gif)
![dream-1](https://user-images.githubusercontent.com/527241/66867483-ae045680-ef9b-11e9-97a6-1d57a600601f.gif)
![dream-2](https://user-images.githubusercontent.com/527241/66867481-ad6bc000-ef9b-11e9-9b28-b24d3678cdc2.gif)
![dream-3](https://user-images.githubusercontent.com/527241/66867480-ad6bc000-ef9b-11e9-98d4-af9b922c3610.gif)
![dream-4](https://user-images.githubusercontent.com/527241/66867479-ad6bc000-ef9b-11e9-8e39-1f9cbf6460be.gif)

---

![dream-00](https://user-images.githubusercontent.com/527241/66867775-4995c700-ef9c-11e9-91e3-f4f119b3a7cd.gif)
![dream-01](https://user-images.githubusercontent.com/527241/66867774-4995c700-ef9c-11e9-80fa-bd87d22bde59.gif)
![dream-02](https://user-images.githubusercontent.com/527241/66867773-4995c700-ef9c-11e9-953a-d40ea55850c5.gif)
![dream-03](https://user-images.githubusercontent.com/527241/66867772-48fd3080-ef9c-11e9-9885-7a8b44c2813a.gif)
![dream-04](https://user-images.githubusercontent.com/527241/66867771-48fd3080-ef9c-11e9-9837-129892bdfe8a.gif)

sandhawalia commented 4 years ago

Dayum 🔥

daniel-j-h commented 4 years ago

Here are a few examples where we select a specific channel instead of optimizing all activations. The activations are of shape NxCxTxHxW; below are a few examples for optimizing with a fixed i in

```python
acts[:, i, :, :, :].norm()
```
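
Wiring that objective up needs access to the intermediate activations; one way is a forward hook. A minimal sketch with a hypothetical one-layer stand-in for the real network:

```python
import torch
import torch.nn as nn

# Hypothetical single Conv3d standing in for a layer of the real network
model = nn.Conv3d(3, 8, kernel_size=3, padding=1)

grabbed = {}

def hook(module, inputs, output):
    grabbed["acts"] = output  # NxCxTxHxW activations

model.register_forward_hook(hook)

video = torch.rand(1, 3, 8, 32, 32, requires_grad=True)
model(video)

i = 6  # fixed channel index, as above
loss = -grabbed["acts"][:, i, :, :, :].norm()
loss.backward()
```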

dream-17-up dream-16-up dream-15-up dream-14-up dream-13-up dream-12-up dream-11-up dream-10-up

daniel-j-h commented 4 years ago

Starting with a random tensor instead of a seed video and optimizing all activations:

dream-rnd-up

daniel-j-h commented 4 years ago

Starting from a random tensor and optimizing specific channels (3 and 6 in this case):

dream-zz-up dream-z-up

daniel-j-h commented 4 years ago

Bringing back the 3d total variation loss term and scaling it

dream-tvn3-up

seems to get rid of high frequencies in the output which makes it more pleasant to look at :hugs:
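
Combining the two terms amounts to adding the scaled TV penalty to the negated activation objective; a rough sketch where both the stand-in activations and the 1e-2 scale are illustrative guesses:

```python
import torch

def tv3d(x):
    # total variation across time, height and width of an NxCxTxHxW clip
    dt = (x[:, :, 1:] - x[:, :, :-1]).abs().mean()
    dh = (x[:, :, :, 1:] - x[:, :, :, :-1]).abs().mean()
    dw = (x[:, :, :, :, 1:] - x[:, :, :, :, :-1]).abs().mean()
    return dt + dh + dw

video = torch.rand(1, 3, 4, 16, 16, requires_grad=True)
acts = video * 2  # stand-in for real intermediate activations
# maximize activations while penalizing high-frequency structure
loss = -acts.norm() + 1e-2 * tv3d(video)
loss.backward()
```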

direct comparison:

daniel-j-h commented 4 years ago

Changing how we normalize the gradients

primer1-up
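
One common variant of gradient normalization (a sketch, not necessarily what was done here) is to rescale the gradient by its mean absolute value before stepping, so the step size is independent of the raw gradient magnitude:

```python
import torch

video = torch.rand(1, 3, 4, 16, 16, requires_grad=True)
loss = -video.norm()  # stand-in objective
loss.backward()

# Normalize the gradient so its mean absolute value is ~1; the epsilon
# avoids division by zero, and the 0.05 step size is an illustrative guess.
grad = video.grad / (video.grad.abs().mean() + 1e-8)
with torch.no_grad():
    video += 0.05 * grad
```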

daniel-j-h commented 4 years ago

Latest version maximizing all channels in layer2

primer1-up

Latest version maximizing channel 6 in layer2

primer1-up

daniel-j-h commented 4 years ago

Here are results for

which barely fits on one of my GTX 1080 Tis:

royal-wedding

royal-wedding

royal-wedding

Input clip from:

```shell
youtube-dl -f 18 yJbXdOdTaJc

ffmpeg -i yJbXdOdTaJc.mp4 -ss 23:48 -to 23:53 -crf 23 -r 16 -an clips.mp4
```

daniel-j-h commented 4 years ago

Merging this into master as-is right now. We can explore more advanced techniques, such as dreaming at multiple scales, in separate pull requests in the future.