moabitcoin / ig65m-pytorch

PyTorch 3D video classification models pre-trained on 65 million Instagram videos
MIT License
265 stars 30 forks source link

Do video models dream of electric sheep in motion #19

Closed daniel-j-h closed 4 years ago

daniel-j-h commented 4 years ago

Work in progress; let's see if this works out :hugs:

daniel-j-h commented 4 years ago

Here's where we are right now:

  1. Create a random video with 32 frames of shape CxTxHxW = 3x32x112x112
  2. Hook this random tensor up in the computational graph so we can optimize it
  3. Pass this random tensor through the trained video model up to the nth layer
  4. Create a loss maximizing the nth layer's activations by optimizing the input video
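
The four steps above can be sketched roughly like this; the `trunk` module is a hypothetical stand-in for the real network sliced up to the nth layer, and the learning rate is an illustrative guess:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the trained network up to the nth layer;
# the real code would slice the IG-65M video model instead.
trunk = nn.Conv3d(3, 8, kernel_size=3, padding=1)

# 1. Random video of shape CxTxHxW = 3x32x112x112, batched to NxCxTxHxW
video = torch.rand(1, 3, 32, 112, 112)

# 2. Hook the tensor up in the computational graph so we can optimize it
video.requires_grad_(True)
optimizer = torch.optim.Adam([video], lr=0.05)  # lr is an illustrative guess

for _ in range(2):  # just a couple of iterations for illustration
    optimizer.zero_grad()
    # 3. Forward pass up to the nth layer
    acts = trunk(video)
    # 4. Maximize the layer's activations by minimizing their negative norm
    loss = -acts.norm()
    loss.backward()
    optimizer.step()
```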

Here are example videos maximizing layer3 and layer4 activations:

ex02-layer3-upscale

ex01-layer4-highlr-upscale

Learnings

We then looked into regularization terms to get rid of the high-frequency patterns; the total variation regularization loss implemented right now results in very strong checkerboard patterns.
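
For reference, a minimal sketch of a 3D total variation term over an NxCxTxHxW clip; the weight is an illustrative guess, not the value used in the experiments:

```python
import torch

def tv3d(x, weight=1e-4):
    # Total variation across time, height and width of an NxCxTxHxW clip;
    # penalizes differences between neighbouring frames and pixels.
    dt = (x[:, :, 1:] - x[:, :, :-1]).abs().mean()
    dh = (x[:, :, :, 1:] - x[:, :, :, :-1]).abs().mean()
    dw = (x[:, :, :, :, 1:] - x[:, :, :, :, :-1]).abs().mean()
    return weight * (dt + dh + dw)
```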

ex02-layer3-regscale-upscale

ex03-layer1-noreg-up

Next actions: figure out a better regularization term and look into prior art to see how other folks handle this.

daniel-j-h commented 4 years ago

Completely different approach now; below are results for maximizing layer2 activations with different learning rates and numbers of iterations (the second one is stronger) - I think this is the way to go :rocket:

dream-y

dream-final

More examples from: stem, layer1, layer2, layer3, layer4

![dream-0](https://user-images.githubusercontent.com/527241/66867484-ae045680-ef9b-11e9-9cfd-9aa58dffae3d.gif)
![dream-1](https://user-images.githubusercontent.com/527241/66867483-ae045680-ef9b-11e9-97a6-1d57a600601f.gif)
![dream-2](https://user-images.githubusercontent.com/527241/66867481-ad6bc000-ef9b-11e9-9b28-b24d3678cdc2.gif)
![dream-3](https://user-images.githubusercontent.com/527241/66867480-ad6bc000-ef9b-11e9-98d4-af9b922c3610.gif)
![dream-4](https://user-images.githubusercontent.com/527241/66867479-ad6bc000-ef9b-11e9-8e39-1f9cbf6460be.gif)

---

![dream-00](https://user-images.githubusercontent.com/527241/66867775-4995c700-ef9c-11e9-91e3-f4f119b3a7cd.gif)
![dream-01](https://user-images.githubusercontent.com/527241/66867774-4995c700-ef9c-11e9-80fa-bd87d22bde59.gif)
![dream-02](https://user-images.githubusercontent.com/527241/66867773-4995c700-ef9c-11e9-953a-d40ea55850c5.gif)
![dream-03](https://user-images.githubusercontent.com/527241/66867772-48fd3080-ef9c-11e9-9885-7a8b44c2813a.gif)
![dream-04](https://user-images.githubusercontent.com/527241/66867771-48fd3080-ef9c-11e9-9837-129892bdfe8a.gif)

sandhawalia commented 4 years ago

Dayum 🔥

daniel-j-h commented 4 years ago

Here are a few examples where we select a specific channel instead of optimizing all activations. The activations are of shape NxCxTxHxW; below are a few examples for optimizing with a fixed i in

```python
acts[:, i, :, :, :].norm()
```
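
Wiring that objective up needs access to the intermediate activations; one way is a forward hook. A minimal sketch with a hypothetical one-layer stand-in for the real network:

```python
import torch
import torch.nn as nn

# Hypothetical single Conv3d standing in for a layer of the real network
model = nn.Conv3d(3, 8, kernel_size=3, padding=1)

grabbed = {}

def hook(module, inputs, output):
    grabbed["acts"] = output  # NxCxTxHxW activations

model.register_forward_hook(hook)

video = torch.rand(1, 3, 8, 32, 32, requires_grad=True)
model(video)

i = 6  # fixed channel index, as above
loss = -grabbed["acts"][:, i, :, :, :].norm()
loss.backward()
```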

dream-17-up dream-16-up dream-15-up dream-14-up dream-13-up dream-12-up dream-11-up dream-10-up

daniel-j-h commented 4 years ago

Starting with a random tensor instead of a seed video and optimizing all activations:

dream-rnd-up

daniel-j-h commented 4 years ago

Starting from a random tensor and optimizing specific channels (3 and 6 in this case):

dream-zz-up dream-z-up

daniel-j-h commented 4 years ago

Bringing back the 3d total variation loss term and scaling it

dream-tvn3-up

seems to get rid of high frequencies in the output which makes it more pleasant to look at :hugs:
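
Combining the two terms amounts to adding the scaled TV penalty to the negated activation objective; a rough sketch where both the stand-in activations and the 1e-2 scale are illustrative guesses:

```python
import torch

def tv3d(x):
    # total variation across time, height and width of an NxCxTxHxW clip
    dt = (x[:, :, 1:] - x[:, :, :-1]).abs().mean()
    dh = (x[:, :, :, 1:] - x[:, :, :, :-1]).abs().mean()
    dw = (x[:, :, :, :, 1:] - x[:, :, :, :, :-1]).abs().mean()
    return dt + dh + dw

video = torch.rand(1, 3, 4, 16, 16, requires_grad=True)
acts = video * 2  # stand-in for real intermediate activations
# maximize activations while penalizing high-frequency structure
loss = -acts.norm() + 1e-2 * tv3d(video)
loss.backward()
```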

direct comparison:

daniel-j-h commented 4 years ago

Changing how we normalize the gradients

primer1-up
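
One common variant of gradient normalization (a sketch, not necessarily what was done here) is to rescale the gradient by its mean absolute value before stepping, so the step size is independent of the raw gradient magnitude:

```python
import torch

video = torch.rand(1, 3, 4, 16, 16, requires_grad=True)
loss = -video.norm()  # stand-in objective
loss.backward()

# Normalize the gradient so its mean absolute value is ~1; the epsilon
# avoids division by zero, and the 0.05 step size is an illustrative guess.
grad = video.grad / (video.grad.abs().mean() + 1e-8)
with torch.no_grad():
    video += 0.05 * grad
```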

daniel-j-h commented 4 years ago

Latest version maximizing all channels in layer2

primer1-up

Latest version maximizing channel 6 in layer2

primer1-up

daniel-j-h commented 4 years ago

Here are results for

which barely fits on one of my GTX 1080 Tis:

royal-wedding

royal-wedding

royal-wedding

Input clip from:

```shell
youtube-dl -f 18 yJbXdOdTaJc

ffmpeg -i yJbXdOdTaJc.mp4 -ss 23:48 -to 23:53 -crf 23 -r 16 -an clips.mp4
```

daniel-j-h commented 4 years ago

Merging this into master as-is right now. We can explore more advanced techniques, such as dreaming at multiple scales, in separate pull requests in the future.