mehdidc / feed_forward_vqgan_clip

Feed forward VQGAN-CLIP model, where the goal is to eliminate the need for optimizing the latent space of VQGAN for each input prompt
MIT License

VQGAN - blended models #12

Closed: johndpope closed this issue 3 years ago

johndpope commented 3 years ago

I want to take a film (say The Shining):

* caption it using Amazon AI label detection (maybe 1 in every 100 frames) - see the sketch below
* throw these image + text pairs into training
* then take the trained model and have the neural nets spit out something in the style of the movie...
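
A minimal sketch of that captioning step, assuming frames are sampled straight from the video file with OpenCV and labelled with Amazon Rekognition through boto3; the filenames, sampling rate, and output format are purely illustrative:

```python
# Sketch: sample roughly 1 frame in every 100 and label each with Amazon Rekognition.
# Assumes AWS credentials are already configured for boto3; paths are illustrative.
import json

import boto3
import cv2

rekognition = boto3.client("rekognition")

def caption_frames(video_path, every_n=100, max_labels=10):
    """Yield (frame_index, caption) pairs built from Rekognition label names."""
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % every_n == 0:
            ok_jpg, jpg = cv2.imencode(".jpg", frame)
            if ok_jpg:
                resp = rekognition.detect_labels(
                    Image={"Bytes": jpg.tobytes()}, MaxLabels=max_labels
                )
                labels = [label["Name"] for label in resp["Labels"]]
                yield index, ", ".join(labels)
        index += 1
    cap.release()

if __name__ == "__main__":
    pairs = list(caption_frames("the_shining.mp4"))
    with open("captions.json", "w") as f:
        json.dump(pairs, f, indent=2)
```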

Is it possible? In the nerdyrodent/VQGAN-CLIP repo there's a style transfer,

* but I'm inquiring how to merge the model layers so that the content is skewed to a certain style / aesthetic.

@Norod + @justinpinkney were successful in blending models together (FFHQ + cartoon designs) - could the same be achieved in this VQGAN domain? They essentially perform some neural surgery, hacking the layers to force the results: https://github.com/justinpinkney/toonify

Does the VQGAN give us some access to hack these layers?
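
For reference, the toonify trick amounts to taking one checkpoint's weights for some layers and the other checkpoint's (or an interpolation) for the rest. Below is a minimal sketch of that kind of surgery applied to two VQGAN checkpoints, assuming both were trained with the same architecture so their state_dicts line up; the checkpoint filenames, the `state_dict` key, and the "blend everything under `decoder.`" rule are illustrative, not anything this repo provides:

```python
# Sketch: toonify-style layer blending between two VQGAN checkpoints.
# Assumes identical architectures (matching parameter names and shapes).
import torch

base = torch.load("vqgan_imagenet.ckpt", map_location="cpu")["state_dict"]
style = torch.load("vqgan_shining_finetune.ckpt", map_location="cpu")["state_dict"]

def blend(base_sd, style_sd, prefix="decoder.", alpha=1.0):
    """Interpolate the style model's weights into every layer under `prefix`."""
    out = dict(base_sd)
    for name, weight in style_sd.items():
        if name.startswith(prefix):
            out[name] = alpha * weight + (1.0 - alpha) * base_sd[name]
    return out

# alpha=1.0 is a pure layer swap; values in between mix the two decoders.
blended = blend(base, style, prefix="decoder.", alpha=0.7)
torch.save({"state_dict": blended}, "vqgan_blended.ckpt")
```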

UPDATE: @JCBrouwer seems to have combined style transfer with video here: https://github.com/JCBrouwer/maua-style

fyi @nerdyrodent

JCBrouwer commented 3 years ago

The video style transfer in my repo is optimization based, not feed-forward.
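
For contrast, "optimization based" means running a per-prompt loop like the sketch below: the VQGAN latent itself is optimized against a CLIP similarity loss, so every prompt (or frame) pays the full cost of that loop, whereas a feed-forward model predicts the output in a single pass. Only the CLIP calls are the real openai/CLIP API; `vqgan.decode` and the latent shape are stand-ins for whatever decoder you load, and CLIP's input normalization is skipped to keep it short:

```python
# Sketch: the per-prompt optimization loop that a feed-forward model avoids.
# `vqgan` is a stand-in for a loaded VQGAN decoder; CLIP calls use openai/CLIP.
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
perceptor, _ = clip.load("ViT-B/32", device=device)

def optimize_latent(vqgan, prompt, steps=300, lr=0.1, latent_shape=(1, 256, 16, 16)):
    text_feat = perceptor.encode_text(clip.tokenize([prompt]).to(device)).detach()
    z = torch.randn(latent_shape, device=device, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        image = vqgan.decode(z)  # assumed to return (1, 3, H, W) pixels
        image = torch.nn.functional.interpolate(image, size=224)  # CLIP input size
        img_feat = perceptor.encode_image(image)
        loss = -torch.cosine_similarity(img_feat, text_feat).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()
```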

Fast style transfer is on my radar, though; I would definitely like to add it as well.

I have a bunch more changes to clip video style in a private branch right now, as I've integrated some things from latent visions, which is still in license purgatory at the moment.

afiaka87 commented 3 years ago

> I want to take a film (say The Shining):
>
> * caption it using Amazon AI label detection (maybe 1 in every 100 frames)
> * throw these image + text pairs into training
> * then take the trained model and have the neural nets spit out something in the style of the movie...
>
> Is it possible? In the nerdyrodent/VQGAN-CLIP repo there's a style transfer,
>
> * but I'm inquiring how to merge the model layers so that the content is skewed to a certain style / aesthetic.
>
> @Norod + @justinpinkney were successful in blending models together (FFHQ + cartoon designs) - could the same be achieved in this VQGAN domain? They essentially perform some neural surgery, hacking the layers to force the results: https://github.com/justinpinkney/toonify
>
> Does the VQGAN give us some access to hack these layers?
>
> UPDATE: @JCBrouwer seems to have combined style transfer with video here: https://github.com/JCBrouwer/maua-style
>
> fyi @nerdyrodent

I see what you're after now - you'd like a feed_forward_vqgan_clip ViTGAN that tends to output "The Shining"-style images. With this you could generate your images much faster. Is that accurate?

For now at least, this code only needs captions. The style transfer abilities are always going to be limited by CLIP itself in this instance.

All you need is a bunch of captions engineered specifically to pull out "The Shining style" from CLIP. This should be relatively easy - for example, I prepended the word "minimalism" to every prompt from the "blog post" captions, and the style changed drastically for the ViTGAN I was training. Have you tried prepending "The Shining", "The Shining by Stanley Kubrick", etc.?
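
That experiment is just a preprocessing pass over the captions before training. A minimal sketch, assuming the captions live one per line in a plain text file (the filenames and style phrase are illustrative):

```python
# Sketch: prepend a style phrase to every training caption.
# Assumes captions are stored one per line; filenames are illustrative.
style = "The Shining by Stanley Kubrick, "

with open("captions.txt") as src, open("captions_shining.txt", "w") as dst:
    for line in src:
        caption = line.strip()
        if caption:
            dst.write(style + caption + "\n")
```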

johndpope commented 3 years ago

I posed this question to the open_clip repo: https://github.com/mlfoundations/open_clip/issues/1

Here's my central-park-stanley-kubrick-wide-angle-landscape.png image