lucidrains / big-sleep

A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN. Technique was originally created by https://twitter.com/advadnoun

Sample results, Tooling on top of big-sleep #13

enricoros opened this issue 3 years ago

enricoros commented 3 years ago

big-sleep is GORGEOUS. We need to explore what it can do, where it shines, and what to avoid.

Adding a few pics down below, but I'm still in early experimentation - will update the thread later.

Puppies

« a colorful cartoon of a dog »

« a colorful cartoon of a dog with blue eyes and a heart »

Clouds

« clouds in the shape of a donut »

This post will be edited to add new samples

lucidrains commented 3 years ago

Heart for a nose! Lol

enricoros commented 3 years ago

Haha, it's creative indeed. Wanted to show the impact of seeds and iterations, for people that are puzzled by seeing completely different results. Added some more samples. Would like to figure out a list of nouns and modifiers/adjectives that work very well with big-sleep. For instance "made of" is used in the DALL-E example and seems to work very well here too.

lucidrains commented 3 years ago

@enricoros at this rate, we may not need DALL-E!

htoyryla commented 3 years ago

Wanted to show the impact of seeds and iterations, for people that are puzzled by seeing completely different results.

Thinking of me? I don't recognise myself in that description; I've worked as a visual artist with neural networks since 2015, mainly writing my own code. Now I'm learning about these new techniques, to eventually find ways to integrate them into my own work processes.

Yes, there is something uncanny about this. From the prompt "a cityscape in the style of Lionel Feininger" I got this. Of course someone familiar with Feininger's work would see the differences, but focusing on those would miss the similarities. It is like someone influenced by Feininger...

a_cityscape_in_the_style_of_Lionel_Feininger 3

"A cityscape in the style of Paul Klee". Again, not exactly Klee, but very much in the right direction. In fact, like someone influenced by Klee and Mark Rothko.

a_cityscape_in_the_style_of_Paul_Klee

I was never really interested in BigGAN, preferring to work on my own GAN code and my own image materials to keep to my own style. Now I am wondering, not perplexed, about how we are suddenly getting so much more interesting results from BigGAN. Is it that these pictures were always there, but difficult to find? (I once experimented a little with latent space search in BigGAN but gave up.) Or is it that using different conditioning vectors on different layers (as I see the code does) increases the variety even further, even though BigGAN is still using the same weights as before?

lucidrains commented 3 years ago

@htoyryla Very nice pictures!

I think the fabulous results we are seeing are from the unique combination of the multimodal network (CLIP) and the GAN. The GAN has been trained to be able to generate textures and objects to realism, so it has the capacity to paint anything it wants. All the knobs are there. CLIP helps to guide it based on the immense amount of images and text it has seen (400 million I believe). CLIP is also composed of attention, which in my mind is more powerful than the gradients you could get from convolutions.

The other thing that makes this combination special is that BigGAN is class-conditioned, so it begins training from a random class as its starting point. I believe the different starting points lead to a high variety of end results, even when rerunning on the same text.
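In pseudocode terms, the loop is roughly the following. This is a greatly simplified sketch, not the actual big_sleep.py (which also uses per-layer latents, an EMA of the latents, random cutouts and CLIP's input normalisation); it assumes the pytorch-pretrained-biggan and OpenAI clip packages.

```python
# Greatly simplified sketch of the CLIP-steers-BigGAN loop described above.
import torch
import torch.nn.functional as F
import clip
from pytorch_pretrained_biggan import BigGAN

device = "cuda" if torch.cuda.is_available() else "cpu"
perceptor, _ = clip.load("ViT-B/32", device=device)
biggan = BigGAN.from_pretrained("biggan-deep-256").to(device).eval()

# trainable latents: a noise vector plus (unnormalised) class logits
z = torch.randn(1, 128, device=device, requires_grad=True)
cls = torch.randn(1, 1000, device=device, requires_grad=True)
opt = torch.optim.Adam([z, cls], lr=0.07)

text = clip.tokenize(["a cityscape in the style of Paul Klee"]).to(device)
with torch.no_grad():
    text_features = perceptor.encode_text(text)          # fixed target

for step in range(200):
    image = biggan(z, torch.softmax(cls, dim=-1), 1.0)    # (1, 3, 256, 256) in [-1, 1]
    image = (image + 1) / 2                               # rescale to [0, 1]
    image = F.interpolate(image, size=224, mode="bilinear")  # CLIP's input size
    image_features = perceptor.encode_image(image)
    loss = -torch.cosine_similarity(image_features, text_features).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()                                            # nudge the latents toward the text
```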

htoyryla commented 3 years ago

@lucidrains Thanks for the clear explanation. Feels so obvious now :) Extremely interesting.

Your code projects here are an excellent resource for learning about these developments. Compact and keeping to the essentials, still fully working.

htoyryla commented 3 years ago

The GAN has been trained to be able to generate textures and objects to realism, so it has the capacity to paint anything it wants. All the knobs are there.

I am still thinking of how the control from CLIP to BigGAN is implemented. Where the knobs are, so to say. It does not appear to be simply a latent fed into the first layer, but an injection into multiple layers (which makes sense to me, as different layers control different visual features).

No need for a long explanation though, I can go on to investigate on my own.
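For anyone else digging into this, here is a highly simplified, purely illustrative sketch of the kind of per-layer latents being discussed. The `normu`/`cls` names follow the repository, but the shapes and init values below are assumptions, not the actual big_sleep.py.

```python
# Illustrative only: rather than a single latent fed into the first layer,
# the optimisation keeps one noise vector per generator layer plus class
# logits (also one row per layer), all of which are trainable.
import torch
import torch.nn as nn

class Latents(nn.Module):
    def __init__(self, num_layers=15, z_dim=128, num_classes=1000):
        super().__init__()
        # one noise vector per BigGAN-deep layer (including the input layer)
        self.normu = nn.Parameter(torch.zeros(num_layers, z_dim).normal_(std=1))
        # class logits, squashed into [0, 1] when handed to the generator
        self.cls = nn.Parameter(torch.zeros(num_layers, num_classes).normal_(std=0.3))

    def forward(self):
        return self.normu, torch.sigmoid(self.cls)
```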

enricoros commented 3 years ago

Love the artistic direction of this thread. My interest (other than reading the answer to @htoyryla's question) is in the tooling on top of this beautiful technology that can allow for artistic control, save/restore/collaboration, and to not waste computing resources (and precious time!) running a notebook for hours for a single image which is then discarded.

I'm summarizing my ideas for the transition in the "usability" of generative technologies in this picture: image

What I'm not mentioning here is the plan for "the day after", which could use trained networks to replace the manual selection process, and automatically weed out pictures that are straight-out garbage (we see many :).

This would require API changes to make the library more controllable, executable in a step-by-step fashion, make latent space restorable/saveable (instead of starting from an initial seed and crossing fingers), not to mention then going into editing of the latent space from a UI (point, interpolate, etc). Am I going too far off the deep end? :)

htoyryla commented 3 years ago

This would require API changes to make the library more controllable, executable in a step-by-step fashion, make latent space restorable/saveable (instead of starting from an initial seed and crossing fingers), not to mention then going into editing of the latent space from a UI (point, interpolate, etc). Am I going too far off the deep end? :)

I already tried saving the latents together with each intermediate image, and then made a separate script for generating images by interpolating between two stored latents. No problems with that, worked nicely.
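As a rough illustration of what such a script does, assuming each .pth file holds a whole saved latents object with `normu` and `cls` tensors (filenames below are hypothetical):

```python
# Rough sketch of morphing between two saved latents by linear interpolation.
import torch

lat_a = torch.load("dream_a.latents.pth", map_location="cpu")
lat_b = torch.load("dream_b.latents.pth", map_location="cpu")

frames = []
for i in range(60):                        # 60 interpolation steps
    t = i / 59.0
    normu = torch.lerp(lat_a.normu, lat_b.normu, t)
    cls = torch.lerp(lat_a.cls, lat_b.cls, t)
    frames.append((normu, cls))            # each pair can be rendered through BigGAN
```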

enricoros commented 3 years ago

Please share the code! Open a pull request, or fork the repo and add. @htoyryla: what are your creation flows, using this tech?

htoyryla commented 3 years ago

Please share the code! Open a pull request, or fork the repo and add. @htoyryla: what are your creation flows, using this tech?

My experiment was based on an earlier version, so I will make a new fork, make the necessary changes, test and let you know. Nothing fancy, just how I did it.

My workflow in art is based on my own GAN, with lots of options, my own image sets, usually quite small and focused to limit the visual world. In addition I use other tools, such as pix2pix and the like, to modify images. Here I am simply getting familiar with these new technological options.

htoyryla commented 3 years ago

See here https://github.com/htoyryla/big-sleep . It will store latents in a pth file (named similarly to the image) when save_progress is used.

The lines for storing are here https://github.com/htoyryla/big-sleep/blob/472699165a4d792f0837239836e7e5a1f45dcd88/big_sleep/big_sleep.py#L243-L246

bsmorph.py then shows that the latents can be loaded and that it is possible to interpolate between them. The interpolation in bsmorph.py is very crude, feel free to use your own.

There is nothing yet for continuing training from stored latents, but it should be straightforward to initialise latents from a stored one. Use `lats = torch.load(filename)` to read latents from a file and then initialise the latents with `lats.normu` and `lats.cls` here https://github.com/lucidrains/big-sleep/blob/a7ad18c873797a0a9b0707ec0cadb92549b0a382/big_sleep/big_sleep.py#L72-L73
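A minimal sketch of that restart idea (the helper name and filename are hypothetical, and this is not a tested patch):

```python
# Sketch: copy stored normu/cls tensors into a freshly built Latents module,
# so a new run can continue roughly where the saved one left off.
import torch

def restore_latents(latent, path):
    lats = torch.load(path, map_location="cpu")   # the stored latents object
    with torch.no_grad():
        latent.normu.copy_(lats.normu)
        latent.cls.copy_(lats.cls)
    return latent
```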

htoyryla commented 3 years ago

Here's a morph between two latents I stored:

https://user-images.githubusercontent.com/15064373/105577466-8006e800-5d82-11eb-8e65-e843bf5c241b.mp4

enricoros commented 3 years ago

Beautiful. Can't wait to learn from your code.

htoyryla commented 3 years ago

Beautiful. Can't wait to learn from your code.

Did you notice my comment about the code above?

TheodoreGalanos commented 3 years ago

Love the artistic direction of this thread. My interest (other than reading the answer to @htoyryla's question) is in the tooling on top of this beautiful technology that can allow for artistic control, save/restore/collaboration, and to not waste computing resources (and precious time!) running a notebook for hours for a single image which is then discarded.

I'm summarizing my ideas for the transition in the "usability" of generative technologies in this picture: image

What I'm not mentioning here is the plan for "the day after", which could use trained networks to replace the manual selection process, and automatically weed out pictures that are straight-out garbage (we see many :).

This would require API changes to make the library more controllable, executable in a step-by-step fashion, make latent space restorable/saveable (instead of starting from an initial seed and crossing fingers), not to mention then going into editing of the latent space from a UI (point, interpolate, etc). Am I going too far off the deep end? :)

this is a very important discussion and right at the heart of my research. There are of course many, many issues still to be solved, but over the last couple of months huge steps towards generative design workflows have been made.

I try to stay a bit sober, however, because as opposed to many wonderful users of these new workflows I'm not an artist. I'm an engineer and a designer, so operationalizing these things under the constraints of real-world projects is a huge task. Thankfully, that is another area where I feel some very important work has come out in the last few months.

Such an exciting few years we're entering!

htoyryla commented 3 years ago

I try to stay a bit sober, however, because as opposed to many wonderful users of these new workflows I'm not an artist. I'm an engineer and a designer, so operationalizing these things under the constraints of real-world projects is a huge task.

I am both. I worked for decades in the development of specialised mobile networks, at times mediating between the customer and the actual development.

Currently, my approach to coding is to proceed in small steps. Experiments and enhancements that can be implemented in a single day. In the long run, it can still go far enough.

indiv0 commented 3 years ago

« a colorful cartoon of a dog »

* `seed=553905700049900, iteration=160, lr=0.7, size=256`

Perhaps I'm mistaken, but with lr=0.7 I'm getting completely wrong results (e.g. fully white or green images). Is this supposed to be lr=0.07 instead? I'm pretty new to all of this.

enricoros commented 3 years ago

@indiv0 GOOD CATCH! Updating the post with .07

enricoros commented 3 years ago

@lucidrains I'm experimenting with a UI for human-in-the-loop (@TheodoreGalanos).

Here's an example from a few hours of coding. It's not connected to a backend yet. I want the backend to be remote, so I can run it on a headless Linux box with a more powerful GPU while viewing the results from my less powerful machine.

image

Would you be open to a few API changes to enable this sort of operation? Mainly updating (some) hyperparams, saving/restoring latent state, and getting the PNG buffer instead of saving it to disk.
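Purely as a strawman, one possible shape for such an API could look like the following. Every name below is hypothetical; nothing like it exists in big-sleep today.

```python
# Strawman interface only; all names are hypothetical.
from dataclasses import dataclass

@dataclass
class DreamState:
    latents: object          # the trainable latents (or their raw tensors)
    optimizer_state: dict    # so training resumes smoothly, not just the image
    step: int

class SteppableImagine:
    def set_text(self, text: str) -> None: ...            # retarget an ongoing dream
    def set_lr(self, lr: float) -> None: ...               # tweak hyperparams mid-run
    def step(self) -> bytes: ...                           # one iteration, returns PNG bytes
    def save_state(self) -> DreamState: ...                # snapshot latents + optimizer
    def load_state(self, state: DreamState) -> None: ...   # resume from a snapshot
```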

lucidrains commented 3 years ago

@enricoros I'm all ears :) Just let me know how you would envision the API and I'll put in some time later this week!

indiv0 commented 3 years ago

If you guys need any help with this, just point me at an issue. I’m super new to ML but I’ve got some backend experience and I’d love to help out where I can, especially with @enricoros’ UI.

TheodoreGalanos commented 3 years ago

Nice job @enricoros! This is a great start. I wonder, can we use generated images as seeds for another generation with deep sleep? Or is that too restrictive? Interactive (latent) evolution would preferably happen like that, although I can definitely see this as a sort of one-loop run where, at the end of multiple runs, you have a basket of candidates to work with.

TheodoreGalanos commented 3 years ago

@enricoros I'm all ears :) Just let me know how you would envision the API and I'll put in some time later this week!

this might sound silly, but is a Hugging Face-like API viable for these things?

enricoros commented 3 years ago

Nice job @enricoros! This is a great start. I wonder, can we use generated images as seeds for another generation with deep sleep? Or is that too restrictive? Interactive (latent) evolution would preferably happen like that, although I can definitely see this as a sort of one-loop run where, at the end of multiple runs, you have a basket of candidates to work with.

@TheodoreGalanos That's one of the options I want to enable. You could ideally continue the generation from the same hyperparams+latents, steer a new generation towards a different prompt, or even cross-pollinate latents and such.

this might sound silly, but is a Hugging Face-like API viable for these things?

@TheodoreGalanos what would that API look like?

@lucidrains Thanks for volunteering :D, I'll keep you posted. At the moment I've added socket.io (websockets) support to a different cmdline util which uses Imagine(), and I'm fighting long blocking calls versus threaded execution of the websocket event loop.

@indiv0 If you have Python experience, some experimental code is at https://github.com/enricoros/big-sleep-creator/blob/main/creator.py - I need to send flask-socketio messages even while running long blocking operations (see line 126), so that the websocket doesn't disconnect from the UI. I can either block everything (including socket messages) until an operation is complete, or execute everything in parallel (which parallelizes image generation and crashes the server). I don't have any experience here, so let me know if you spot any mistakes.
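A minimal sketch of one way to keep the socket alive, using flask-socketio's background-task helper. This is not the actual creator.py code; the event names and the placeholder loop are made up.

```python
# Sketch: run the long generation loop in a background task so the websocket
# keeps handling traffic; only one generation is allowed at a time.
from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app, async_mode="threading")
busy = False

def run_generation(prompt):
    global busy
    try:
        for step in range(100):                # stand-in for the real iteration loop
            # one training/imagine step would go here
            socketio.emit("progress", {"step": step, "prompt": prompt})
            socketio.sleep(0)                  # yield so socket events are serviced
    finally:
        busy = False

@socketio.on("imagine")
def on_imagine(data):
    global busy
    if busy:
        socketio.emit("error", {"msg": "a generation is already running"})
        return
    busy = True
    socketio.start_background_task(run_generation, data["prompt"])

if __name__ == "__main__":
    socketio.run(app)
```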

This is the current progress github.com/enricoros/big-sleep-creator: image

Compared to the last update, the WebApp now connects to the big-sleep Python process on the same or a different machine (see the GPU info, top right), and can sync status and run a generation operation, "imagine()". No results are retrieved yet, as I need PNG buffers to send back to the UI instead of files written to disk. I will have more progress towards the end of the week.
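For the PNG-buffer part, a small sketch of encoding an image tensor in memory instead of writing it to disk; it assumes a (3, H, W) float tensor in [0, 1] and the torchvision package.

```python
# Sketch: turn an image tensor into PNG bytes in memory (no file on disk),
# ready to be base64-encoded and pushed over the websocket.
import base64
import io

from torchvision.transforms import functional as TF

def tensor_to_png_bytes(image_tensor):
    pil_img = TF.to_pil_image(image_tensor.detach().cpu().clamp(0, 1))
    buf = io.BytesIO()
    pil_img.save(buf, format="PNG")
    return buf.getvalue()

# e.g. socketio.emit("frame", {"png": base64.b64encode(tensor_to_png_bytes(img)).decode()})
```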

indiv0 commented 3 years ago

@enricoros Looks awesome! The UI looks like exactly what I'd want, personally. I'm working on something similar myself here: http://ec2-34-215-137-20.us-west-2.compute.amazonaws.com/ https://dank.xyz/ except mine is intended to run without needing colab.

I'll take a look at the websocket stuff.

enricoros commented 3 years ago

@enricoros Looks awesome! The UI looks like exactly what I'd want, personally. I'm working on something similar myself here: https://dank.xyz/ except mine is intended to run without needing colab.

Looks really amazing, have you shared this link with people yet? I love the quality of the generated results. My idea is to be able to see and edit the 'dreams' while they are happening, to select the best ones and suppress the weird ones :)

indiv0 commented 3 years ago

Yeah I've shared it a little bit. Almost all of the submissions are from users, not me.

I absolutely agree with you. The human-in-the-loop functionality is critical. I plan to add an account system so that users can terminate/re-run their renders and get the results they want.

lucidrains commented 3 years ago

@enricoros Looks awesome! The UI looks like exactly what I'd want, personally. I'm working on something similar myself here: ~http://ec2-34-215-137-20.us-west-2.compute.amazonaws.com/~ https://dank.xyz/ except mine is intended to run without needing colab.

I'll take a look at the websocket stuff.

great job! this is all that I hoped would happen ;) a stop-gap measure before we all have an imagination machine in our living room :D

when we finally replicate DALL-E, the internet is going to explode :D

lucidrains commented 3 years ago

@indiv0 some suggestions: (1) have Big Sleep generate up to N candidate images and have viewers vote on which one is the best, (2) comments, Disqus or home-built (would be hilarious)

lucidrains commented 3 years ago

@indiv0 are you doing anything special for the site? or is it mostly all just run with the default settings?

indiv0 commented 3 years ago

@lucidrains For sure. Giving users more control over selecting optimal images is an important feature and would greatly help users generate good results.

Currently I'm running each query with 75 iterations for 7 epochs with a learning rate of 0.06.

I can't WAIT until we can replicate DALL-E. You're absolutely right. Near real-time DALL-E will be an absolute game changer for creative expression online. In the meantime I'm going to work on adding extra models to the site (like deep-daze) and giving users more control over their renders.

Right now the limiting factor is actually the speed of the model. At 8 minutes per render the queue just keeps growing (I can't process requests fast enough) and I don't have infinite money to spend on GPUs so if we can think of any way to speed it up that'd be a huge win.

nerdyrodent commented 3 years ago

Does anyone have any ideas on saving / loading the latents with the current release (0.7.0)? If I load the saved latents like this:

noise1 = lat1.model.normu.to(device)
class1 = lat1.model.cls.to(device)

then I just get "raw" BigGAN images (mostly dogs), rather than the dream image?

Update: Oh, I see. I now need to save like this: `lats = self.model.model.latents.cpu()`
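In other words, something along these lines, where `dream` is assumed to be an Imagine instance and the filename is hypothetical:

```python
# Sketch following the update above: grab the whole latents module after a run
# and save it, instead of pulling normu/cls off the wrong object.
import torch
from big_sleep import Imagine

dream = Imagine(text="clouds in the shape of a donut")
dream()                                       # run the dream as usual

lats = dream.model.model.latents.cpu()        # attribute path from the comment above
torch.save(lats, "donut.latents.pth")         # hypothetical filename
```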

KiudLyrl commented 3 years ago

Hi, saving/restoring the latents does not seem to be enough.

It seems that big_sleep is a lot more aggressive during the first iterations. I think it has something to do with ema_decay, but I'm not sure.

Do you guys have an idea? Thanks

wolfgangmeyers commented 3 years ago

Hi, saving/restoring the latents does not seem to be enough.

It seems that big_sleep is a lot more aggressive during the first iterations. I think it has something to do with ema_decay, but I'm not sure.

Do you guys have an idea? Thanks

I think you might be right. I looked at the EMA class at https://github.com/lucidrains/big-sleep/blob/main/big_sleep/ema.py#L16

After each iteration the accum value is multiplied by the ema_decay, but it is always initialized to 1. If the EMA constructor accepted an initial accum value, I think it could either be saved or recalculated based on the current iteration. If I have time before your PR gets merged I may test it :)
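Roughly the idea, as an illustrative class rather than the actual big_sleep/ema.py (the update rule and names below are assumptions):

```python
# Illustrative only: an EMA whose bias-correction accumulator can be seeded,
# so a resumed run does not behave as if it were back at iteration 0.
import copy
import torch

class ResumableEMA:
    def __init__(self, model, decay, initial_accum=1.0):
        self.model = model
        self.shadow = copy.deepcopy(model)
        with torch.no_grad():
            for p in self.shadow.parameters():
                p.zero_()                       # running average starts at zero...
        self.decay = decay
        self.accum = initial_accum              # ...e.g. pass decay ** steps_done on resume

    def update(self):
        self.accum *= self.decay                # same accumulation rule as described above
        with torch.no_grad():
            for s, p in zip(self.shadow.parameters(), self.model.parameters()):
                s.mul_(self.decay).add_(p, alpha=1 - self.decay)

    def averaged_parameters(self):
        # bias-corrected average (meaningful after at least one update);
        # seeding accum keeps this correction consistent across a restart
        return [s / (1 - self.accum) for s in self.shadow.parameters()]
```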

wolfgangmeyers commented 3 years ago

Hi, saving/restoring the latents does not seem to be enough. It seems that big_sleep is a lot more aggressive during the first iterations. I think it has something to do with ema_decay, but I'm not sure. Do you guys have an idea? Thanks

I think you might be right. I looked at the EMA class at https://github.com/lucidrains/big-sleep/blob/main/big_sleep/ema.py#L16

After each iteration the accum value is multiplied by the ema_decay, but it is always initialized to 1. If the EMA constructor accepted an initial accum value, I think it could either be saved or recalculated based on the current iteration. If I have time before your PR gets merged I may test it :)

I've set the initial ema value to very low numbers and it doesn't affect the rate of change at the beginning. The only thing I've tweaked that seems to affect it is the learning rate, and the only consumer of that is the Adam optimizer. Setting the lr parameter to a low value does cause the rate of change to be slower, but it still tapers off over time. I think this is behavior built into the Adam optimizer, based on https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/ - but the docs are mostly beyond my understanding.

KiudLyrl commented 3 years ago

yeah, changing accum does not work. I added a for loop in the constructor of EMA that calls the update function 350 times (my latent is from epoch 0, iteration 350), but it changed the picture (darker, a little bit different), so there is still something to figure out here.

KiudLyrl commented 3 years ago

@wolfgangmeyers could you check https://github.com/lucidrains/big-sleep/pull/86 It seems to be working fine.

I was a bit aggressive: I dumped the whole EMA/Adam objects to disk and restored them.
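For reference, a state_dict flavour of the same idea looks roughly like this. The object names are assumptions (and it assumes the EMA wrapper is an nn.Module); it is not what the PR itself does, which pickles the objects whole.

```python
# Sketch: save/restore everything needed to continue a dream where it stopped.
import torch

def save_run(path, latents, ema, optimizer, step):
    torch.save({
        "latents": latents.state_dict(),
        "ema": ema.state_dict(),
        "optimizer": optimizer.state_dict(),
        "step": step,
    }, path)

def load_run(path, latents, ema, optimizer):
    ckpt = torch.load(path, map_location="cpu")
    latents.load_state_dict(ckpt["latents"])
    ema.load_state_dict(ckpt["ema"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]
```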

wolfgangmeyers commented 3 years ago

@wolfgangmeyers could you check #86 It seems to be working fine.

I was a bit aggressive: I dumped the whole EMA/Adam objects to disk and restored them.

I was able to get it working - I think this is perfect for generating a large number of images quickly and then picking which ones to finish. I left some feedback on your PR, but I don't have permission to approve it :)

rafaelpuga commented 3 years ago

Hello! Does anyone know what the seed stands for? Can I just use random numbers? I have the same doubt about iterations and learning rate. I keep using random numbers, but I don't know what they mean. Can anyone help? :)