lucidrains / deep-daze

Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https://twitter.com/advadnoun

create a story in the Colab notebook #95

itsHNTR opened this issue 3 years ago

itsHNTR commented 3 years ago

Can you use the create_story option in the Colab notebook?

itsHNTR commented 3 years ago

Also, can you prime the generation with a starting image in the notebook?

asigalov61 commented 3 years ago

@itsHNTR Yes, I think both of those things are possible. For some reason the pip package is more up to date, so here are the latest (as of 03/17/2021) options:

So you want the img/start_image_path options and also the story options (see the sketch after the signature below).

class Imagine(nn.Module):
    def __init__(
            self,
            *,
            text=None,
            img=None,
            clip_encoding=None,
            lr=1e-5,
            batch_size=4,
            gradient_accumulate_every=4,
            save_every=100,
            image_width=512,
            num_layers=16,
            epochs=20,
            iterations=1050,
            save_progress=True,
            seed=None,
            open_folder=True,
            save_date_time=False,
            start_image_path=None,
            start_image_train_iters=10,
            start_image_lr=3e-4,
            theta_initial=None,
            theta_hidden=None,
            lower_bound_cutout=0.1, # should be smaller than 0.8
            upper_bound_cutout=1.0,
            saturate_bound=False,
            create_story=False,
            story_start_words=5,
            story_words_per_epoch=5,
    ):
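
Something like this should work in a Colab cell (a minimal sketch only, assuming the pip release and the signature above; the prompt text and ./prime.jpg are just placeholders):

# Minimal sketch, assuming `pip install deep-daze` and a GPU runtime.
# Parameter names come from the __init__ signature above; the prompt and
# ./prime.jpg are illustrative placeholders, not part of the repo.
from deep_daze import Imagine

model = Imagine(
    text="A wizard paints a starry night sky while a dragon circles overhead",
    create_story=True,               # reveal the text a few words at a time
    story_start_words=5,
    story_words_per_epoch=5,
    start_image_path="./prime.jpg",  # prime the Siren network with this image first
    start_image_train_iters=10,
    epochs=20,
    save_progress=True,
    open_folder=False,               # Colab has no desktop folder to open
)
model()  # runs the full training loop, saving images as it goes
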
asigalov61 commented 3 years ago

@lucidrains and other contributors...

Can you implement line-by-line story creation as opposed to word-by-word? I think line by line would be much better suited for story creation, as most texts are formatted by lines, not by words.

Just a humble suggestion.

Thank you, guys! You did an awesome job with deep-daze! Love it :)

NotNANtoN commented 3 years ago

@asigalov61 Thanks for the feedback. I have currently shifted my focus and will not be continuing work on the create_story feature for now.

I think your approach makes sense, so feel free to add it. One could then switch the create_story mode from words to lines. Feel free to do a PR; mainly the update_story_encoding function would need to be changed. The issue I see is that a single line could be too long to fit into the CLIP model's context length, but that could be checked at the start, raising an exception if it is the case.
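
For example, a check along these lines could run before training starts (just a sketch, not part of deep-daze; it relies on OpenAI's clip.tokenize raising RuntimeError for over-long inputs):

import clip

# Hypothetical pre-flight check for a line-by-line story mode. CLIP's
# tokenizer uses a fixed context length of 77 tokens and raises
# RuntimeError when a text does not fit.
def validate_story_lines(lines):
    for i, line in enumerate(lines):
        try:
            clip.tokenize(line)
        except RuntimeError as err:
            raise ValueError(f"Line {i} is too long for CLIP: {line!r}") from err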

If you do not want to do a PR, you can also just import deep_daze and write a loop in your own code that repeatedly calls model.train_step(). You can then use model.set_clip_encoding() to change the encoding however you like.
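
Roughly like this (a sketch only; the exact train_step arguments and the story lines are illustrative, so check against the current pip release):

from deep_daze import Imagine

# Rough sketch of the manual loop described above; the story lines and
# the train_step argument list are illustrative.
lines = [
    "The first line of the story",
    "A second line that changes the scene",
    "A final line that resolves it",
]

model = Imagine(text=lines[0], open_folder=False)

for epoch, line in enumerate(lines):
    model.set_clip_encoding(text=line)   # switch the CLIP target text
    for i in range(model.iterations):
        model.train_step(epoch, i)       # one optimization step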

asigalov61 commented 3 years ago

@NotNANtoN Np, bro! Too bad you do not have time for this project... but I can take a look at the code and see if I can do it when I have a minute... it's all about time now, right? Picoseconds? ;)

And thank you for the points. I know that a sentence can be longer than what CLIP can process, but as you said yourself there are many ways around it. I'd even say that we can use a tiny AI summarizer to make sentences smaller. j/k ;)

But seriously, I will see if I can do a PR, though the code may be too complex for me. I skimmed through it and could not do it right away, so give me time, please.

Otherwise, thank you again for deep-daze. It's totally awesome :)

asigalov61 commented 3 years ago

@NotNANtoN Btw, take a look at my GitHub. I am basically trying to create a deep-daze for music. So if you are curious about such a thing, I would love to hear at least your thoughts on how it would be possible to do it.

asigalov61 commented 3 years ago

A holodeck can't be without sound and music. Otherwise, it would be a "silent holodeck".

NotNANtoN commented 3 years ago

@asigalov61 Don't give me too much credit, I just added some features to deep-daze, such as create_story etc. The base idea is from advadnoun (on Twitter), and lucidrains and others coded this repo.

I'm definitely interested in your deep-daze for music - please link the repo; I don't want to search through your hundred repos. I am also planning to embed CLIP-guided music visualization into the lucid-sonic-dreams repo, if you've heard of it.

asigalov61 commented 3 years ago

@NotNANtoN Just take the compliment, bro! This stuff takes mutual effort cuz it's pretty complex. You guys did a great job!

Thanks for your interest in the music side, and I am sorry you had difficulty navigating my GitHub. I put my best and latest work on my profile page. You probably want to check out Optimus-VIRTUOSO and Markovify-Piano; they turned out very nice IMHO.

https://github.com/asigalov61/Optimus-VIRTUOSO

https://github.com/asigalov61/Markovify-Piano

I also have Karaoke and everything in between. So try it out and let me know what you think. I have nice Colabs in all my repos so you should be able to try it very quickly and easily.

Let me know if you need any help/guidance with my software/code. Would be happy to help.

Alex

asigalov61 commented 3 years ago

@NotNANtoN lucid-sonic-dreams looks like another awesome idea, but I think the main point is to make it meaningful. In other words, it has to be conditioned on the user's input, like deep-daze; then it would really be awesome. Still pretty impressive, though. I will look into it in detail when I have a minute.

NotNANtoN commented 3 years ago

@asigalov61 The samples generated by Optimus-VIRTUOSO sound nice indeed. As there is no explanation in the README, I assume it is a GPT-2 model trained on MIDI data and not CLIP-related?

Yes, lucid-sonic-dreams is at the moment just an interesting way to explore the latent space of a model. But my plan is to combine it with a pre-trained model from https://github.com/Spijkervet/CLMR and CLIP to make it "meaningful". Even without CLMR, it could be made meaningful in songs with lyrics by just using CLIP.

asigalov61 commented 3 years ago

@NotNANtoN Thank you for complimenting my work; it means a lot to me. And I apologize for the lack of a description, but I am sort of a one-man operation here, so writing things up and documenting always suffers. This is why I provide Google Colabs with some notes, though I know not everyone has time to try them and go through everything...

Yes, OV is an actual GPT-2 implementation with GPT-3 tweaks (vanilla minGPT plus some tweaks of my own), and yes, it was trained on MIDIs, as that is the way to go IMHO. GPT-2 shows amazing results, unmatched even by Reformer or other current(!) SOTA transformer architectures, which is why I like to call it GPT-2+.

One can only imagine what the big guys have, right? And how it could play music or generate images, right? ;)

Re: your idea: yes, way to go, man! Sounds like a great plan. Is there a repo for the project I can follow? Also, let me know if I can help in any way, as I would love to contribute to something like that.