lucidrains / DALLE2-pytorch

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch
MIT License

Generating Multiple Outputs #146

Closed egeozsoy closed 2 years ago

egeozsoy commented 2 years ago

Is there a suggested way of generating multiple image outputs given a text input, like the 9-10 outputs dalle2 produces?

lucidrains commented 2 years ago

@egeozsoy you could do it yourself easily with `images = dalle2(['your prompt'] * num_outputs)`
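If `num_outputs` is large, repeating the prompt in a single call may not fit in GPU memory. A minimal sketch of the same idea, split into smaller batches, is below. The helper `generate_samples` and its `batch_size` parameter are hypothetical and not part of this repository. It only assumes that `model` is a callable that takes a list of prompts and returns one output per prompt, as in the one-liner above.

```python
from typing import Any, Callable, List, Sequence


def generate_samples(
    model: Callable[[List[str]], Sequence[Any]],
    prompt: str,
    num_outputs: int,
    batch_size: int = 4,
) -> List[Any]:
    """Sample `num_outputs` results for one prompt, at most `batch_size` per call.

    `model` is assumed (not confirmed by the thread) to accept a list of
    prompts and return a batch with one entry per prompt.
    """
    if num_outputs < 0 or batch_size < 1:
        raise ValueError("num_outputs must be >= 0 and batch_size must be >= 1")

    outputs: List[Any] = []
    for start in range(0, num_outputs, batch_size):
        # The last chunk may be smaller than batch_size.
        n = min(batch_size, num_outputs - start)
        batch = model([prompt] * n)
        # Iterating over the batch yields one output per repeated prompt.
        outputs.extend(batch)
    return outputs
```

With a trained model this would be called as something like `images = generate_samples(dalle2, 'your prompt', 9)`. Each call samples independently, so the outputs differ even though the prompt is identical.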

lucidrains commented 2 years ago

@egeozsoy i'm guessing you have trained something decent?

egeozsoy commented 2 years ago

At least something that gives me some results haha. That sounds like a very easy solution, will try it out

lucidrains commented 2 years ago

@egeozsoy do share your results, if you can :smiley:

lucidrains commented 2 years ago

but knowing researchers, you are probably scrambling for a paper to be written / avoid being scooped, so hearing silence is good enough signal for me haha

egeozsoy commented 2 years ago

I am working on a very specific topic, surgical video generation, so the results are maybe not very interesting. But once I have something that works (right now I keep running into overfitting problems), I will share :)

lucidrains commented 2 years ago

@egeozsoy nice! surgical video is very interesting! laparoscopy?

lucidrains commented 2 years ago

ok, you go do your thing

egeozsoy commented 2 years ago

For now external view. We already have a paper out about the dataset https://arxiv.org/abs/2203.11937 if you want to get a feel for the setting

lucidrains commented 2 years ago

@egeozsoy very cool! and yeah 7k images isn't a lot of data

but the new Tero Karras paper https://arxiv.org/abs/2206.00364 brings in some augmentation tricks that may benefit the small data setting. i'll eventually build that into this repository and some other DDPM repositories once i see some positive results

egeozsoy commented 2 years ago

Thanks for the suggestions, It is really about testing the limited data regime for dalle2 for now. Interested in seeing how it performs with 1000x less data :)