lucidrains / deep-daze

Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https://twitter.com/advadnoun
MIT License
4.37k stars, 327 forks

Idea: Adding a strength modifier when using img and text #136

Open nerdyrodent opened 3 years ago

nerdyrodent commented 3 years ago

Hi, so... this isn't really an issue, but I wasn't sure where else to post ideas. Anyway, I thought I'd mention something I've found useful, in case you'd like to add it. Basically, when using both text and an image as prompts, I found that there wasn't as much variety in the results. "Wouldn't it be nice if I could put more emphasis on the text than on the image?", I thought. So I added some really basic strength multipliers as a test:

encoding = ((self.create_text_encoding(text) * txt_str) + (self.create_img_encoding(img) * img_str)) / 2

Astonishingly, not only did it not produce an error, it actually seemed to work :) After a bit of testing, I found that a 2:1 strength for text vs. image produced much more varied output while still using the image as a guide. Probably not the greatest way of doing things, but it did the trick for what I was trying to achieve visually!
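
For anyone who wants to try the idea outside the library, here is a minimal, self-contained sketch of the same weighting. It is not deep-daze's actual API: `blend_encodings` is a hypothetical helper, and the short lists stand in for the CLIP text and image embeddings. Unlike the line above, it divides by the sum of the weights rather than by 2, so the result stays a true weighted average. If the loss compares encodings by cosine similarity, the overall scale should not matter much either way, but that is an assumption worth checking against the code you are running.

```python
from typing import List, Sequence


def blend_encodings(
    text_enc: Sequence[float],
    img_enc: Sequence[float],
    txt_str: float = 2.0,
    img_str: float = 1.0,
) -> List[float]:
    """Weighted average of two same-length encodings (hypothetical helper)."""
    if len(text_enc) != len(img_enc):
        raise ValueError("encodings must have the same length")
    total = txt_str + img_str
    if total == 0:
        raise ValueError("txt_str + img_str must be non-zero")
    # Dividing by the weight sum keeps the result on the same scale as the inputs.
    return [
        (txt_str * t + img_str * i) / total
        for t, i in zip(text_enc, img_enc)
    ]


# Toy stand-ins for the text and image embeddings.
text_enc = [1.0, 0.0, 0.0]
img_enc = [0.0, 1.0, 0.0]

# 2:1 emphasis on the text, as described above.
blended = blend_encodings(text_enc, img_enc, txt_str=2.0, img_str=1.0)
print(blended)
```

With equal strengths this reduces to the plain average of the two encodings, so the existing behaviour would be the default case.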

Another idea could be an option to supply more than one image (like where they combine images in the paper)?