jaketae / storyteller

Multimodal AI Story Teller, built with Stable Diffusion, GPT, and neural text-to-speech
MIT License
482 stars 64 forks

How to keep the consistency between images? #11

Closed SerendipitysX closed 1 year ago

SerendipitysX commented 1 year ago

This is wonderful work that fits my needs perfectly! I have a small question and am curious how you cool guys do it: the GIF in the README is fluent and smooth, with a consistent style. So, what's the trick to doing that?

jaketae commented 1 year ago

Hello @SerendipitysX, thanks for opening this issue.

The consistency is not algorithmic; it comes mostly from luck in how the model is prompted. Currently, there is a parameter called "painter_prompt_prefix" that gets prepended to every sentence Stable Diffusion draws. For instance, the sentences

Once upon a time, unicorns roamed the Earth. The unicorns had long hair and sharp horns....

would get split into

Beautiful painting: Once upon a time, unicorns roamed the Earth
Beautiful painting: The unicorns had long hair and sharp horns.
Beautiful painting: ...
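To make the splitting concrete, here is a minimal sketch of the idea. The function name `build_prompts` and the regex-based sentence splitter are assumptions for illustration only; the project itself may split sentences differently. Only the prefix value and the "painter_prompt_prefix" name come from the explanation above.

```python
import re


def build_prompts(text, painter_prompt_prefix="Beautiful painting"):
    """Hypothetical helper: one prefixed prompt per sentence of the story."""
    # Naive split: break after ., ! or ? when followed by whitespace.
    # (An assumption; a real tokenizer would handle abbreviations better.)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Every sentence gets the same prefix, which nudges the image model
    # toward a shared style across frames.
    return [f"{painter_prompt_prefix}: {sentence}" for sentence in sentences]


story = (
    "Once upon a time, unicorns roamed the Earth. "
    "The unicorns had long hair and sharp horns."
)
prompts = build_prompts(story)
for prompt in prompts:
    print(prompt)
```

Changing the prefix (say, to "Watercolor illustration") changes the shared style of every frame at once, which is why the result looks consistent without any explicit link between frames.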

So prefixing each prompt to Stable Diffusion is likely what's causing it to have some sort of consistent style. For more reliable consistency, however, we would have to pass the previous output as a condition to the model for generating the next frame. I'm not planning on implementing this just yet, but it's on my list of long-term to-do's. Hope this helped!
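For anyone curious what "passing the previous output as a condition" could look like, here is a rough sketch of the control flow only. It is not implemented in the project, and everything here is hypothetical: `generate_frames`, `first_frame`, and `next_frame` are made-up names, and the two callables stand in for whatever text-to-image and image-conditioned models you plug in.

```python
from typing import Any, Callable, List


def generate_frames(
    prompts: List[str],
    first_frame: Callable[[str], Any],
    next_frame: Callable[[str, Any], Any],
) -> List[Any]:
    """Hypothetical loop: each frame after the first sees the previous frame."""
    frames: List[Any] = []
    previous = None
    for prompt in prompts:
        if previous is None:
            # No earlier image exists yet, so generate from text alone.
            image = first_frame(prompt)
        else:
            # Condition on the previous output to carry style and content over.
            image = next_frame(prompt, previous)
        frames.append(image)
        previous = image
    return frames


# Stand-in "models" that just record what they were given.
demo_frames = generate_frames(
    ["Beautiful painting: A", "Beautiful painting: B"],
    first_frame=lambda prompt: f"image[{prompt}]",
    next_frame=lambda prompt, prev: f"image[{prompt} | from {prev}]",
)
print(demo_frames)
```

In a real setup the callables would wrap actual model calls; the point is just that each frame depends on the one before it rather than being generated independently.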

jaketae commented 1 year ago

Closing this for now. Feel free to reopen if you have any follow-ups!