pytti-tools / pytti-core

https://pytti-tools.github.io/pytti-book/intro.html
MIT License

Comprehension: An upgrade to the Direct Image Prompts system #129

Closed bridgebrain closed 2 years ago

bridgebrain commented 2 years ago

I've been thinking about a module that could make direct image prompts much stronger. Currently they seem to have only a small effect, and they get buried if you use several of them together.

Comprehension could be a separate notebook or a separate run module in pytti and Disco. It would take a folder you point it at (filled with images in a specific theme, plus a text file containing all the tags you want to highlight in that theme) and encode it. I don't know whether running the init image encoder on each image would work, or whether that would create too much overhead. If it would, the alternative is to run the same training process that produced the RN and ViT-B models and create a mini-model.
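One minimal reading of the "encode the folder" step is to average normalized embeddings into a single concept vector. This assumes each image (and each tag) has already been passed through a CLIP-style encoder to get a fixed-length vector. The function name `build_concept_embedding`, the `tag_weight` parameter, and the plain-list vector representation are hypothetical and not part of pytti or Disco. The encoder call itself is left out.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (CLIP-style embeddings are usually compared this way)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def build_concept_embedding(
    image_embeddings: Sequence[Sequence[float]],
    tag_embeddings: Sequence[Sequence[float]] = (),
    tag_weight: float = 0.5,
) -> Vector:
    """Hypothetical 'Comprehension' step: collapse a themed folder into one vector.

    image_embeddings: one precomputed embedding per image in the folder (assumed).
    tag_embeddings: optional precomputed text embeddings for the tags file (assumed).
    tag_weight: share of the final vector taken from the tag centroid.
    """
    if not image_embeddings:
        raise ValueError("need at least one image embedding")
    dim = len(image_embeddings[0])

    def centroid(vectors: Sequence[Sequence[float]]) -> Vector:
        # Normalize first so large-magnitude embeddings don't dominate the mean.
        unit = [normalize(v) for v in vectors]
        return [sum(v[i] for v in unit) / len(unit) for i in range(dim)]

    concept = centroid(image_embeddings)
    if tag_embeddings:
        tags = centroid(tag_embeddings)
        # Linear blend of the image centroid and the tag centroid.
        concept = [(1 - tag_weight) * c + tag_weight * t for c, t in zip(concept, tags)]
    return normalize(concept)
```

For example, `build_concept_embedding([[1.0, 0.0], [0.0, 1.0]])` returns a unit vector pointing halfway between the two inputs. Averaging keeps the result in the same embedding space as ordinary prompts and needs no training; whether a single average is expressive enough for a whole theme is an open question, and that is where the trained mini-model route would come in.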

The user then has access to the encoded/trained output file, just like a specialized model. They could build up a collection of these models and combine them in unique ways. (For instance, create one model called Particles, one called Magic, and one called Industrial, then combine prompts and models to create a detailed image of wizards fighting in a warehouse, using the specific magic appearance the user wants.)
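Suppose each saved "model" were reduced to a single concept vector; this is an assumption, since a trained mini-model could look very different. Combining them could then be a weighted sum, and a candidate image embedding could be scored against the blend by cosine similarity. `combine_concepts` and `score` are hypothetical names used only for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def combine_concepts(concepts: Dict[str, Sequence[float]], weights: Dict[str, float]) -> Vector:
    """Weighted blend of named concept vectors, re-normalized to unit length.

    Only the concepts named in `weights` contribute; each is normalized first
    so the weights alone control their relative influence.
    """
    dim = len(next(iter(concepts.values())))
    blend = [0.0] * dim
    for name, weight in weights.items():
        for i, x in enumerate(normalize(concepts[name])):
            blend[i] += weight * x
    return normalize(blend)


def score(image_embedding: Sequence[float], target: Sequence[float]) -> float:
    """Cosine similarity between an image embedding and the blended target."""
    return sum(a * b for a, b in zip(normalize(image_embedding), normalize(target)))


# Toy 3-dimensional "models" standing in for real embeddings.
concepts = {
    "Particles": [1.0, 0.0, 0.0],
    "Magic": [0.0, 1.0, 0.0],
    "Industrial": [0.0, 0.0, 1.0],
}
target = combine_concepts(concepts, {"Magic": 2.0, "Industrial": 1.0})
```

Here `target` leans twice as hard toward Magic as toward Industrial and ignores Particles. In a real pipeline the blended vector (or each weighted concept separately) would be fed into the loss alongside the text prompts, but how that plugs into pytti's prompt handling is not specified in this thread.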

dmarx commented 2 years ago

yeah, I'm a fan! You're actually touching on an interesting bigger question here, which is giving users tooling to help them curate a more personalized aesthetic. The approach you're describing here is closely related to ideas I've seen referred to as augmenting prompts with "vitamins", i.e. supplemental prompts with desirable properties (at least I think that's what people mean when they talk about "vitamins" in an AI art context). Another related idea I've seen -- which I'm pretty sure is actually implemented in a notebook somewhere, I'll try to dig it up -- is to prompt the user to rate generated outputs and train a classifier on a database of user-scored embeddings. I like the idea of constructing some sort of generic toolkit here to help users define their voice in a way that could be compatible with a variety of tools.
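The rating idea can be sketched as plain logistic regression over stored embeddings. This assumes each rated output has already been reduced to an embedding and each user score to a like (1) / dislike (0) label. It is a deliberately simple swapped-in technique, not the notebook mentioned above, and `train_preference_model`, `predict_like`, and the default hyperparameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_preference_model(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Fit weights w and bias b so sigmoid(w.x + b) approximates P(user likes x).

    Full-batch gradient descent on the mean binary cross-entropy loss, whose
    gradient is mean((prediction - label) * x) for w and mean(prediction - label) for b.
    """
    dim = len(embeddings[0])
    n = len(embeddings)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(embeddings, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_like(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Estimated probability that the user would like an output with embedding x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

The learned score could then rank new generations or act as an extra guidance term. With real CLIP embeddings (hundreds of dimensions) a library implementation such as scikit-learn's `LogisticRegression` would be the more practical choice; the hand-rolled loop is only here to keep the sketch dependency-free.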

Let's use this thread to brainstorm further and try to scope out what a stand-alone component might look like, and then we can try to plug it into pytti as a proof of concept.

dmarx commented 2 years ago

calling this a dupe of #94