yownas / shift-attention

In stable diffusion, generate a sequence of images shifting attention in the prompt.

Including generation data information for output images #12

Closed MoonRide303 closed 1 year ago

MoonRide303 commented 1 year ago

Hi there.

I've found this extension useful for finding optimal weight values, but currently it's pretty hard to identify what exact parameter value was used for a particular image (especially when there are 10 or more images). It would be useful to have the full generation data (or at least the calculated value(s) of the parameter(s) being changed) somewhere, in a form that can be associated with the output image. Maybe it would be possible to simply include that information in the log files (shift-attention-info-XXXXX.txt)?

yownas commented 1 year ago

That is not a bad idea. Things can get messy, especially if you are using SSIM to let the script fill in frames and they are generated out of order.

But I'm not sure what the best way to do it would be. The things you'd want to have are the frame number (where in the animation/range the picture is), the number it got when it was generated so you can match it to an image (if you have it saved), and then the entire prompt/negative prompt? Something like this? A CSV-like format would have been nice, but that's tricky since the prompt can include newlines and other weird characters. (JSON would also be nice, but harder for humans to read.)

---
0,0000
+ long and complicated (prompt:0.0003)
- (negative:1.0) prompt
---
1,0001
+ long and complicated (prompt:0.0004)
- (negative:1.1) prompt 
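
A minimal Python sketch of how entries in this format might be appended to the info file; the append_frame_info name and exact field layout are assumptions for illustration, not the extension's actual code:

# Hypothetical sketch: append one entry per frame to the
# shift-attention-info-XXXXX.txt log, in the format proposed above.
def append_frame_info(log_path, frame_number, image_number, prompt, negative_prompt):
    with open(log_path, "a", encoding="utf-8") as log:
        log.write("---\n")
        log.write(f"{frame_number},{image_number:04d}\n")
        log.write(f"+ {prompt}\n")
        log.write(f"- {negative_prompt}\n")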
MoonRide303 commented 1 year ago

For me it would be OK to have anything that lets me quickly find the calculated parameter value that was used to generate a given image file. Currently, when there are 10 or more files and the values are not perfectly round, I often don't really know which parameter value was used - and I need the exact values to be able to generate the same image again with hi-res fix, or to keep looking for a better value in a smaller range and with smaller steps.

So... as long as I will be able to decode the prompt used for a given image, any form would do - whatever looks most reasonable from your point of view.

No idea about SSIM - I've never used it.

yownas commented 1 year ago

I'll hack something together after work. If you have any idea for a good format I'd be happy to try it. :)

SSIM stands for "Structural Similarity Index Metric"; it is usually used to measure quality degradation from compression and transmission. But I (ab)use it to make the script generate images until they are similar enough. It is nice to use sometimes, since the change from 0 to 0.7 might not be that great, and then all the change happens between 0.7 and 1. Using SSIM, the script will generate more images at the end to capture more detail. The downside is that you never know beforehand how many frames you'll end up with.
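
A rough Python sketch of this SSIM-driven fill idea, assuming a hypothetical generate(value) helper that returns a uint8 RGB numpy array; only skimage's structural_similarity is a real API, and the subdivision logic is an illustration, not the extension's implementation:

# Keep inserting a frame between two attention values until
# neighbouring images look similar enough (SSIM above a threshold).
from skimage.metrics import structural_similarity

def fill_frames(generate, lo, hi, threshold=0.6, max_depth=8):
    # Returns (value, image) pairs ordered by value,
    # densest where the images change fastest.
    img_lo, img_hi = generate(lo), generate(hi)
    frames = [(lo, img_lo)]

    def split(a, img_a, b, img_b, depth):
        score = structural_similarity(img_a, img_b, channel_axis=-1)
        if score >= threshold or depth >= max_depth:
            return
        mid = (a + b) / 2
        img_mid = generate(mid)
        split(a, img_a, mid, img_mid, depth + 1)   # fill the left half first
        frames.append((mid, img_mid))              # then record the midpoint
        split(mid, img_mid, b, img_b, depth + 1)   # then fill the right half

    split(lo, img_lo, hi, img_hi, 0)
    frames.append((hi, img_hi))
    return frames

Because the recursion only subdivides where neighbouring images still differ, the total frame count is unknown until it finishes, which matches the downside described above.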

yownas commented 1 year ago

Think I managed to get something working.

Since images can be generated out of order, there is a difference between the frame number and the image number. The image number should match the number used by the webui when it saves a preview, and also match the index when/if you use the API.

--- Frame: <frame in the animation> Image: <image number> Seed: <seed>
+ <prompt>
- <negative prompt>
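
A hypothetical Python reader for a log in this format; the header regex and field names are inferred from the lines above, not taken from the extension's code:

# Parse "--- Frame: ... Image: ... Seed: ..." headers followed by
# "+ prompt" and "- negative prompt" lines into a list of dicts.
import re

HEADER = re.compile(r"^--- Frame: (\d+) Image: (\d+) Seed: (\d+)$")

def read_shift_attention_log(path):
    entries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            m = HEADER.match(line)
            if m:
                entries.append({
                    "frame": int(m.group(1)),
                    "image": int(m.group(2)),
                    "seed": int(m.group(3)),
                    "prompt": "",
                    "negative_prompt": "",
                })
            elif entries and line.startswith("+ "):
                entries[-1]["prompt"] = line[2:]
            elif entries and line.startswith("- "):
                entries[-1]["negative_prompt"] = line[2:]
    return entries

Note that, as mentioned earlier in the thread, prompts containing newlines would break a simple line-based reader like this one.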