Open Technologicat opened 1 year ago
Just for additional information, different SD GUIs use different ways to store the metadata.
For reference, there is a standalone SD prompt reader by receyuki that supports many of them, but it has no gallery mode.
EDIT: Also, the standalone SD prompt reader works as a workaround solution for my use cases.
If you associate it with images, you can use Pix as a gallery, then right-click an image in Pix, choose Open with..., and pick the SD prompt reader. This opens a separate window that shows the SD metadata and makes it easy to copy it (or parts of it) to the clipboard.
In case anyone else needs this, here's a launcher file for the SD prompt reader. First, install the SD prompt reader (following the instructions in its README on GitHub).
Then save this as SD Prompt Reader.desktop:
[Desktop Entry]
Name=SD Prompt Reader
Exec=python /home/user/full-path-to/stable-diffusion-prompt-reader/main.py %U
Comment=Extract prompts from Stable Diffusion generated images.
Terminal=false
Icon=/home/user/full-path-to/stable-diffusion-prompt-reader/resources/icon.png
Type=Application
Be sure to adjust the paths to match your installation.
Add this launcher to your Menu. Then, to associate it with images, use Open with... on an image in Nemo, pick the SD prompt reader, and choose Add to list.
Not quite as convenient as having the info available without opening a separate window each time, but works well enough for now.
Issue
Feature suggestion: metadata support for PNG tEXt chunks.
Steps to reproduce
View a PNG image in Pix. Even if the PNG file contains tEXt chunks, the Properties pane does not see them.
Expected behaviour
The Properties pane should list the PNG tEXt chunks, too.
It should show their contents (right there in the pane), and allow copying the text (or parts of it) with the mouse, so that the text can be easily pasted into another app.
Other information
This would be highly useful with the Stable Diffusion AI art tool. Use cases:
New creations on a previous theme. The wording of the prompts that direct the AI often gets technical and lengthy, so to skip some prompt engineering when you want to make more AI art on a previous theme, it is essential to get the prompt you already engineered last time, to use as a starting point.
Currently, you can drag'n'drop a PNG into the SD GUI to use its metadata, but this requires multiple steps: first, locating the relevant image using the most convenient tool (Pix), then dropping the image into the PNG info tab in the SD GUI, and then copying the metadata from there. This is ok when you want to use all of the metadata - the SD GUI can parse the metadata and populate the generation parameters for you - but in the equally common use case where I already have set all other parameters the way I like, and just need the prompt or a part of it, I'd like to skip the middle step. Since I only need a piece of text, stored in the metadata of the PNG, I'd like to grab it right from the image viewer.
When training a LoRA (low-rank adaptation, a method to teach SD new concepts), I'd like to gather parts of prompts after a long session of manual testing, to produce an automated test suite (i.e. to facilitate batch-rendering the best tests the next time). The SD GUI has a script to batch render when you give it a list of prompts, but obviously, you need to write that list first. :)
This pretty much requires the agility of Pix, to find the relevant images in a plethora of outputs (often over a thousand), where the prompt was changed many times, at arbitrary points during the test session. I want to not worry about curating a collection of the best prompts while doing the actual testing, and gather the prompts in a separate step when done, from the mountain of data that was generated. Pix would be ideal for this.
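As a rough illustration of that separate gathering step, here is a minimal Python sketch. It assumes the metadata text has already been read from each selected PNG, and that it follows the layout the Automatic1111 GUI writes: the prompt first, then an optional line starting with "Negative prompt:", then a line starting with "Steps:". Other SD GUIs store things differently, so treat that layout as an assumption. The function names extract_prompt and gather_prompts are made up for this example.

```python
def extract_prompt(parameters: str) -> str:
    """Return the positive prompt from an Automatic1111-style metadata string.

    Assumption: the prompt is everything before the first line that starts
    with "Negative prompt:" or "Steps:".
    """
    lines = []
    for line in parameters.splitlines():
        if line.startswith("Negative prompt:") or line.startswith("Steps:"):
            break
        lines.append(line)
    return "\n".join(lines).strip()


def gather_prompts(parameter_texts):
    """Collect unique, non-empty prompts, keeping first-seen order."""
    seen = {}  # dicts preserve insertion order, so this doubles as an ordered set
    for text in parameter_texts:
        prompt = extract_prompt(text)
        if prompt:
            seen.setdefault(prompt, None)
    return list(seen)
```

Given the metadata strings of the images picked out in the viewer, gather_prompts returns a de-duplicated list that could serve as a starting point for the batch-render prompt list.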
There is an image browser for the Automatic1111 GUI for SD that already does these things, but usability-wise Pix is much better - an order of magnitude faster, many more images shown simultaneously, easy navigation between folders, much more efficient use of screen real estate, ...
SD saves its output as PNG, and puts the generation metadata into a tEXt chunk in a plain-text format. There may be approximately 1-2 kB of text, depending on how detailed the prompt and negative prompt were.
Currently, the workflow for doing this with Pix is: quickly find the relevant image in Pix, then copy its file path, then in a terminal, pngcheck -ct "<paste full path here>", and finally copy the text from the terminal window. Even though clumsy, this is still faster than doing it in the image browser in the SD GUI!
I can supply some test PNG files, if needed.
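For anyone who wants to see what is involved in reading these chunks, here is a minimal Python sketch using only the standard library. It walks the PNG chunk list (4-byte big-endian length, 4-byte type, data, 4-byte CRC) and collects each tEXt chunk as a keyword/text pair, decoded as Latin-1 as the PNG specification prescribes. It is only an illustration, not how Pix would implement it: it does not verify CRCs and ignores the zTXt and iTXt chunk types. The name read_text_chunks is made up for this example.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_text_chunks(data: bytes) -> dict:
    """Return {keyword: text} for every tEXt chunk in the given PNG bytes.

    If a keyword occurs more than once, the last occurrence wins.
    """
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    chunks = {}
    pos = 8  # skip the 8-byte signature
    while pos + 8 <= len(data):
        # Each chunk starts with a big-endian length and a 4-byte type.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            # tEXt layout: keyword, a NUL separator, then the text itself.
            keyword, _, text = body.partition(b"\x00")
            chunks[keyword.decode("latin-1")] = text.decode("latin-1")
        elif chunk_type == b"IEND":
            break
        pos += 12 + length  # length field + type + data + CRC
    return chunks
```

Usage would be something like read_text_chunks(open(path, "rb").read()), where path is the image's file path. For images saved by the Automatic1111 GUI, the generation metadata should then be under the "parameters" keyword, though other SD GUIs may use different keywords.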