jhc13 / taggui

Tag manager and captioner for image datasets
GNU General Public License v3.0

[feature request] History of prompts #171

Open geroldmeisinger opened 1 month ago

geroldmeisinger commented 1 month ago

It doesn't have to be per image, but at least a global history of prompts would be nice, ideally one that also saves the settings.

geroldmeisinger commented 1 month ago

I would strongly argue for writing a history.jsonl file to the image directory BY DEFAULT, as it documents how the prompts in a published image dataset came to be and which specific settings were used. If taggui catches on and more and more image datasets are published with this file, we get better insights, just like ComfyUI includes the workflow in every image. Users who don't like it can "opt out" by simply deleting the file at any time.

geroldmeisinger commented 1 month ago

some proposals for discussion:

process-centric

Every time you press start caption, a new entry is added with all the captioning settings, parameters, metadata and the image selection:

[
{
    date: datetime # 2023-12-31 12:34:56
    # meta
    model_id: str # THUDM/cogvlm2-llama3-chat-19B-int4
    model_hash: str # e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    taggui_rev: str # 8d43352fa0eab65586108c806fdd80e8da5012c5
    # captioning
    prompt: str # "Can you please describe this image in up to two paragraphs?"
    caption_start: str # "This image showcases"
    ...
    # parameters
    min_new_tokens: int
    max_new_tokens: int
    ...
    # selection
    images: list[str] # [ "00000/000000001.jpg", "00000/000000005.jpg", ... ]
    # or
    images: tree[str] # { "00000": [ "000000001.jpg", "000000005.jpg", ...], "00001": [...] }
},
{
    date: datetime # 2023-12-31 12:00:00
    ...
}
]
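A minimal sketch of how the process-centric variant could be written as history.jsonl, one JSON object per line. The function names `append_history_entry` and `read_history` and their arguments are hypothetical and not part of taggui. Note that appending puts the oldest run first in the file, whereas the listing above shows the newest run first.

```python
import json
from datetime import datetime
from pathlib import Path


def append_history_entry(image_dir, settings, images, file_name="history.jsonl"):
    """Append one captioning run to a JSON Lines file and return the entry."""
    entry = {
        "date": datetime.now().isoformat(sep=" ", timespec="seconds"),
        **settings,  # e.g. model_id, prompt, caption_start, min_new_tokens, ...
        "images": list(images),  # paths relative to image_dir
    }
    history_path = Path(image_dir) / file_name
    # Append mode: earlier runs are never rewritten.
    with history_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entry


def read_history(image_dir, file_name="history.jsonl"):
    """Return all recorded runs, oldest first. A missing file means no history."""
    history_path = Path(image_dir) / file_name
    if not history_path.exists():
        return []
    with history_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Because a deleted file just reads as an empty history, deleting history.jsonl needs no special handling.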

pros:

cons:

selection-centric

Every time the image selection changes, a new entry is added. Each captioning run is then appended to that entry's nested captionings list.

[
{
  # selection
  images: tree[str] # { "00000": [ "000000001.jpg", "000000005.jpg", ...], "00001": [...] }
  # processes
  captionings:
    [
      {  
        date: datetime # 2023-12-31 12:34:56
        # meta
        model_id: str # THUDM/cogvlm2-llama3-chat-19B-int4
        model_hash: str # e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
        taggui_rev: str # 8d43352fa0eab65586108c806fdd80e8da5012c5
        # captioning
        prompt: str # "Can you please describe this image in up to two paragraphs?"
        caption_start: str # "This image showcases"
        ...
        # parameters
        min_new_tokens: int
        max_new_tokens: int
        ...
      },
      {
        date: datetime # 2023-12-31 12:00:00
        ...
      }
    ]
},
{ # add different image selection
...
}
]
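A rough sketch of the selection-centric bookkeeping, kept in memory for simplicity. `to_tree` and `record_captioning` are hypothetical helpers, not taggui functions. It assumes "the selection changes" means "differs from the most recent entry", so going back to an earlier selection starts a new entry.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def to_tree(image_paths):
    """Group relative paths by parent directory: {"00000": ["000000001.jpg", ...]}."""
    tree = defaultdict(list)
    for path in sorted(image_paths):
        p = PurePosixPath(path)
        tree[str(p.parent)].append(p.name)
    return dict(tree)


def record_captioning(history, image_paths, captioning):
    """Add `captioning` to the latest entry if the selection is unchanged,
    otherwise start a new entry for the new selection."""
    selection = to_tree(image_paths)
    if history and history[-1]["images"] == selection:
        history[-1]["captionings"].append(captioning)
        return history[-1]
    entry = {"images": selection, "captionings": [captioning]}
    history.append(entry)
    return entry
```

Sorting the paths before grouping means the order in which images were clicked does not count as a different selection.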

pros:

cons:

image-centric

Just save a .json file next to each caption .txt file and load the last settings from the first selected image.

[
{
    date: datetime # 2023-12-31 12:34:56
    # meta
    model_id: str # THUDM/cogvlm2-llama3-chat-19B-int4
    model_hash: str # e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    taggui_rev: str # 8d43352fa0eab65586108c806fdd80e8da5012c5
    # captioning
    prompt: str # "Can you please describe this image in up to two paragraphs?"
    caption_start: str # "This image showcases"
    ...
    # parameters
    min_new_tokens: int
    max_new_tokens: int
    ...
},
{
    date: datetime # 2023-12-31 12:00:00
    ...
}
]
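The image-centric variant could look roughly like this. It assumes the sidecar shares the image's file stem (000000001.jpg -> 000000001.json) and keeps the newest entry first, as in the listing above. All three function names are hypothetical.

```python
import json
from pathlib import Path


def sidecar_path(image_path):
    """Assumed sidecar location: same stem as the image, with a .json suffix."""
    return Path(image_path).with_suffix(".json")


def save_settings(image_path, entry):
    """Prepend `entry` to the image's sidecar list so the newest run comes first."""
    path = sidecar_path(image_path)
    entries = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    entries.insert(0, entry)
    path.write_text(json.dumps(entries, indent=2, ensure_ascii=False), encoding="utf-8")


def load_last_settings(selected_images):
    """Return the newest entry of the first selected image, or None if there is none."""
    if not selected_images:
        return None
    path = sidecar_path(selected_images[0])
    if not path.exists():
        return None
    entries = json.loads(path.read_text(encoding="utf-8"))
    return entries[0] if entries else None
```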

pros:

cons:

Typical Workflows?

a) First you make a small selection and try a lot of prompts. Once you have found the ideal prompt, you select all images and caption them all in one go. => prefer process-centric