[feature request] History of prompts

geroldmeisinger commented 1 month ago

It doesn't have to be per image, but at least a global history would be nice, one that also saves the settings.

geroldmeisinger commented 1 month ago

I would strongly argue for a history.jsonl file in the image directory BY DEFAULT, as it provides additional information for any published image dataset on how the prompts came about and which specific settings were used. If taggui catches on and more and more image datasets are published with this file, we get better insights, just like ComfyUI includes the workflow in every image it saves. Users who don't like it can opt out at any time by simply deleting the file.

geroldmeisinger commented 1 month ago

Some proposals for discussion:

process-centric

Every time you start captioning, a new entry is added containing all captioning settings, parameters, metadata, and the image selection:

[
{
    date: datetime # 2023-12-31 12:34:56
    # meta
    model_id: str # THUDM/cogvlm2-llama3-chat-19B-int4
    model_hash: str # e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    taggui_rev: str # 8d43352fa0eab65586108c806fdd80e8da5012c5
    # captioning
    prompt: str # "Can you please describe this image in up to two paragraphs?"
    caption_start: str # "This image showcases"
    ...
    # parameters
    min_new_tokens: int
    max_new_tokens: int
    ...
    # selection
    images: list[str] # [ "00000/000000001.jpg", "00000/000000005.jpg", ... ]
    # or
    images: dict[str, list[str]] # { "00000": [ "000000001.jpg", "000000005.jpg", ...], "00001": [...] }
},
{
    date: datetime # 2023-12-31 12:00:00
    ...
}
]
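A minimal sketch of one process-centric entry as a Python dataclass, assuming only the fields named above (the elided `...` settings are left out). The class and helper names are hypothetical, and `date` is stored as a string because JSON has no datetime type. `selection_to_tree` shows how the flat `images` list maps onto the grouped-by-folder alternative.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from datetime import datetime


def _now() -> str:
    # Same shape as the example timestamps: "2023-12-31 12:34:56"
    return datetime.now().isoformat(sep=" ", timespec="seconds")


@dataclass
class CaptioningRun:
    """One process-centric history entry (hypothetical subset of the proposed fields)."""

    # captioning + parameters
    model_id: str
    prompt: str
    min_new_tokens: int
    max_new_tokens: int
    caption_start: str = ""
    # meta
    model_hash: str = ""
    taggui_rev: str = ""
    # selection: flat list of paths relative to the image directory
    images: list[str] = field(default_factory=list)
    date: str = field(default_factory=_now)

    def to_json(self) -> str:
        return json.dumps(asdict(self), ensure_ascii=False)


def selection_to_tree(images: list[str]) -> dict[str, list[str]]:
    """Group "folder/file" paths by folder, keeping the original order."""
    tree: dict[str, list[str]] = {}
    for path in images:
        folder, _, name = path.rpartition("/")
        tree.setdefault(folder, []).append(name)
    return tree
```

For example, `selection_to_tree(["00000/000000001.jpg", "00000/000000005.jpg", "00001/000000002.jpg"])` returns `{"00000": ["000000001.jpg", "000000005.jpg"], "00001": ["000000002.jpg"]}`, which repeats each folder name only once.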

pros:

just one file
easy to append
easy to load the last settings
easy to remove unwanted information (keep only the last entry)
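Since the file is named history.jsonl, storing one entry per line (JSON Lines) instead of one big JSON array makes the three pros literal: appending never rewrites earlier entries, the last line holds the last settings, and trimming means rewriting a single line. A minimal sketch under that assumption; the function names are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any

HISTORY_NAME = "history.jsonl"  # file name from the proposal, kept in the image directory


def append_entry(image_dir: Path, entry: dict[str, Any]) -> None:
    """Append one captioning run as a single JSON line."""
    with open(image_dir / HISTORY_NAME, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def load_last_settings(image_dir: Path) -> dict[str, Any] | None:
    """Return the most recent entry, or None if there is no history yet."""
    path = image_dir / HISTORY_NAME
    if not path.exists():
        return None
    last = None
    with open(path, encoding="utf-8") as f:
        for line in f:  # read sequentially, remember the last non-empty line
            if line.strip():
                last = line
    return json.loads(last) if last is not None else None


def keep_only_last(image_dir: Path) -> None:
    """Drop all older runs, e.g. before publishing the dataset."""
    last = load_last_settings(image_dir)
    if last is not None:
        (image_dir / HISTORY_NAME).write_text(
            json.dumps(last, ensure_ascii=False) + "\n", encoding="utf-8"
        )
```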

cons:

saving the image selection EVERY TIME might be huge (20 chars * 100k ≈ 2 MB array every time the full 100k-image dataset is captioned)
has to parse all entries to find the ones that apply to the selected image(s)
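Both cons can be checked with a short sketch. With paths shaped like the example (`00000/000000001.jpg`, 19 characters), a flat JSON array of 100k paths comes to exactly 2,300,000 bytes once quotes and `", "` separators are counted, the same order as the ~2 MB estimate. The lookup shows the linear scan needed to answer "which runs touched this image?". Function names are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any, Iterable


def selection_size_bytes(images: list[str]) -> int:
    """Bytes the selection adds to one entry when stored as a JSON array."""
    return len(json.dumps(images).encode("utf-8"))


def entries_for_image(history: Iterable[dict[str, Any]], image: str) -> list[dict[str, Any]]:
    """Every entry has to be parsed and checked; there is no index by image."""
    return [entry for entry in history if image in entry.get("images", [])]


# 100k paths in folders of 1000, e.g. "00000/000000001.jpg"
paths = [f"{i // 1000:05d}/{i:09d}.jpg" for i in range(100_000)]
print(selection_size_bytes(paths))  # 2300000
```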

selection-centric

Every time the image selection changes, a new entry is added. Each captioning run is then appended, nested in that entry's captionings list.

[
{
  # selection
  images: dict[str, list[str]] # { "00000": [ "000000001.jpg", "000000005.jpg", ...], "00001": [...] }
  # processes
  captionings:
    [
      {  
        date: datetime # 2023-12-31 12:34:56
        # meta
        model_id: str # THUDM/cogvlm2-llama3-chat-19B-int4
        model_hash: str # e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
        taggui_rev: str # 8d43352fa0eab65586108c806fdd80e8da5012c5
        # captioning
        prompt: str # "Can you please describe this image in up to two paragraphs?"
        caption_start: str # "This image showcases"
        ...
        # parameters
        min_new_tokens: int
        max_new_tokens: int
        ...
      },
      {
        date: datetime # 2023-12-31 12:00:00
        ...
      }
    ]
},
{ # another entry for a different image selection
  ...
}
]
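A sketch of the selection-centric bookkeeping, assuming the history is held in memory as a list of dicts with the `images` and `captionings` keys from the schema above, and reading "the selection changes" as "differs from the most recent entry". The function names are hypothetical.

```python
from __future__ import annotations

from typing import Any


def add_captioning(
    history: list[dict[str, Any]],
    selection: dict[str, list[str]],
    run: dict[str, Any],
) -> None:
    """Nest `run` under the current selection, opening a new entry if it changed."""
    if history and history[-1]["images"] == selection:
        history[-1]["captionings"].append(run)
    else:
        history.append({"images": selection, "captionings": [run]})


def last_settings(history: list[dict[str, Any]]) -> dict[str, Any] | None:
    """The last run of the last selection, or None for an empty history."""
    return history[-1]["captionings"][-1] if history else None
```

Note that the comparison is order-sensitive; sorting the file lists before comparing would treat reordered but otherwise identical selections as unchanged.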

pros:

just one file
easy to load the last settings

cons:

weird, deeply nested structure
(size, on the other hand, should not be an issue, since each selection is stored only once)

image-centric

Just save a .json file next to each caption .txt file and load the last settings from the first selected image.

[
{
    date: datetime # 2023-12-31 12:34:56
    # meta
    model_id: str # THUDM/cogvlm2-llama3-chat-19B-int4
    model_hash: str # e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    taggui_rev: str # 8d43352fa0eab65586108c806fdd80e8da5012c5
    # captioning
    prompt: str # "Can you please describe this image in up to two paragraphs?"
    caption_start: str # "This image showcases"
    ...
    # parameters
    min_new_tokens: int
    max_new_tokens: int
    ...
},
{
    date: datetime # 2023-12-31 12:00:00
    ...
}
]
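A minimal sketch of the image-centric variant, assuming the sidecar shares the image's stem (like the caption .txt) and, following the example above where the 12:34:56 entry comes before 12:00:00, keeps the newest run first. The function names are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def history_path(image_path: Path) -> Path:
    """Sidecar next to the image: 000000001.jpg -> 000000001.json."""
    return image_path.with_suffix(".json")


def add_run(image_path: Path, run: dict[str, Any]) -> None:
    """Prepend a run to the image's sidecar (newest first)."""
    path = history_path(image_path)
    runs = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    runs.insert(0, run)
    path.write_text(json.dumps(runs, indent=2, ensure_ascii=False), encoding="utf-8")


def last_settings(selected: list[Path]) -> dict[str, Any] | None:
    """Load the newest run of the first selected image, if any."""
    if not selected:
        return None
    path = history_path(selected[0])
    if not path.exists():
        return None
    runs = json.loads(path.read_text(encoding="utf-8"))
    return runs[0] if runs else None
```

Like the caption .txt, this naming collides when two images differ only by extension (e.g. `a.jpg` and `a.png`).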

pros:

easily transferable

cons:

lots of files
no overview: you have to open every file to see whether a different setting has been used
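To get an overview with per-image sidecars, every file has to be opened. A sketch that tallies the latest (model_id, prompt) combination per image, assuming newest-first sidecar lists and that every `*.json` under the image directory is such a sidecar; the function name is hypothetical.

```python
from __future__ import annotations

import json
from collections import Counter
from pathlib import Path


def settings_overview(
    image_dir: Path, keys: tuple[str, ...] = ("model_id", "prompt")
) -> Counter[tuple]:
    """Count how many images were last captioned with each settings combination."""
    counts: Counter[tuple] = Counter()
    for path in sorted(image_dir.rglob("*.json")):  # one file read per image
        runs = json.loads(path.read_text(encoding="utf-8"))
        if isinstance(runs, list) and runs:
            latest = runs[0]  # newest first
            counts[tuple(latest.get(k) for k in keys)] += 1
    return counts
```

More than one key in the result means the images were not all captioned with the same settings.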

Typical Workflows?

a) First you make a small selection and try a lot of prompts. Once you have the ideal prompt, you select all images and caption them all in one run. => prefer process-centric
