geroldmeisinger opened 1 month ago
I would strongly argue for a history.jsonl file in the image directory BY DEFAULT, as it provides additional information for any published image dataset about how the prompts came to be and which specific settings were used. If taggui catches on and more and more image datasets are published with this file, we get better insights. This is similar to how ComfyUI includes the workflow in every image.
Users who don't like it can "opt out" by simply deleting the file at any time.
Some proposals for discussion:
Proposal 1 (process-centric): every time you press Start caption, a new entry is added with all the captioning settings, parameters, metadata and the image selection:
[
  {
    date: datetime # 2023-12-31 12:34:56
    # meta
    model_id: str # THUDM/cogvlm2-llama3-chat-19B-int4
    model_hash: str # e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    taggui_rev: str # 8d43352fa0eab65586108c806fdd80e8da5012c5
    # captioning
    prompt: str # "Can you please describe this image in up to two paragraphs?"
    caption_start: str # "This image showcases"
    ...
    # parameters
    min_new_tokens: int
    max_new_tokens: int
    ...
    # selection
    images: list[str] # [ "00000/000000001.jpg", "00000/000000005.jpg", ... ]
    # or
    images: dict[str, list[str]] # { "00000": [ "000000001.jpg", "000000005.jpg", ...], "00001": [...] }
  },
  {
    date: datetime # 2023-12-31 12:00:00
    ...
  }
]
pros:
cons:
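A minimal Python sketch of proposal 1, assuming the selected images are passed as paths relative to the image directory. Note that a .jsonl file (JSON Lines) holds one JSON object per line rather than the single [ ... ] array sketched above, which is what makes recording a run a cheap append. The function names append_history_entry and read_history are hypothetical, not part of taggui; model_hash and taggui_rev are left out here and would be added to the dict the same way.

```python
import json
from datetime import datetime
from pathlib import Path


def append_history_entry(image_dir: Path, model_id: str, prompt: str,
                         caption_start: str, generation_params: dict,
                         images: list[str]) -> dict:
    """Append one captioning run to <image_dir>/history.jsonl (hypothetical helper).

    JSON Lines stores one JSON object per line, so a new run is a single
    append and earlier entries never have to be rewritten.
    """
    entry = {
        # ISO 8601 timestamp with second precision, e.g. "2023-12-31T12:34:56"
        "date": datetime.now().isoformat(timespec="seconds"),
        "model_id": model_id,
        "prompt": prompt,
        "caption_start": caption_start,
        **generation_params,  # e.g. min_new_tokens, max_new_tokens, ...
        "images": images,     # paths relative to image_dir
    }
    history_path = image_dir / "history.jsonl"
    with history_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entry


def read_history(image_dir: Path) -> list[dict]:
    """Return all recorded runs in file order (oldest first)."""
    history_path = image_dir / "history.jsonl"
    if not history_path.exists():
        return []
    with history_path.open(encoding="utf-8") as f:
        # Skip blank lines so a trailing newline does not break parsing.
        return [json.loads(line) for line in f if line.strip()]
```

Appending keeps the file valid even if taggui crashes mid-session, since every complete line is a complete entry.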
Proposal 2 (selection-centric): every time the image selection changes, a new entry is added, and every captioning process is simply appended to that entry's nested captionings list:
[
  {
    # selection
    images: dict[str, list[str]] # { "00000": [ "000000001.jpg", "000000005.jpg", ...], "00001": [...] }
    # processes
    captionings: [
      {
        date: datetime # 2023-12-31 12:34:56
        # meta
        model_id: str # THUDM/cogvlm2-llama3-chat-19B-int4
        model_hash: str # e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
        taggui_rev: str # 8d43352fa0eab65586108c806fdd80e8da5012c5
        # captioning
        prompt: str # "Can you please describe this image in up to two paragraphs?"
        caption_start: str # "This image showcases"
        ...
        # parameters
        min_new_tokens: int
        max_new_tokens: int
        ...
      },
      {
        date: datetime # 2023-12-31 12:00:00
        ...
      }
    ]
  },
  { # a different image selection
    ...
  }
]
pros:
cons:
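A sketch of how the nested structure of proposal 2 could be maintained, assuming selections arrive as relative paths like "00000/000000001.jpg" and that a new selection entry starts only when the selection differs from the most recent one. The helpers to_tree, record_captioning and save_history are hypothetical. Because existing entries are modified in place, this format cannot be append-only, so the sketch rewrites the whole file as one regular JSON document.

```python
import json
from datetime import datetime
from pathlib import Path


def to_tree(paths: list[str]) -> dict[str, list[str]]:
    """Group relative paths like "00000/000000001.jpg" by their parent folder."""
    tree: dict[str, list[str]] = {}
    for p in sorted(paths):  # sorting makes equal selections compare equal
        folder, _, name = p.rpartition("/")
        tree.setdefault(folder, []).append(name)
    return tree


def record_captioning(history: list[dict], selection: list[str],
                      captioning: dict) -> list[dict]:
    """Nest a captioning run under the matching selection entry.

    A new selection entry is started only when the selection differs from
    the most recent one; otherwise the run joins its "captionings" list.
    """
    tree = to_tree(selection)
    if not history or history[-1]["images"] != tree:
        history.append({"images": tree, "captionings": []})
    run = {"date": datetime.now().isoformat(timespec="seconds"), **captioning}
    history[-1]["captionings"].append(run)
    return history


def save_history(path: Path, history: list[dict]) -> None:
    # Entries change after they are written, so rewrite the whole file.
    path.write_text(json.dumps(history, indent=2, ensure_ascii=False),
                    encoding="utf-8")
```

This favours the "try many prompts on one selection" workflow: repeated runs on the same selection stay grouped together.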
Proposal 3 (per-image): just save a .json file next to each caption .txt file and load the last settings from the first selected image:
[
  {
    date: datetime # 2023-12-31 12:34:56
    # meta
    model_id: str # THUDM/cogvlm2-llama3-chat-19B-int4
    model_hash: str # e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    taggui_rev: str # 8d43352fa0eab65586108c806fdd80e8da5012c5
    # captioning
    prompt: str # "Can you please describe this image in up to two paragraphs?"
    caption_start: str # "This image showcases"
    ...
    # parameters
    min_new_tokens: int
    max_new_tokens: int
    ...
  },
  {
    date: datetime # 2023-12-31 12:00:00
    ...
  }
]
pros:
cons:
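A sketch of proposal 3, assuming each image's history is a JSON list stored in a file that shares the image's stem (000000001.jpg gets 000000001.json, next to 000000001.txt) and that entries are appended in chronological order. The function names are hypothetical, not taggui's.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Optional


def sidecar_path(image_path: Path) -> Path:
    """image.jpg -> image.json, next to the caption file image.txt."""
    return image_path.with_suffix(".json")


def append_image_settings(image_path: Path, settings: dict) -> None:
    """Append one run's settings to the image's .json sidecar file."""
    path = sidecar_path(image_path)
    entries = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    entries.append({"date": datetime.now().isoformat(timespec="seconds"), **settings})
    path.write_text(json.dumps(entries, indent=2, ensure_ascii=False), encoding="utf-8")


def load_last_settings(selected_images: list[Path]) -> Optional[dict]:
    """Return the newest entry from the first selected image's sidecar, if any."""
    if not selected_images:
        return None
    path = sidecar_path(selected_images[0])
    if not path.exists():
        return None
    entries = json.loads(path.read_text(encoding="utf-8"))
    # Entries are appended chronologically, so the last one is the newest.
    return entries[-1] if entries else None
```

The trade-off is one extra file per image, in exchange for the settings travelling with each image and caption pair.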
a) First you make a small selection and try a lot of prompts. Once you have the ideal prompt, you select all images and caption them all in one go. => prefer process-centric
It doesn't have to be per image, but at least a global history would be nice, and it would also save the settings.
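Restoring settings from a global history is cheap with JSON Lines, because the newest run is simply the last non-empty line. A sketch, assuming the proposal-1 field names and treating date and images as per-run data rather than reusable settings; restore_last_settings and RUN_ONLY_KEYS are hypothetical.

```python
import json
from pathlib import Path

# Keys that describe one particular run rather than reusable settings
# (assumed field names, following the proposed schema).
RUN_ONLY_KEYS = {"date", "images"}


def restore_last_settings(history_path: Path) -> dict:
    """Return the settings of the most recent run in a history.jsonl file.

    Returns {} when there is no history yet.
    """
    if not history_path.exists():
        return {}
    last_line = ""
    with history_path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():  # ignore blank lines, e.g. a trailing newline
                last_line = line
    if not last_line:
        return {}
    entry = json.loads(last_line)
    return {k: v for k, v in entry.items() if k not in RUN_ONLY_KEYS}
```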