vikhyat / moondream

tiny vision language model
https://moondream.ai
Apache License 2.0
4.9k stars, 436 forks

Bulk prompting #27

Open Joly0 opened 7 months ago

Joly0 commented 7 months ago

Hey, it looks like this could be used for captioning images. Could you add a function to caption images, or adjust the prompting, so that it can be done in bulk and the output saved to files?

fblissjr commented 7 months ago

I'm sure someone will get to it before I have a chance, but I did this for cogvlm (probably a ton of others have as well). Not sure it helps but just in case - https://github.com/fblissjr/cogvlm-image-caption

duracell80 commented 7 months ago

To save the output in the EXIF metadata of a JPEG:

In sample.py, add the following at the bottom, near print(answer):

with open(f"{args.image}.txt", "w") as desc_file:
    desc_file.write(answer)
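
For the bulk case asked about in the original post, the same write could sit inside a loop over a folder. This is only a minimal sketch, not the project's API: `caption_fn` is a hypothetical callable standing in for whatever sample.py does to turn an image path into `answer`, and `caption_folder` and the extension list are names made up for illustration.

```python
from pathlib import Path
from typing import Callable, Iterable

# Assumption: these are the image types of interest; adjust as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def caption_folder(
    folder: str,
    caption_fn: Callable[[Path], str],
    extensions: Iterable[str] = IMAGE_EXTENSIONS,
) -> list[Path]:
    """Caption every image in `folder` and write `<image name>.txt` next to it."""
    wanted = {ext.lower() for ext in extensions}
    written = []
    for image_path in sorted(Path(folder).iterdir()):
        # Skip subfolders, earlier .txt output, and anything that is not an image.
        if not image_path.is_file() or image_path.suffix.lower() not in wanted:
            continue
        answer = caption_fn(image_path)  # hypothetical: wraps the model call
        # Same naming as the snippet above: photo.jpg -> photo.jpg.txt
        desc_path = image_path.with_name(image_path.name + ".txt")
        desc_path.write_text(answer, encoding="utf-8")
        written.append(desc_path)
    return written
```

Passing the captioning step in as a function keeps the loop independent of how the model is loaded, so the model only needs to be loaded once before the loop starts.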

Then, with exiftool installed, use a bash script like the one below. The text file must be in the same directory as the image:

/home/user/.local/bin/caption-image.sh </path/to/image.jpg>

RESPONSE=$(cat "${1}.txt")

exiftool -overwrite_original -Exif:ImageDescription="${RESPONSE}" -Exif:XPComment="${RESPONSE}" -Description="${RESPONSE}" "${1}"

This produces a JPEG that can be searched in the Nemo file manager: enter "*.jpg" in the "Search for Files" input field and the search term in the "Search Content" input field. I'm sure it could be done directly in sample.py, but a separate script seemed useful because other scripts can call it too. You could even archive the text files into a .comments folder next to the images.
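
For anyone who would rather stay in Python, here is a rough sketch of the same exiftool step, plus the .comments archiving idea. It assumes exiftool is on the PATH and uses the same three tags as the bash script; `exiftool_args` and `embed_caption` are illustrative names, not part of sample.py.

```python
import shutil
import subprocess
from pathlib import Path


def exiftool_args(image_path: str, caption: str) -> list[str]:
    """Build the same exiftool command line as the bash script, as a list."""
    return [
        "exiftool",
        "-overwrite_original",
        f"-Exif:ImageDescription={caption}",
        f"-Exif:XPComment={caption}",
        f"-Description={caption}",
        image_path,
    ]


def embed_caption(image_path: str, archive: bool = True) -> None:
    """Read `<image>.txt`, write it into the image metadata, then optionally archive it."""
    text_path = Path(image_path + ".txt")
    # Like $(cat ...) in bash, drop trailing newlines from the caption.
    caption = text_path.read_text(encoding="utf-8").rstrip("\n")
    # A list of arguments avoids shell quoting problems with quotes in captions.
    subprocess.run(exiftool_args(image_path, caption), check=True)
    if archive:
        # Move the text file into a .comments folder beside the image.
        comments_dir = text_path.parent / ".comments"
        comments_dir.mkdir(exist_ok=True)
        shutil.move(str(text_path), str(comments_dir / text_path.name))
```

Calling `embed_caption("/path/to/image.jpg")` after the caption file has been written would then replace the bash step for that image.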