ttengwang / Caption-Anything

Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything
BSD 3-Clause "New" or "Revised" License
1.66k stars, 103 forks

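Can I input an image and directly get its contents and their relationships as output (that is, without clicking)? I want to generate image descriptions to make images easier to search #13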

Open wacdev opened 1 year ago

wacdev commented 1 year ago

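Can I input an image and directly get its contents and their relationships as output (that is, without clicking)? I want to use this to generate descriptions for images so they are easier to search.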

ttengwang commented 1 year ago

@wacdev Thanks for the question. We have added the "Caption everything in a paragraph" feature.

wanghaisheng commented 1 year ago

@ttengwang Does the "Caption everything in a paragraph" feature rely on OpenAI's ChatGPT? Can we use Bing's GPT instead?

ttengwang commented 1 year ago

Yes, a ChatGPT-like LLM is required for paragraph generation. It is fine to replace it with another GPT-style model, as long as an API is available for the integration.
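To illustrate what swapping the LLM could look like, here is a minimal sketch. It is not taken from the Caption-Anything code base: `TextGenerator`, `build_paragraph_prompt`, `caption_paragraph`, `echo_backend`, and the prompt wording are all hypothetical names and assumptions chosen for illustration. The idea is only that the paragraph step depends on a "prompt in, text out" function, so any provider with an API can be wrapped to fit.

```python
from typing import Callable, List

# Hypothetical type: any function that maps a prompt string to generated text.
# A real backend would wrap a provider's API call behind this signature.
TextGenerator = Callable[[str], str]


def build_paragraph_prompt(captions: List[str]) -> str:
    """Join per-region captions into one instruction (assumed prompt format)."""
    bullet_list = "\n".join(f"- {c.strip()}" for c in captions if c.strip())
    return (
        "Combine the following region descriptions into one coherent paragraph "
        "describing the image:\n" + bullet_list
    )


def caption_paragraph(captions: List[str], generate: TextGenerator) -> str:
    """Produce a paragraph using whichever backend function is passed in."""
    if not captions:
        raise ValueError("at least one caption is required")
    return generate(build_paragraph_prompt(captions)).strip()


def echo_backend(prompt: str) -> str:
    """Stand-in backend for local testing; it only stitches the bullets together."""
    lines = [line[2:] for line in prompt.splitlines() if line.startswith("- ")]
    return " ".join(s.rstrip(".") + "." for s in lines)


if __name__ == "__main__":
    print(caption_paragraph(["a dog on a sofa", "a red cushion"], echo_backend))
```

Under these assumptions, moving to a different model means writing one new function with the `TextGenerator` signature and passing it to `caption_paragraph`; the prompt-building code stays unchanged.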

wanghaisheng commented 1 year ago

@ttengwang Another question: without a click to drive the prompt, what input does GPT consume? Could you add an explanation to the existing click-driven image, like this one: https://github.com/ttengwang/Caption-Anything/blob/main/assets/demo1.png

Lastly, I just want to thank you for your work. It definitely gives me confidence and a great starting point. For the last two years I have been digging into audio description services, which I want to combine with an affordable wearable camera to help visually impaired people in their daily lives.

Audio description (also referred to as “description” or “video description”) is defined as “the verbal depiction of key visual elements in media and live productions.” AD is meant to provide information on visual content that is considered essential to the comprehension of the program.

ttengwang commented 1 year ago

Thank you so much for your kind words and encouragement. It truly means a lot to our team. You can check out our technical report for more details: https://arxiv.org/pdf/2305.02677.pdf

The description of "paragraph generation" is at the bottom of page 5.

wanghaisheng commented 1 year ago

Got it, @ttengwang.