ttengwang / Caption-Anything

Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything
BSD 3-Clause "New" or "Revised" License
1.66k stars 104 forks

About interaction #9

Open PilgrimMay opened 1 year ago

PilgrimMay commented 1 year ago

Can this project caption everything in an image without any interaction, the way SAM can segment everything?

ttengwang commented 1 year ago

@PilgrimMay Thank you for your suggestion. Currently, the repository does not support captioning "everything" in a dense caption format. However, we will be adding this feature within the next few days.

PilgrimMay commented 1 year ago

Thanks for the continuous updates. I noticed that the project now seems to support captioning everything. Is it possible to try this feature in the demo?

ttengwang commented 1 year ago

Yes, try the demo with ChatGPT.

PilgrimMay commented 1 year ago

OK! Thanks for your excellent work. Would you release the training code so that I can train on my own datasets?

ttengwang commented 1 year ago

Hi, the model combines pretrained models such as SAM, ChatGPT, and BLIP-2 for interactive usage, so no additional training is needed. Please refer to the paper at https://arxiv.org/pdf/2305.02677.pdf for more details, and to the Acknowledgement section of the README for the official training code of each pretrained model.
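
Editor's note: to make the "combination of pretrained models" idea concrete, here is a minimal sketch of how three frozen components could be chained without any training. It is not the repository's code. Every name (`CaptionPipeline`, `segment`, `segment_all`, `caption`, `refine`) is a hypothetical placeholder, and the toy functions only stand in for SAM, BLIP-2, and ChatGPT so that the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for illustration only; not the repository's API.
Image = Sequence[Sequence[int]]   # toy image: a grid of integer labels
Point = Tuple[int, int]           # (row, column) of a user click
Mask = List[Point]                # pixel coordinates that belong to one region


@dataclass
class CaptionPipeline:
    """Chains three frozen components; nothing here is trained."""
    segment: Callable[[Image, Point], Mask]      # stand-in for a SAM-like point prompt
    segment_all: Callable[[Image], List[Mask]]   # stand-in for a "segment everything" mode
    caption: Callable[[Image, Mask], str]        # stand-in for a BLIP-2-like captioner
    refine: Callable[[str, str], str]            # stand-in for a ChatGPT-like rewriter

    def caption_point(self, image: Image, point: Point, style: str = "neutral") -> str:
        # Interactive mode: one click -> one mask -> one caption.
        mask = self.segment(image, point)
        return self.refine(self.caption(image, mask), style)

    def caption_everything(self, image: Image, style: str = "neutral") -> List[str]:
        # Non-interactive mode: caption every mask the segmenter proposes.
        return [self.refine(self.caption(image, m), style)
                for m in self.segment_all(image)]


def region_of(image: Image, value: int) -> Mask:
    # Toy segmentation: all pixels that share the same integer label.
    return [(i, j) for i, row in enumerate(image)
            for j, v in enumerate(row) if v == value]


def toy_segment(image: Image, point: Point) -> Mask:
    r, c = point
    return region_of(image, image[r][c])


def toy_segment_all(image: Image) -> List[Mask]:
    labels = sorted({v for row in image for v in row})
    return [region_of(image, v) for v in labels]


def toy_caption(image: Image, mask: Mask) -> str:
    return f"a region of {len(mask)} pixels"


def toy_refine(text: str, style: str) -> str:
    # Toy "language control": only the hypothetical style "loud" changes the text.
    return text.upper() if style == "loud" else text


pipeline = CaptionPipeline(toy_segment, toy_segment_all, toy_caption, toy_refine)
image = [[0, 0, 1],
         [0, 0, 1]]
single = pipeline.caption_point(image, (0, 2))
dense = pipeline.caption_everything(image)
```

In this sketch, swapping a toy function for a real model wrapper would change only the callables passed to `CaptionPipeline`, which is why no joint training step appears anywhere.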