long8v / PTIR

Paper Today I Read

[107] Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models #116

Open long8v opened 1 year ago

long8v commented 1 year ago

paper, github, demo

TL;DR

Details


Uses visual foundation models from hf (Hugging Face) + MaskFormer.

The query is reportedly sent with the following text as a prefix: "Since Visual ChatGPT is a text language model, Visual ChatGPT must use tools to observe images rather than imagination. The thoughts and observations are only visible for Visual ChatGPT, Visual ChatGPT should remember to repeat important information in the final response for Human. Thought: Do I need to use a tool?"
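
A minimal sketch of how such a prefix could be attached to a query, for illustration only. The instruction text is quoted from the note; the function name `build_query` and the layout (prefix, newline, then the user input) are assumptions, not the paper's or the repository's actual code.

```python
# Instruction text quoted from the note above. Everything else here
# (names, the prefix-then-input layout) is a hypothetical illustration.
VISUAL_CHATGPT_PREFIX = (
    "Since Visual ChatGPT is a text language model, Visual ChatGPT must use "
    "tools to observe images rather than imagination. The thoughts and "
    "observations are only visible for Visual ChatGPT, Visual ChatGPT should "
    "remember to repeat important information in the final response for Human. "
    "Thought: Do I need to use a tool?"
)


def build_query(user_input: str, prefix: str = VISUAL_CHATGPT_PREFIX) -> str:
    """Return the instruction prefix followed by the user's input.

    Assumed layout: prefix, a newline, then the stripped user input.
    """
    return f"{prefix}\n{user_input.strip()}"


if __name__ == "__main__":
    # The resulting string is what would be sent to the text language model.
    print(build_query("Describe the uploaded image."))
```

The string returned by `build_query` is what would be passed to the language model. The point of the prefix is to push the model to decide explicitly whether a visual tool is needed before it answers.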
