if-ai / ComfyUI-IF_AI_tools

ComfyUI-IF_AI_tools is a set of custom nodes for ComfyUI that allows you to generate prompts using a local Large Language Model (LLM) via Ollama. This tool enables you to enhance your image generation workflow by leveraging the power of language models.
https://ko-fi.com/impactframes
366 stars, 27 forks

image to prompt not working (prompt saying the image is not visible) #13

Closed andrewbpark73 closed 2 months ago

andrewbpark73 commented 2 months ago

Hello, when using the IF Image to Prompt node, I don't think the result matches the image. I'm using the mistral model.

Is there anything I set up wrong?

andrewbpark73 commented 2 months ago

bakllava works, but the rest of the models do not.

if-ai commented 2 months ago

Hi, the image prompt only works when you choose a multimodal model such as Llava or bakllava locally, or GPT-4 Vision, Opus, Sonnet, or Haiku via API. Normal text models will reply but won't be able to actually understand the image.
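
For readers who want to check this outside ComfyUI, here is a minimal sketch of sending an image to a local Ollama server through its `/api/generate` endpoint, which accepts base64-encoded images in an `images` field. It is not the node's actual implementation. The helper names (`is_vision_model`, `build_payload`, `describe_image`) and the prefix list are hypothetical, and the prefix check is only a rough heuristic based on the two model names mentioned in this thread.

```python
import base64
import json
import urllib.request

# Assumption: name prefixes of vision-capable Ollama models mentioned in this
# thread. This is an illustrative heuristic, not an exhaustive or official list.
VISION_MODEL_PREFIXES = ("llava", "bakllava")


def is_vision_model(model: str) -> bool:
    """Return True if the model name starts with a known multimodal prefix."""
    return model.lower().startswith(VISION_MODEL_PREFIXES)


def build_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build the JSON body for Ollama's /api/generate with one attached image."""
    if not is_vision_model(model):
        # A text-only model would still answer, but it never sees the image.
        raise ValueError(f"{model!r} does not look like a multimodal model")
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects a list of base64-encoded image strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # request a single JSON object instead of a stream
    }


def describe_image(model: str, prompt: str, image_path: str,
                   host: str = "http://localhost:11434") -> str:
    """Send the image to a running Ollama server and return the text reply."""
    with open(image_path, "rb") as f:
        payload = build_payload(model, prompt, f.read())
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Assuming Ollama is running on its default port and the model has been pulled, a call such as `describe_image("llava", "What is this image about?", "image.png")` should return a description of the picture, while passing `"mistral"` fails early instead of producing the "I cannot see images" reply quoted below.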

On Wed, Apr 17, 2024, 12:35 AM Andrew Park @.***> wrote:

Screenshot.2024-04-16.193234.png (view on web) https://github.com/if-ai/ComfyUI-IF_AI_tools/assets/98562901/22cb0195-9b32-498e-884f-a1ac1af67d69

Hello, the IF Image to Prompt node does not function properly. I tried several models, including mistral, llama, and stable diffusion prompt generator, but none of them worked. This is the prompt I got:

"Award winning, masterpiece, High detail, I'm an AI language model and cannot directly see or analyze images, but based on the description provided, I assume the user has attached an image with their question, which is not visible to me. In order to answer "what is this image about," I would need to be able to understand the visual content of the image, such as objects, people, colors, and context. Without that information, it's difficult for me to provide an accurate response. If the user could please describe the contents of the image in more detail or provide a caption or title, I may be able to help interpret its meaning based on that information. pixar style,intricate,highly detailed,sharp focus,cinematic look,hyperdetailed,4k textures,hdr,looking up at the camera,rainbow,3d style,C4D,blender,kawaii,bifrost,"

image.png (view on web) https://github.com/if-ai/ComfyUI-IF_AI_tools/assets/98562901/3d9da379-1da6-43ac-bf64-11af9b13d9d7

Ollama is operating fine in the background.
