heshengtao / comfyui_LLM_party

LLM Agent Framework in ComfyUI. Includes Omost, GPT-SoVITS, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes; access to Feishu and Discord; adapts to all LLMs with OpenAI/Gemini-like interfaces, such as o1, Ollama, Grok, Qwen, GLM, DeepSeek, Moonshot, and Doubao. Adapted to local LLMs, VLMs, and GGUF models such as Llama-3.2; Neo4j KG linkage, GraphRAG / RAG / HTML-to-image.
GNU Affero General Public License v3.0

Using the VLM-GGUF Model Type Produces an Error or No Output #91

Closed: Iory1998 closed this issue 1 month ago

Iory1998 commented 1 month ago

Describe the bug
When I use the VLM-GGUF Loader node with the Local Large Language Model node and the VLM-GGUF model type is selected, I get this error: "Prompt exceeds n_ctx: 2913 > 1024". The error occurs even after I restart ComfyUI and load the model for the first time. When I increase max_ntx to a higher value, I get no output at all ("None"). When I switch the model type to LLM-GGUF, the model works properly and I get the expected output. Please refer to the attached screenshot.

To Reproduce
Steps to reproduce the behavior: discussed above.

Screenshots
LLM Party VLM Error (check attached).

Additional context
I installed the latest llama-cpp-python build with CUDA 12.4 from this wheel: llama_cpp_python-0.2.90-cp310-cp310-win_amd64.whl
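For anyone triaging this, here is a minimal llama-cpp-python sketch (outside ComfyUI) that exercises the same VLM-GGUF path, assuming a LLaVA-style GGUF plus its mmproj projector file. The file names, paths, and n_ctx value are placeholders, not the party's actual defaults.

```python
# Hedged sketch: load a LLaVA GGUF pair directly with llama-cpp-python and
# send one image, to check the context-window behaviour outside ComfyUI.
# File names below are assumptions; point them at your local downloads.
import base64

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler


def image_to_data_url(path: str) -> str:
    # The multimodal chat API accepts images as base64 data URLs.
    with open(path, "rb") as f:
        return "data:image/png;base64," + base64.b64encode(f.read()).decode()


chat_handler = Llava15ChatHandler(clip_model_path="mmproj-model-f16.gguf")
llm = Llama(
    model_path="llava-v1.6-mistral-7b.Q8_0.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,       # a small value here reproduces "Prompt exceeds n_ctx"
    logits_all=True,  # the LLaVA chat handlers need the full logits
)

out = llm.create_chat_completion(
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": image_to_data_url("test.png")}},
            {"type": "text", "text": "Describe this image."},
        ],
    }],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```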

heshengtao commented 1 month ago

Judging by the need to increase max_ntx on the loader node, your image is evidently too high resolution, causing your input to exceed the maximum prompt length you set.
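If the picture really is the culprit, a simple workaround is to downscale it before it reaches the VLM node so fewer image tokens land in the prompt. A hedged sketch follows; the 672-pixel cap is only an assumption based on typical LLaVA input tiles, not a party setting.

```python
# Hypothetical pre-processing step: shrink the image before feeding it to the
# VLM so that it contributes fewer tokens to the prompt.
from PIL import Image


def downscale(src: str, dst: str, max_side: int = 672) -> None:
    img = Image.open(src)
    img.thumbnail((max_side, max_side))  # resizes in place, keeps aspect ratio
    img.save(dst)


downscale("screenshot.png", "screenshot_small.png")
```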

Iory1998 commented 1 month ago

> Judging by the need to increase max_ntx on the loader node, your image is evidently too high resolution, causing your input to exceed the maximum prompt length you set.

I know, and I explained that even when I increase max_ntx to, say, 32K, the error goes away but I get no output. Also, I checked the size of the image, and no matter the size I get the same error. I double-checked!

heshengtao commented 1 month ago

Increase the max_ntx on the loader node, and then increase the max_length on the local LLM node, which controls the length of the conversation. If max_length is only 512, the output will be None.
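In other words, two separate limits are involved: the context window set at load time and the per-reply generation length. A hedged continuation of the llama-cpp-python sketch from earlier in the thread (it reuses the llm object created there; the parameter names below are llama-cpp-python's, which the party nodes expose under their own names).

```python
# `llm` is the Llama instance from the loader sketch above.
# n_ctx (set at load time) caps the whole prompt plus reply; max_tokens caps
# the reply alone. If max_tokens is tiny, the answer is cut off almost
# immediately and can look like no output at all.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Describe the attached image in detail."}],
    max_tokens=1024,  # raise this if replies come back truncated or empty
)
print(out["choices"][0]["message"]["content"])
```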

Iory1998 commented 1 month ago

> Increase the max_ntx on the loader node, and then increase the max_length on the local LLM node, which controls the length of the conversation. If max_length is only 512, the output will be None.

Dear @heshengtao, as always, thank you for your prompt response. I am attaching a simple VLM-GGUF workflow to demonstrate the issue. I am using the llava-v1.6-mistral-7b.Q8_0 model, and below is a screenshot of the workflow. As you can see, I don't get any output. (Screenshots attached.) Am I doing something wrong?

heshengtao commented 1 month ago

You may have a garbage model; please use the llava-v1.6-mistral-7b model from this link: https://huggingface.co/cjpais/llava-1.6-mistral-7b-gguf/tree/main

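For completeness, a hedged sketch for fetching that model pair with huggingface_hub; the exact file names inside the repo are assumptions, so check the repository's file list and adjust.

```python
# Download the recommended GGUF model plus its mmproj projector.
# File names are assumptions; verify them against the repository listing.
from huggingface_hub import hf_hub_download

repo = "cjpais/llava-1.6-mistral-7b-gguf"
model_path = hf_hub_download(repo_id=repo, filename="llava-v1.6-mistral-7b.Q6_K.gguf")
mmproj_path = hf_hub_download(repo_id=repo, filename="mmproj-model-f16.gguf")
print(model_path, mmproj_path)
```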

Iory1998 commented 1 month ago

> You may have a garbage model; please use the llava-v1.6-mistral-7b model from this link: https://huggingface.co/cjpais/llava-1.6-mistral-7b-gguf/tree/main


My friend, that's exactly the model I am using. Also, I want to draw your attention to 2 things:

heshengtao commented 1 month ago

I tried it: the Q6 version works, but the Q8 version doesn't, and I don't know why. The model repository doesn't say it is adapted to llama-cpp-python, so an incompatibility is also possible. Alternatively, you can start an API server with LM Studio and point the party at it; that works as well.
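If you go the LM Studio route, its local server speaks an OpenAI-compatible API, so the party's API-based LLM node (or any OpenAI client) can point at it. A hedged sketch, assuming LM Studio's default base URL and that the loaded model is identified by the name LM Studio shows for it.

```python
# Query an LM Studio local server through its OpenAI-compatible endpoint.
# http://localhost:1234/v1 is LM Studio's default; the model name is whatever
# identifier LM Studio reports for the model you loaded (assumed here).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
resp = client.chat.completions.create(
    model="llava-v1.6-mistral-7b",
    messages=[{"role": "user", "content": "Hello! Which model am I talking to?"}],
)
print(resp.choices[0].message.content)
```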

Iory1998 commented 1 month ago

Thank you for your time! Keep up the good work. Please add Perplexica to the list of tools. Good luck!