heshengtao / comfyui_LLM_party

LLM Agent Framework in ComfyUI. Includes Omost, GPT-SoVITS, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes; access to Feishu and Discord; adapts to all LLMs with OpenAI/Gemini-like interfaces, such as o1, Ollama, Grok, Qwen, GLM, DeepSeek, Moonshot, and Doubao. Adapted to local LLMs, VLMs, and GGUF models such as Llama-3.2; Neo4j KG linkage, GraphRAG / RAG / HTML-to-image.
GNU Affero General Public License v3.0

Using the VLM-GGUF Model Type Produces an Error or No Output #91

Closed: Iory1998 closed this issue 1 month ago

Iory1998 commented 1 month ago

Describe the bug
When I use the VLM-GGUF Loader node with the Local Large Language Model node and the VLM-GGUF model type is selected, I get this error: "Prompt exceeds n_ctx: 2913 > 1024". The error occurs even after I restart ComfyUI and load the model for the first time. When I increase max_ntx to a higher value, I get no output at all ("None"). When I switch the model type to LLM-GGUF, the model works properly and I get the expected output. Please refer to the attached screenshot.

To Reproduce
Steps to reproduce the behavior: discussed above.

Screenshots
LLM Party VLM Error (check attached).

Additional context
I installed the latest llama-cpp-python build with CUDA 12.4 from this wheel: llama_cpp_python-0.2.90-cp310-cp310-win_amd64.whl
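For anyone triaging this, here is a minimal llama-cpp-python sketch (outside ComfyUI) that exercises the same VLM-GGUF path, assuming a LLaVA-style GGUF plus its mmproj projector file. The file names, paths, and n_ctx value are placeholders, not the party's actual defaults.

```python
# Hedged sketch: load a LLaVA GGUF pair directly with llama-cpp-python and
# send one image, to check the context-window behaviour outside ComfyUI.
# File names below are assumptions; point them at your local downloads.
import base64

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler


def image_to_data_url(path: str) -> str:
    # The multimodal chat API accepts images as base64 data URLs.
    with open(path, "rb") as f:
        return "data:image/png;base64," + base64.b64encode(f.read()).decode()


chat_handler = Llava15ChatHandler(clip_model_path="mmproj-model-f16.gguf")
llm = Llama(
    model_path="llava-v1.6-mistral-7b.Q8_0.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,       # a small value here reproduces "Prompt exceeds n_ctx"
    logits_all=True,  # the LLaVA chat handlers need the full logits
)

out = llm.create_chat_completion(
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": image_to_data_url("test.png")}},
            {"type": "text", "text": "Describe this image."},
        ],
    }],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```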

heshengtao commented 1 month ago

Judging by the need to increase max_ntx on the loader node, your image is evidently too high resolution, causing your input to exceed the maximum prompt length you set.
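If the picture really is the culprit, a simple workaround is to downscale it before it reaches the VLM node so fewer image tokens land in the prompt. A hedged sketch follows; the 672-pixel cap is only an assumption based on typical LLaVA input tiles, not a party setting.

```python
# Hypothetical pre-processing step: shrink the image before feeding it to the
# VLM so that it contributes fewer tokens to the prompt.
from PIL import Image


def downscale(src: str, dst: str, max_side: int = 672) -> None:
    img = Image.open(src)
    img.thumbnail((max_side, max_side))  # resizes in place, keeps aspect ratio
    img.save(dst)


downscale("screenshot.png", "screenshot_small.png")
```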

Iory1998 commented 1 month ago

> Judging by the need to increase max_ntx on the loader node, your image is evidently too high resolution, causing your input to exceed the maximum prompt length you set.

I know, and I explained that even when I increase max_ntx to, say, 32K, the error goes away but I get no output. Also, I checked the size of the image, and no matter the size I get the same error. I double-checked!

heshengtao commented 1 month ago

Increase the max_ntx on the loader node, and then increase the max_length on the local LLM node, which controls the length of the conversation. If max_length is only 512, the output will be None.
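In other words, two separate limits are involved: the context window set at load time and the per-reply generation length. A hedged continuation of the llama-cpp-python sketch from earlier in the thread (it reuses the llm object created there; the parameter names below are llama-cpp-python's, which the party nodes expose under their own names).

```python
# `llm` is the Llama instance from the loader sketch above.
# n_ctx (set at load time) caps the whole prompt plus reply; max_tokens caps
# the reply alone. If max_tokens is tiny, the answer is cut off almost
# immediately and can look like no output at all.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Describe the attached image in detail."}],
    max_tokens=1024,  # raise this if replies come back truncated or empty
)
print(out["choices"][0]["message"]["content"])
```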

Iory1998 commented 1 month ago

> Increase the max_ntx on the loader node, and then increase the max_length on the local LLM node, which controls the length of the conversation. If max_length is only 512, the output will be None.

Dear @heshengtao, as always, thank you for your prompt response. I am attaching a simple VLM-GGUF workflow to demonstrate the issue. I am using the llava-v1.6-mistral-7b.Q8_0 model, and below is a screenshot of the workflow. As you can see, I don't get any output. (Screenshots attached.) Am I doing something wrong?

heshengtao commented 1 month ago

You may have a garbage model; please use the llava-v1.6-mistral-7b model from this link: https://huggingface.co/cjpais/llava-1.6-mistral-7b-gguf/tree/main

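For completeness, a hedged sketch for fetching that model pair with huggingface_hub; the exact file names inside the repo are assumptions, so check the repository's file list and adjust.

```python
# Download the recommended GGUF model plus its mmproj projector.
# File names are assumptions; verify them against the repository listing.
from huggingface_hub import hf_hub_download

repo = "cjpais/llava-1.6-mistral-7b-gguf"
model_path = hf_hub_download(repo_id=repo, filename="llava-v1.6-mistral-7b.Q6_K.gguf")
mmproj_path = hf_hub_download(repo_id=repo, filename="mmproj-model-f16.gguf")
print(model_path, mmproj_path)
```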

Iory1998 commented 1 month ago

> You may have a garbage model; please use the llava-v1.6-mistral-7b model from this link: https://huggingface.co/cjpais/llava-1.6-mistral-7b-gguf/tree/main


My friend, that's exactly the model I am using. Also, I want to draw your attention to 2 things:

heshengtao commented 1 month ago

I tried it: the Q6 version works, but the Q8 version doesn't, and I don't know why. The model repository doesn't say it is adapted to llama-cpp-python, so an incompatibility is also possible. Alternatively, you can start an API server with LM Studio and point the party at it; that works as well.
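If you go the LM Studio route, its local server speaks an OpenAI-compatible API, so the party's API-based LLM node (or any OpenAI client) can point at it. A hedged sketch, assuming LM Studio's default base URL and that the loaded model is identified by the name LM Studio shows for it.

```python
# Query an LM Studio local server through its OpenAI-compatible endpoint.
# http://localhost:1234/v1 is LM Studio's default; the model name is whatever
# identifier LM Studio reports for the model you loaded (assumed here).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
resp = client.chat.completions.create(
    model="llava-v1.6-mistral-7b",
    messages=[{"role": "user", "content": "Hello! Which model am I talking to?"}],
)
print(resp.choices[0].message.content)
```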

Iory1998 commented 1 month ago

Thank you for your time! Keep up the good work. Please add Perplexica to the list of tools. Good luck!