Closed: Iory1998 closed this issue 1 month ago.
Increase the max_ctx on the loader node. It looks like your image is very high resolution, which makes your input exceed the maximum prompt token limit you set.
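If the image really is the main contributor, one workaround is to downscale it before it reaches the loader, since llava-style models turn a high-resolution image into many more image tokens. A minimal sketch; the 1024-pixel cap and the file names are arbitrary examples, not anything the party nodes do:

```python
# Sketch: pre-shrink an image so a llava-style VLM produces fewer image tokens.
# The 1024 px cap and the file names are arbitrary examples.
from PIL import Image

def shrink_for_vlm(src_path: str, dst_path: str, max_side: int = 1024) -> None:
    """Downscale the longest side to max_side, keeping the aspect ratio."""
    img = Image.open(src_path)
    img.thumbnail((max_side, max_side))  # resizes in place, never upscales
    img.save(dst_path)

shrink_for_vlm("input.png", "input_small.png")
```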
I know, and I explained that even when I increase max_ctx to, say, 32K, the error goes away but I get no output. Also, I checked the size of the image, and no matter the size I get the same error. I double-checked!
Increase the max_ctx on the loader node, and then increase the max_length on the local LLM node, which controls how long the reply can be. If max_length is only 512, the output will come back as None.
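For reference, those two knobs correspond roughly to n_ctx and max_tokens in plain llama-cpp-python. A minimal sketch, assuming the loader forwards max_ctx to n_ctx and the LLM node forwards max_length to max_tokens (the nodes may wire things differently):

```python
# Sketch of the two limits in plain llama-cpp-python. The mapping of the party
# parameters (max_ctx -> n_ctx, max_length -> max_tokens) is an assumption.
from llama_cpp import Llama

llm = Llama(
    model_path="model.Q8_0.gguf",  # example path
    n_ctx=8192,       # context window: prompt + generated tokens must fit here
)

out = llm.create_completion(
    "Describe the attached scene in one paragraph.",
    max_tokens=512,   # upper bound on the generated tokens only
)
print(out["choices"][0]["text"])
```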
Dear @heshengtao, as always, thank you for your prompt response. I am attaching a simple VLM-GGUF workflow to demonstrate the issue. I am using the llava-v1.6-mistral-7b.Q8_0 model, and below is a screenshot of the workflow. As you can see, I don't get any output. Am I doing something wrong?
Your model file may be bad; please use the llava-v1.6-mistral-7b model from this link: https://huggingface.co/cjpais/llava-1.6-mistral-7b-gguf/tree/main
My friend, that's exactly the model I am using. Also, I want to draw your attention to two things:
1. I see this in the log:
WARNING: LLM_local.original_IS_CHANGED() got an unexpected keyword argument 'system_prompt'
Prompt exceeds n_ctx: 8904 > 8192
Prompt executed in 7.33 seconds
That happens because of the chat history.
2. If I disable the chat history, I get no output; output only appears when the chat history is active, even if I choose 1 round. Is there a way to deactivate the chat?
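On the first point, that warning usually just means ComfyUI is now passing the node an input (here system_prompt) that the node's IS_CHANGED signature does not list; a catch-all **kwargs is the usual way to make such checks tolerant of new inputs. A rough sketch of the pattern, not the party's actual code:

```python
# Rough sketch of a ComfyUI-style IS_CHANGED that tolerates newly added inputs
# by accepting **kwargs; illustrative only, not the actual LLM_local code.
class LLM_local_sketch:
    @classmethod
    def IS_CHANGED(cls, model, prompt, system_prompt=None, **kwargs):
        # Trigger re-execution only when inputs that matter actually change.
        return (model, prompt, system_prompt)
```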
I tried it, and the Q6 version works but the Q8 version doesn't; I don't know why. The model repository doesn't state that it is compatible with llama-cpp-python, so an incompatibility is also possible. Alternatively, you can start an API server with LM Studio and point the party at it.
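If you go the LM Studio route, it serves an OpenAI-compatible endpoint (by default at http://localhost:1234/v1), so any OpenAI-style client, or an API-based LLM node, can talk to it with that base URL and a dummy key. A minimal sketch with the openai Python client; the model name is whatever LM Studio lists for your loaded model:

```python
# Sketch: querying an LM Studio local server over its OpenAI-compatible API.
# The port and model name depend on what LM Studio shows after loading a model.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

resp = client.chat.completions.create(
    model="llava-v1.6-mistral-7b",  # name as listed by LM Studio
    messages=[{"role": "user", "content": "Describe a sunset in one sentence."}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```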
Thank you for your time! Keep up the good work. Please add Perplexica to the list of tools. Good luck!
Describe the bug
When I use the VLM-GGUF Loader node with the Local Large Language Model node and the VLM-GGUF model type is selected, I get this error: "Prompt exceeds n_ctx: 2913 > 1024". The error occurs even after I restart ComfyUI and load the model for the first time. When I increase max_ctx to a higher value, I get no output at all ("None"). When I switch the model type to LLM-GGUF, the model works properly and I get output as expected. Please refer to the attached screenshot.
To Reproduce
Steps to reproduce the behavior: discussed above.
Expected behavior
The node should return a description of the image instead of None, just as it does when the model type is set to LLM-GGUF.
Screenshots
Check attached.
Additional context
I installed the latest llama-cpp-python with CUDA 12.4 from this wheel: llama_cpp_python-0.2.90-cp310-cp310-win_amd64.whl
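For completeness, the behavior can be checked outside ComfyUI with llama-cpp-python's llava support. The following is only a sketch, assuming the node drives the model through a chat handler in roughly this way; paths are placeholders, and with a small n_ctx the image embedding tokens plus the text can easily exceed the window, which is consistent with the 2913 > 1024 error above.

```python
# Stand-alone sketch using llama-cpp-python's llava support; paths are
# placeholders and the party node's internals may differ from this.
import base64
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

chat_handler = Llava15ChatHandler(clip_model_path="mmproj-model-f16.gguf")
llm = Llama(
    model_path="llava-v1.6-mistral-7b.Q8_0.gguf",
    chat_handler=chat_handler,
    n_ctx=8192,       # with n_ctx=1024 the image tokens alone can overflow
    logits_all=True,  # the llama-cpp-python multimodal examples set this
)

with open("input.png", "rb") as f:
    data_uri = "data:image/png;base64," + base64.b64encode(f.read()).decode()

resp = llm.create_chat_completion(messages=[
    {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": data_uri}},
        {"type": "text", "text": "Describe this image."},
    ]},
])
print(resp["choices"][0]["message"]["content"])
```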