heshengtao / comfyui_LLM_party

LLM Agent Framework in ComfyUI. Includes Omost, GPT-SoVITS, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes; access to Feishu and Discord; adapts to all LLMs with OpenAI/Gemini-like interfaces, such as o1, Ollama, Grok, Qwen, GLM, DeepSeek, Moonshot, Doubao. Adapted to local LLMs, VLMs, and GGUF models such as Llama-3.2. Neo4j KG linkage, GraphRAG / RAG / HTML-to-image.
GNU Affero General Public License v3.0

Unable to unload ollama model after a query using keep_alive option. #71

Closed Vineshg closed 3 months ago

Vineshg commented 3 months ago

Ollama allows unloading the model immediately after a query using the keep_alive parameter, as mentioned in their docs: https://github.com/ollama/ollama/blob/main/docs/faq.md

curl http://localhost:11434/api/generate -d '{"model": "llama3", "keep_alive": 0}'

There are other parameters as well to control how the model is kept in memory. How do I use these parameters in the API large language model node? I tried the extra parameters option, but I got the error: Completions.create() got an unexpected keyword argument 'keep_alive'
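
For reference, a minimal sketch of the failing call, assuming the official openai Python client pointed at Ollama's OpenAI-compatible endpoint (llama3 is just an example model name):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# Completions.create() has an explicit signature, so an Ollama-native
# parameter like keep_alive is rejected before any request is even sent:
client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "hello"}],
    keep_alive=0,  # TypeError: unexpected keyword argument 'keep_alive'
)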

heshengtao commented 3 months ago

LLM party essentially uses Ollama's OpenAI-compatible interface, so the extra parameters of Ollama's native interface cannot be used. If you want to switch models, just change the model name; Ollama will automatically unload the previous model and load the new one. In the future, LLM party may offer better compatibility with Ollama-specific parameters.
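
A rough sketch of what that model swap looks like through the OpenAI-compatible endpoint (model names are examples; requesting a different model is what triggers the unload):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# The first request loads llama3 into VRAM.
client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "hello"}],
)

# Requesting a different model makes Ollama unload llama3 and load qwen2.
client.chat.completions.create(
    model="qwen2",
    messages=[{"role": "user", "content": "hello"}],
)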

Vineshg commented 3 months ago

Thanks for the quick response. I have a GPU with 12 GB of VRAM. Once the LLM prompt is generated, I want to unload the LLM before loading the image generation model (Flux), to save VRAM. It would be great to implement this option in the LLM_party node, as a lot of people have limited VRAM.

heshengtao commented 3 months ago

[Screenshot 2024-08-23 160935: example workflow] Connect the workflow as shown in my image. Add the Clear Model node anywhere in the chain and enable its is_ollama parameter. You will find that after using Ollama for a conversation, the model is automatically unloaded from VRAM.
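
Under the hood, this kind of unload can be done with Ollama's native API by sending keep_alive set to 0, per the FAQ linked above. A hypothetical sketch (not necessarily how the Clear Model node implements it):

import requests

# An empty generate request with keep_alive 0 asks Ollama to evict the
# model from VRAM as soon as the request completes.
requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "keep_alive": 0},
    timeout=30,
)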

Unityisalreadybetaken commented 3 months ago

Boss, can this project also connect to koboldcpp and use it the same way as Ollama?

heshengtao commented 3 months ago

> Boss, can this project also connect to koboldcpp and use it the same way as Ollama?

Yes, it can.