LLM party essentially talks to Ollama through Ollama's OpenAI-compatible interface, so the other parameters of Ollama's native interface cannot be used. If you want to switch models, just change the model name; Ollama will automatically unload the previous model and load the new one. In the future, LLM party may add better compatibility with Ollama-specific parameters.
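For context, here is a minimal sketch (not LLM party's actual code) of what "using Ollama through its OpenAI-compatible interface" looks like. The `base_url` and placeholder `api_key` follow Ollama's OpenAI-compatibility docs; only fields defined by the OpenAI request schema can be sent this way, which is why native Ollama options such as `keep_alive` are not available here.

```python
# Sketch: talking to Ollama purely via its OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # required by the client, ignored by Ollama
)

# Switching models is just a matter of changing the model name:
# Ollama loads "llama3" here and swaps it out when a later request names a different model.
response = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Write a short image prompt."}],
)
print(response.choices[0].message.content)
```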
Thanks for the quick response. I have a GPU with 12 GB of VRAM. Once the LLM prompt is generated, I want to unload the LLM model before loading the image generation model (Flux) to save VRAM. It would be great to implement this option in the LLM_party node, as a lot of people have limited VRAM.
Connect the workflow as shown in my image. Use the Clear Model node in any of the connections and enable the is_ollama parameter on it. You will find that after using Ollama for the conversation, the model is automatically unloaded from VRAM.
Could this project also connect to koboldcpp and be used in the same way as Ollama?
Yes, it can.
Ollama allows unloading the model immediately after a query via the keep_alive parameter, as mentioned in their docs: https://github.com/ollama/ollama/blob/main/docs/faq.md
curl http://localhost:11434/api/generate -d '{"model": "llama3", "keep_alive": 0}'
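For reference, a minimal sketch of the same unload call from Python, using Ollama's native /api/generate endpoint as described in the FAQ: keep_alive set to 0 tells Ollama to drop the model from VRAM immediately instead of keeping it resident.

```python
# Sketch: unload "llama3" right away via Ollama's native API.
import requests

requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "keep_alive": 0},
)
```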
There are other parameters as well to control how the model is kept in memory. How do I use these parameters in the API large language model node? I tried the extra parameters option, but I got this error:
Completions.create() got an unexpected keyword argument 'keep_alive'
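The error happens because the OpenAI Python client rejects keyword arguments it does not know about, so keep_alive cannot be passed as a plain parameter. Below is a hedged sketch of one possible workaround using the client's extra_body option, which merges extra keys into the request JSON; whether Ollama's OpenAI-compatible endpoint actually honors keep_alive sent this way is an assumption to verify, and the native /api/generate call above remains the documented way to unload.

```python
# Sketch (assumption): forwarding keep_alive through extra_body instead of as a kwarg.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"keep_alive": 0},  # merged into the JSON body, not validated as a kwarg
)
```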