Open Pdonor opened 1 month ago
Add the environment variable `OLLAMA_KEEP_ALIVE=0`.
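If that suggestion fits your setup, it would look roughly like this (a sketch: `OLLAMA_KEEP_ALIVE` is a documented Ollama server setting, and the restart step assumes you launch the server manually):

```shell
# Tell the Ollama server to evict models from VRAM immediately after each
# request instead of keeping them resident (the default is 5 minutes).
export OLLAMA_KEEP_ALIVE=0

# The server reads this at startup, so restart it afterwards, e.g.:
#   ollama serve
```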
Okay, let me try adding an unload; are these the actions you want:
(I use a 4060 Ti with 16 GB of VRAM, so loading a 7B LLM alongside SDXL is usually fine for me.)
BTW, if you keep the web UI running on [Generate forever], you can consider using another extension which can send your fantastic LLM/SD results to an IM app, so you can review them like a comic book on your mobile phone. It's fun: https://github.com/xlinx/sd-webui-decadetw-auto-messaging-realtime
Hi! With the new version of Forge and FLUX, this extension could be really practical for the millions of low-VRAM laptops that can now run FLUX. The only problem is that it doesn't unload the LLM from VRAM when using Ollama, so generation is far too slow.
According to https://github.com/ollama/ollama/issues/1600, that can be accomplished with `curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": 0}'`. Can that be put in your code?
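For reference, the same unload request could be issued from the extension's Python code; here is a minimal sketch (the endpoint and the `keep_alive` field come from the linked Ollama issue; the helper names are mine):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def unload_payload(model: str) -> bytes:
    # keep_alive: 0 asks Ollama to evict the model from VRAM right after
    # this request completes (see ollama/ollama#1600).
    return json.dumps({"model": model, "keep_alive": 0}).encode()

def unload_model(model: str) -> None:
    # POST the payload; a request with no prompt and keep_alive 0 simply
    # unloads the model without generating anything.
    req = urllib.request.Request(
        OLLAMA_URL,
        data=unload_payload(model),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req).close()
```

Calling `unload_model("llama2")` right after the prompt-expansion step would free the VRAM before image generation starts.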
Also, could it be set to store a different system prompt and different Ollama settings? I found that giving it an example in the system prompt works well.
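As a sketch of what per-preset storage could look like (the `system` and `options` fields do exist in Ollama's `/api/generate` request schema; the preset structure and names here are hypothetical):

```python
import json

# Hypothetical preset store: each preset bundles a system prompt (with a
# few-shot example, as suggested above) and Ollama sampling options.
PRESETS = {
    "magic-prompt": {
        "system": ("You expand short prompts into detailed image prompts. "
                   "Example: 'a cat' -> 'a fluffy tabby cat lounging on a "
                   "sunlit windowsill, soft bokeh, 85mm photo'"),
        "options": {"temperature": 0.9, "num_predict": 200},
    },
}

def build_request(preset: str, prompt: str, model: str = "llama2") -> bytes:
    p = PRESETS[preset]
    # These top-level fields mirror Ollama's /api/generate request schema.
    return json.dumps({"model": model, "prompt": prompt,
                       "system": p["system"], "options": p["options"],
                       "stream": False}).encode()
```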
Basically, it seems you are a few lines of code away from the best 'magic prompt' software in the world, surpassing the ones in DALL·E 3 and Ideogram, which are censored. Thank you!