Is your feature request related to a problem? Please describe.
I think it would be super neat if you could use LLMs to predict text: they are basically autocomplete on steroids and take everything you have typed before into account. Even a smaller LLM like Phi-2 would be great, and that is likely what I would use, because it is very light and runs remarkably fast even on CPU, and faster still on GPU. I know AI is controversial, but I think autocomplete is a great use case for it.
Describe the solution you'd like
An option to point it at an Ollama daemon running on your computer, plus a setting for which model to use for text prediction. It should ideally take the last few words or tokens you have written into account. To my understanding, LLMs technically produce multiple likely next tokens with different weights; usually only the most likely one is selected, but the alternatives could be used to offer multiple suggestions to pick from.
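A minimal sketch of what such an integration could look like, assuming a local Ollama daemon on its default port (11434) and using Ollama's documented /api/generate endpoint. The model name, context-window size, and function name here are illustrative choices, not part of any existing config:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default daemon address

def predict_continuation(text: str, model: str = "phi", max_tokens: int = 8) -> str:
    """Ask the local Ollama daemon to continue the tail end of `text`."""
    payload = {
        "model": model,
        "prompt": text[-200:],          # only the recent context, as suggested above
        "stream": False,                # one complete response, no streaming
        "options": {
            "num_predict": max_tokens,  # keep completions short, autocomplete-style
            "temperature": 0,           # prefer the most likely continuation
        },
    }
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(predict_continuation("The quick brown fox jumps over the "))
```

One caveat: as far as I know Ollama's generate API returns a single completion rather than per-token alternatives, so the "multiple suggestions" idea might need the llama.cpp route below.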
Describe alternatives you've considered
There is also OpenAI, but that does not run locally on your computer and would cost the user money. You could probably use llama.cpp instead, which has Python bindings, though that might make GPU acceleration harder to set up. I would just go with Ollama, since it works with both AMD ROCm and NVIDIA CUDA, ships a daemon, and has a nice API; it is all around nice.
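For comparison, a rough sketch of the llama.cpp alternative via its Python bindings (llama-cpp-python). The `logprobs` parameter asks for the top candidate tokens at each step, which is one way to surface the multiple weighted suggestions mentioned above; the GGUF model path is a placeholder:

```python
from llama_cpp import Llama

llm = Llama(model_path="./phi-2.Q4_K_M.gguf")  # placeholder model file

out = llm(
    "The quick brown fox jumps over the ",
    max_tokens=4,   # short completion, autocomplete-style
    logprobs=3,     # also return the 3 most likely tokens at each step
    temperature=0,
)
print(out["choices"][0]["text"])                      # the top completion
print(out["choices"][0]["logprobs"]["top_logprobs"])  # alternatives per token
```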
Additional context
Ollama GitHub: https://github.com/ollama/ollama