Open JamesStallings opened 4 months ago
Hey! The codebase wasn't built to be geared toward Nvidia in any way, as I run everything on a Mac. The major Python installs are mainly for the llama-cpp-agent framework, though there may be a few extra installs that aren't strictly needed which I forgot to remove while versioning. It is built around Ollama's API, so technically it should be able to link into llamafiles as well, but you may need to set different base URLs or use another method if llamafile doesn't have the model switching that Ollama does. I haven't tested any other LLM library or provider yet. I'll dig into this and see, though I can't test anything Nvidia-related on my hardware, unfortunately.
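For anyone wanting to try the llamafile route, a rough sketch of what swapping base URLs could look like is below. The ports, model names, and the OpenAI-style endpoint paths are assumptions about typical Ollama/llamafile setups, not something this repo ships:

```python
import requests

# Assumed defaults; adjust to however your servers are actually started.
OLLAMA_BASE_URL = "http://localhost:11434/v1"    # Ollama's OpenAI-compatible endpoint
LLAMAFILE_BASE_URL = "http://localhost:8080/v1"  # llamafile's embedded llama.cpp server

def chat(base_url: str, model: str, prompt: str) -> str:
    """Send a single chat completion request to an OpenAI-compatible server."""
    resp = requests.post(
        f"{base_url}/chat/completions",
        json={
            "model": model,  # a llamafile typically serves only its embedded model
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Swapping providers is then just a matter of changing the base URL:
# print(chat(OLLAMA_BASE_URL, "llama3", "Hello"))
# print(chat(LLAMAFILE_BASE_URL, "local", "Hello"))
```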
Hey man, thanks for the quick response!
llamafiles are amazing: each one is a C-language wrapper around the model, built with Cosmopolitan C, which lets the model be executed directly as a binary on pretty much every OS and architecture. As such, it shifts load off the GPU onto CPU cores, which makes it quite portable and fairly well optimized. The binary provides both a self-contained web server/front-end "chat" and a prompt completion endpoint.
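As a rough illustration (the filename, port, and endpoint path here are just assumptions about a typical llamafile server run, adjust to your setup), hitting that completion endpoint from Python can look like this:

```python
import requests

# Assumes a llamafile has been started in server mode, e.g.:
#   ./mistral-7b-instruct.llamafile --server --port 8080
# and that it exposes the llama.cpp-style /completion endpoint.
LLAMAFILE_URL = "http://localhost:8080"

def complete(prompt: str, n_predict: int = 128) -> str:
    """Ask the llamafile's built-in server for a raw prompt completion."""
    resp = requests.post(
        f"{LLAMAFILE_URL}/completion",
        json={"prompt": prompt, "n_predict": n_predict},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json().get("content", "")

# print(complete("Q: What is a llamafile?\nA:"))
```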
This makes it very easy to manage models and use multiple models in a given workflow, arguably even easier than Ollama, and all without any framework at all.
It's great to know the Mycomind project does not depend on GPU compute. You are doing some very interesting work, and I am eager to have a go with it.
Cheers ॐ♲ ♥ ☸☮ ☯☰☶☯☮ ☸ ♥ ♲ॐ KI5SMN
I apologize for not getting back to you sooner. As fate would have it, I stumbled on a RAG implementation tutorial that uses no frameworks, with Ollama as the local model host/manager.
This has been successful to a great extent, at least enough to have me keeping my head down for days at a stretch. It seems this game depends heavily on having the right model for the job and being able to instruct it in a digestible fashion.
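Roughly the shape of that framework-free approach, for reference (the model names, endpoints, and toy documents here are my own placeholders, not the tutorial's code):

```python
import requests

# A minimal, framework-free RAG loop against Ollama's native REST API.
OLLAMA = "http://localhost:11434"
EMBED_MODEL = "nomic-embed-text"   # placeholder embedding model
CHAT_MODEL = "llama3"              # placeholder generation model

DOCS = [
    "Mycelium networks transfer nutrients between trees.",
    "llamafile packages a model and server into one executable.",
    "Ollama manages local models behind a simple HTTP API.",
]

def embed(text: str) -> list[float]:
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": EMBED_MODEL, "prompt": text}, timeout=120)
    r.raise_for_status()
    return r.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def answer(question: str) -> str:
    # Retrieve: rank documents by embedding similarity to the question.
    q_vec = embed(question)
    best = max(DOCS, key=lambda d: cosine(q_vec, embed(d)))
    # Generate: stuff the best document into the prompt as context.
    prompt = f"Context:\n{best}\n\nQuestion: {question}\nAnswer:"
    r = requests.post(f"{OLLAMA}/api/generate",
                      json={"model": CHAT_MODEL, "prompt": prompt, "stream": False},
                      timeout=300)
    r.raise_for_status()
    return r.json()["response"]

# print(answer("What does llamafile ship in a single binary?"))
```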
The codebase seems Nvidia-dependent, installing many such Python libs per requirements.txt.
Will the code not run with llamafile models, or on AMD GPU compute?