Problem
Jan is great, but I'm limited in the number of models I can run on my 16GB GPU. I saw there is a project called mixtral-offloading that could solve my problem.
I realize this isn't your fault, but if there were a way to integrate Jan with other offloading modules, that would be extremely helpful.
Success Criteria
The ability to run larger LLMs such as Mixtral 8x7B on a 16GB GPU.
Additional context
Pretty self-explanatory. If it can be done, great. If it's too much work, I just need to get a bigger GPU at some point. :)
I don't think we have a plan for this yet, but it would be great to have.
Maybe adding a new local inference provider would help; a rough sketch of the kind of offloading such a provider would need is below.
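For illustration only, here is a minimal sketch of CPU/disk offloading for Mixtral 8x7B using the Hugging Face transformers + accelerate stack, not the mixtral-offloading project itself. The memory limits and offload folder are assumptions chosen for a 16GB card; an actual provider in Jan/nitro would need its own integration.

```python
# Sketch: run Mixtral 8x7B on a 16GB GPU by spilling layers to CPU RAM and disk.
# Assumes transformers + accelerate are installed and the model weights are accessible.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",                         # let accelerate split layers across GPU/CPU/disk
    max_memory={0: "14GiB", "cpu": "48GiB"},   # leave GPU headroom; spill the rest to RAM (illustrative values)
    offload_folder="offload",                  # layers that don't fit in RAM are paged to disk
)

prompt = "Explain mixture-of-experts in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Generation with this kind of layer offloading is much slower than fully on-GPU inference; the mixtral-offloading project improves on it by offloading individual experts with caching and quantization, which is what makes it attractive here.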
I will transfer this issue to nitro instead.