janhq / cortex.cpp

Local AI API Platform
https://cortex.so

idea: Add GPU offloading for larger/MoE models (e.g. mixtral-offloading) #462

Open poldon opened 10 months ago

poldon commented 10 months ago

Problem: Jan is great, but I'm limited in the number of models I can run on my 16GB GPU. I saw there is a project called mixtral-offloading that could solve my problem.

I realize this isn't your fault, but if there were a way to integrate Jan with other offloading modules, that would be extremely helpful.

Success Criteria: The ability to run larger LLMs such as Mixtral 8x7B on a 16GB GPU.

Additional context: Pretty self-explanatory. If it can be done, great. If it's too much work, I just need to get a bigger GPU at some point. :)
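
For reference, llama.cpp-based runtimes already support a coarser form of this: keeping most of the weights in CPU RAM and offloading a fixed number of transformer layers to the GPU. As I understand it, mixtral-offloading goes further by offloading at the granularity of individual MoE experts with an LRU cache, which is what makes it attractive for Mixtral specifically. Below is a minimal sketch of plain layer offloading using the llama-cpp-python bindings; the model path and layer count are illustrative assumptions, not a tested 16GB configuration.

```python
# Minimal sketch: partial GPU offload with llama-cpp-python.
# The GGUF path and n_gpu_layers value are hypothetical; tune
# n_gpu_layers down if you hit out-of-memory errors on 16GB.
from llama_cpp import Llama

llm = Llama(
    model_path="./mixtral-8x7b-instruct.Q4_K_M.gguf",  # hypothetical local path
    n_gpu_layers=18,  # number of layers offloaded to VRAM; the rest stay on CPU
    n_ctx=4096,       # context window; larger contexts also consume VRAM
)

out = llm("Q: What is a mixture-of-experts model? A:", max_tokens=128)
print(out["choices"][0]["text"])
```

The trade-off is throughput: layers left on the CPU run much slower than those on the GPU, so generation speed degrades as n_gpu_layers shrinks.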

hiro-v commented 9 months ago

I don't think we have a plan for this yet, but it would be great to have. Perhaps adding a new local inference provider would help. I will transfer this issue to nitro instead.
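
Since nitro wraps llama.cpp, partial layer offload may already be reachable through its model-load route. A hedged sketch follows, assuming nitro's loadmodel endpoint and its ngl (GPU layers) field; the port, path, and field names are recalled from the nitro docs and should be verified before relying on them.

```python
# Sketch: asking nitro (llama.cpp-based) to load a model with partial
# GPU offload. Endpoint path, port, and field names ("ngl" = GPU layers)
# are assumptions based on the nitro docs -- verify against the current API.
import requests

resp = requests.post(
    "http://localhost:3928/inferences/llamacpp/loadmodel",
    json={
        "llama_model_path": "/models/mixtral-8x7b.Q4_K_M.gguf",  # hypothetical path
        "ctx_len": 2048,
        "ngl": 18,  # layers to offload to the GPU
    },
)
print(resp.status_code, resp.text)
```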