janhq / cortex.cpp

Local AI API Platform
https://cortex.so

idea: Add GPU offloading for larger/MoE models (e.g. mixtral-offloading) #462

Open poldon opened 10 months ago

poldon commented 10 months ago

Problem: Jan is great, but I'm limited in the number of models I can run on my 16GB GPU. I saw there is a project called mixtral-offloading that could solve my problem.

I realize this isn't your fault, but if there were a way to integrate Jan with other offloading modules, that would be extremely helpful.

Success Criteria: The ability to run larger LLMs such as Mixtral 8x7B on a 16GB GPU.

Additional context: Pretty self-explanatory. If it can be done, great. If it's too much work, I just need to get a bigger GPU at some point. :)
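
For reference, llama.cpp-based runtimes already support a coarser form of this: keeping most of the weights in CPU RAM and offloading a fixed number of transformer layers to the GPU. As I understand it, mixtral-offloading goes further by offloading at the granularity of individual MoE experts with an LRU cache, which is what makes it attractive for Mixtral specifically. Below is a minimal sketch of plain layer offloading using the llama-cpp-python bindings; the model path and layer count are illustrative assumptions, not a tested 16GB configuration.

```python
# Minimal sketch: partial GPU offload with llama-cpp-python.
# The GGUF path and n_gpu_layers value are hypothetical; tune
# n_gpu_layers down if you hit out-of-memory errors on 16GB.
from llama_cpp import Llama

llm = Llama(
    model_path="./mixtral-8x7b-instruct.Q4_K_M.gguf",  # hypothetical local path
    n_gpu_layers=18,  # number of layers offloaded to VRAM; the rest stay on CPU
    n_ctx=4096,       # context window; larger contexts also consume VRAM
)

out = llm("Q: What is a mixture-of-experts model? A:", max_tokens=128)
print(out["choices"][0]["text"])
```

The trade-off is throughput: layers left on the CPU run much slower than those on the GPU, so generation speed degrades as n_gpu_layers shrinks.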

hiro-v commented 9 months ago

I don't think we have a plan for this yet, but it would be great to have. Perhaps adding a new local inference provider would help. I will transfer this issue to nitro instead.
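
Since nitro wraps llama.cpp, partial layer offload may already be reachable through its model-load route. A hedged sketch follows, assuming nitro's loadmodel endpoint and its ngl (GPU layers) field; the port, path, and field names are recalled from the nitro docs and should be verified before relying on them.

```python
# Sketch: asking nitro (llama.cpp-based) to load a model with partial
# GPU offload. Endpoint path, port, and field names ("ngl" = GPU layers)
# are assumptions based on the nitro docs -- verify against the current API.
import requests

resp = requests.post(
    "http://localhost:3928/inferences/llamacpp/loadmodel",
    json={
        "llama_model_path": "/models/mixtral-8x7b.Q4_K_M.gguf",  # hypothetical path
        "ctx_len": 2048,
        "ngl": 18,  # layers to offload to the GPU
    },
)
print(resp.status_code, resp.text)
```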