logancyang / obsidian-copilot

[Question] what machine should I use to run this with local LLMs? #439

Closed DaisukeMiyazaki closed 1 month ago

DaisukeMiyazaki commented 4 months ago

*Short answer could be: get a maxed-out M3 Max with 128GB RAM ...

This might be a silly question, but what would be the best or most reasonable machine for running this plugin with local LLMs? Especially after seeing the local beta QA feature, I'd like to get a rough sense of how much performance to expect from each machine, since these machines aren't cheap.

I've been testing on my M1 MacBook Air with 16GB RAM using LM Studio; however, streamed responses still take about 10s or more with n_gpu_layers set to 24 when answering questions. Indexing roughly 2000 files with local embeddings took a few minutes, which seemed reasonable to me.
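
For comparing machines with concrete numbers, one option is to time the stream directly against LM Studio's OpenAI-compatible local server. A minimal sketch, assuming the server is running on its default port 1234; the model name and API key below are placeholders (LM Studio serves whichever model is loaded and ignores the key):

```python
# Rough latency/throughput check against LM Studio's OpenAI-compatible local server.
# Assumes the server is enabled on the default http://localhost:1234; the SDK
# requires a non-empty API key string even though LM Studio ignores it.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

start = time.time()
first_token_at = None
chunk_count = 0

stream = client.chat.completions.create(
    model="local-model",  # placeholder; LM Studio uses whichever model is loaded
    messages=[{"role": "user", "content": "Summarize this note in three sentences: ..."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        if first_token_at is None:
            first_token_at = time.time()
        chunk_count += 1

total = time.time() - start
if first_token_at is not None:
    print(f"time to first token: {first_token_at - start:.1f}s")
    gen_time = max(total - (first_token_at - start), 1e-9)
    # Each streamed chunk is roughly one token, so this approximates tokens/sec.
    print(f"~{chunk_count / gen_time:.1f} tokens/sec after the first token")
```

Time to first token roughly matches the delay before a response starts appearing in the plugin, and tokens/sec gives a comparable number across machines and quantizations.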

Given the pace at which better 7B-and-larger models keep appearing, the quantization options available, and vendors like Apple releasing newer machines so frequently, it's not clear which machine is a good fit for long-term use. A rough memory estimate is sketched below.
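
For what it's worth, the back-of-the-envelope rule I've seen used to guess which model sizes fit in a given amount of unified memory (only an approximation; real usage also depends on the quantization format, context length, and KV cache):

```python
# Rough weight-memory estimate for a quantized model; ~4.5 bits/weight is a
# common ballpark for 4-bit quantizations. Actual usage also includes the KV
# cache and grows with context length, so treat these as lower bounds.
def approx_weight_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    # GB ~= params (in billions) * bits per weight / 8
    return params_billion * bits_per_weight / 8

for size_b in (7, 13, 70):
    print(f"{size_b}B @ ~4-bit: ~{approx_weight_gb(size_b):.0f} GB of weights")

# 7B  -> ~4 GB  (fits in 16 GB with room for the OS and Obsidian)
# 13B -> ~7 GB
# 70B -> ~39 GB (wants something like 64-128 GB of unified memory)
```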

logancyang commented 1 month ago

I can run Llama 3 70B on my MacBook Pro M3 Max with 96GB smoothly, though it gets a bit hot after a while. However, I predict we will have 13B or smaller models that are close to GPT-4 grade, at least for some domains, before the end of 2024. So on-device local LLMs are definitely going mainstream, IMO.