psugihara / FreeChat

llama.cpp based AI chat app for macOS
https://www.freechat.run
MIT License

Try migrating from server architecture to llama.cpp.swift #42

Open psugihara opened 6 months ago

psugihara commented 6 months ago

Speed, stability, performance, simplicity! These are paramount concerns for FreeChat.

The current completion architecture using server.cpp works pretty well but has a few problems:

  1. model switching sometimes breaks
  2. model loading errors are not surfaced to the user or captured anywhere
  3. it's kind of complicated and not portable to iOS

We can fix 1 and 2 with the current architecture, but not 3. As model sizes trend smaller, 3 matters more and more.
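For what it's worth, 2 also gets much easier after a migration: when the model loads in-process, a failed load is just a thrown Swift error we can show in the UI instead of something buried in the server.cpp child process's output. A rough sketch with hypothetical names (`LoadedModel`, `ModelLoadError`, and `loadModel` are all illustrative, not existing code):

```swift
import Foundation

// Placeholder for whatever ends up wrapping llama.cpp once inference is
// in-process; only the shape matters for this sketch.
final class LoadedModel {
    init?(path: String) {
        // Real version: call into llama.cpp and fail if the model can't load.
        return nil
    }
}

// Hypothetical error type so load failures reach the UI instead of being
// lost in server.cpp's stderr.
enum ModelLoadError: LocalizedError {
    case fileNotFound(URL)
    case loadFailed(URL)

    var errorDescription: String? {
        switch self {
        case .fileNotFound(let url): return "No model file at \(url.path)"
        case .loadFailed(let url): return "Couldn't load model \(url.lastPathComponent)"
        }
    }
}

func loadModel(at url: URL) throws -> LoadedModel {
    guard FileManager.default.fileExists(atPath: url.path) else {
        throw ModelLoadError.fileNotFound(url)
    }
    guard let model = LoadedModel(path: url.path) else {
        throw ModelLoadError.loadFailed(url)
    }
    return model
}
```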

I did a quick audit of the newish SwiftUI example in llama.cpp and it's fantastic and fast. Let's try migrating FreeChat to do inference in Swift in the same way.
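For a sense of what the call path could look like after the migration, here's a rough sketch of a completion loop driven from a SwiftUI view model. The names (`LlamaSession`, `startCompletion`, `nextToken`, `ConversationViewModel`) are placeholders rather than the example's actual API; the real method bodies would call into the `llama` C target the way the example does.

```swift
import SwiftUI

// Sketch of the in-process completion path, in the spirit of the actor
// wrapper in llama.cpp's SwiftUI example. Names and bodies are placeholders;
// the real methods would tokenize, decode, and sample via llama.cpp.
actor LlamaSession {
    func startCompletion(prompt: String) {
        // Real version: tokenize the prompt and evaluate it in one batch.
    }

    func nextToken() -> String? {
        // Real version: decode one step, sample a token, return nil at EOS.
        return nil
    }
}

// How a view model might drive it, replacing the HTTP/SSE round trip to
// server.cpp with direct async calls into the same process.
@MainActor
final class ConversationViewModel: ObservableObject {
    @Published var output = ""
    private let session = LlamaSession()  // assume a model is already loaded

    func send(prompt: String) {
        Task {
            await session.startCompletion(prompt: prompt)
            while let token = await session.nextToken() {
                output += token  // stream tokens straight into the UI
            }
        }
    }
}
```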

We should avoid editing llama.cpp.swift so that it can keep being maintained upstream in llama.cpp. Maybe there is some fancy git or SPM way to link it in (see the sketch below), but copying the file is an easy way to start.
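On the linking question: llama.cpp has its own Package.swift, so it might be possible to pull the `llama` target in as an SPM dependency and only vendor the thin llama.cpp.swift wrapper. A hedged sketch of what that could look like, assuming upstream still exposes a `llama` library product and that the package identity resolves as written (both worth double-checking against the current Package.swift):

```swift
// swift-tools-version:5.9
import PackageDescription

// Hypothetical manifest for a core package that links llama.cpp via SPM
// instead of copying its sources. Product and package names assume
// llama.cpp's own Package.swift exposes a `llama` library; verify this
// before relying on it.
let package = Package(
    name: "FreeChatCore",
    platforms: [.macOS(.v13)],
    dependencies: [
        // Tracking master for now; pinning a tag or revision would be safer.
        .package(url: "https://github.com/ggerganov/llama.cpp", branch: "master")
    ],
    targets: [
        .target(
            name: "FreeChatCore",
            dependencies: [
                .product(name: "llama", package: "llama.cpp")
            ]
        )
    ]
)
```

Copying the file is still fine to start; this would just keep the door open to tracking upstream without carrying local edits.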