psugihara / FreeChat

llama.cpp based AI chat app for macOS
https://www.freechat.run
MIT License

Try migrating from server architecture to llama.cpp.swift #42

Open psugihara opened 6 months ago

psugihara commented 6 months ago

Speed, stability, performance, simplicity! These are paramount concerns for FreeChat.

The current completion architecture using server.cpp works pretty well but has a few problems:

  1. model switching sometimes breaks
  2. model loading errors are not surfaced to the user or captured anywhere
  3. it's kind of complicated and not portable to iOS

We can fix 1 and 2 with the current architecture, but not 3. As model sizes trend smaller, 3 matters more and more.
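For what it's worth, 2 also gets much easier after a migration: when the model loads in-process, a failed load is just a thrown Swift error we can show in the UI instead of something buried in the server.cpp child process's output. A rough sketch with hypothetical names (`LoadedModel`, `ModelLoadError`, and `loadModel` are all illustrative, not existing code):

```swift
import Foundation

// Placeholder for whatever ends up wrapping llama.cpp once inference is
// in-process; only the shape matters for this sketch.
final class LoadedModel {
    init?(path: String) {
        // Real version: call into llama.cpp and fail if the model can't load.
        return nil
    }
}

// Hypothetical error type so load failures reach the UI instead of being
// lost in server.cpp's stderr.
enum ModelLoadError: LocalizedError {
    case fileNotFound(URL)
    case loadFailed(URL)

    var errorDescription: String? {
        switch self {
        case .fileNotFound(let url): return "No model file at \(url.path)"
        case .loadFailed(let url): return "Couldn't load model \(url.lastPathComponent)"
        }
    }
}

func loadModel(at url: URL) throws -> LoadedModel {
    guard FileManager.default.fileExists(atPath: url.path) else {
        throw ModelLoadError.fileNotFound(url)
    }
    guard let model = LoadedModel(path: url.path) else {
        throw ModelLoadError.loadFailed(url)
    }
    return model
}
```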

I did a quick audit of the newish SwiftUI example in llama.cpp and it's fantastic and fast. Let's try migrating FreeChat to do inference in Swift in the same way.
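For a sense of what the call path could look like after the migration, here's a rough sketch of a completion loop driven from a SwiftUI view model. The names (`LlamaSession`, `startCompletion`, `nextToken`, `ConversationViewModel`) are placeholders rather than the example's actual API; the real method bodies would call into the `llama` C target the way the example does.

```swift
import SwiftUI

// Sketch of the in-process completion path, in the spirit of the actor
// wrapper in llama.cpp's SwiftUI example. Names and bodies are placeholders;
// the real methods would tokenize, decode, and sample via llama.cpp.
actor LlamaSession {
    func startCompletion(prompt: String) {
        // Real version: tokenize the prompt and evaluate it in one batch.
    }

    func nextToken() -> String? {
        // Real version: decode one step, sample a token, return nil at EOS.
        return nil
    }
}

// How a view model might drive it, replacing the HTTP/SSE round trip to
// server.cpp with direct async calls into the same process.
@MainActor
final class ConversationViewModel: ObservableObject {
    @Published var output = ""
    private let session = LlamaSession()  // assume a model is already loaded

    func send(prompt: String) {
        Task {
            await session.startCompletion(prompt: prompt)
            while let token = await session.nextToken() {
                output += token  // stream tokens straight into the UI
            }
        }
    }
}
```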

We should avoid editing llama.cpp.swift so that it can keep being maintained upstream in llama.cpp. Maybe there is some fancy git or SPM way to link it in (see the sketch below), but copying the file is an easy way to start.
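On the linking question: llama.cpp has its own Package.swift, so it might be possible to pull the `llama` target in as an SPM dependency and only vendor the thin llama.cpp.swift wrapper. A hedged sketch of what that could look like, assuming upstream still exposes a `llama` library product and that the package identity resolves as written (both worth double-checking against the current Package.swift):

```swift
// swift-tools-version:5.9
import PackageDescription

// Hypothetical manifest for a core package that links llama.cpp via SPM
// instead of copying its sources. Product and package names assume
// llama.cpp's own Package.swift exposes a `llama` library; verify this
// before relying on it.
let package = Package(
    name: "FreeChatCore",
    platforms: [.macOS(.v13)],
    dependencies: [
        // Tracking master for now; pinning a tag or revision would be safer.
        .package(url: "https://github.com/ggerganov/llama.cpp", branch: "master")
    ],
    targets: [
        .target(
            name: "FreeChatCore",
            dependencies: [
                .product(name: "llama", package: "llama.cpp")
            ]
        )
    ]
)
```

Copying the file is still fine to start; this would just keep the door open to tracking upstream without carrying local edits.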