I've just been playing with the llama-2 models Meta released, and they're… pretty usable on my 2021 MacBook Pro (M1 Pro, 16 GB RAM) — via llama.cpp, of course; its Metal support gives inference a stunning performance boost compared to running on the CPU alone.
Even at the smallest scales (I've tried 7B-chat and 13B-chat) they perform well enough in both respects, prompting speed and quality. And since chat (i.e. instruction-tuned) variants are provided alongside the base models, from the user's side they can now be used the same way as any OpenAI ChatGPT model.
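For context, this is roughly how I've been running them locally. A minimal sketch using the llama-cpp-python bindings (not the plugin itself); the model path and sampling parameters are placeholders, not anything final:

```python
# Minimal sketch: running a llama-2 chat model locally through llama.cpp's
# Python bindings (pip install llama-cpp-python). The model path below is a
# placeholder for whatever quantized model file you've downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b-chat.q4_0.bin",  # placeholder path
    n_ctx=2048,      # context window size
    n_gpu_layers=1,  # any value > 0 enables Metal offload on Apple Silicon builds
)

# The chat-tuned variants accept OpenAI-style message lists, which is what
# makes them usable the same way as ChatGPT from the user's side.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what Metal offloading does."},
    ],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```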
So I'm looking forward to adding support for them to this plugin so they can be used locally. Let me elaborate on my vision of that a bit.
Unfortunately, the benefits of this enhancement will most likely be available to Apple Silicon Mac users only: since I'm building this primarily for myself, I'll build the feature on top of the aforementioned llama.cpp library, which performs best on M1 MacBooks. I believe the library provides some sort of Nvidia GPU support, but I haven't dug into that yet and, to be honest, have no further plans to do so myself. So if anyone has the energy for it and is keen on porting this feature to Windows or Linux, you're more than welcome to contribute.
I'll try to design the settings for this feature (as they'd be the only user-facing difference in terms of UX) in a straightforward, minimalistic, and reusable way, but I can't promise I'll succeed at that in the short term, if at all.
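To make that concrete, here's a rough sketch of the kind of settings shape I have in mind. The field names and defaults are purely illustrative, not the plugin's actual configuration keys:

```python
from dataclasses import dataclass

# Hypothetical shape of the plugin's local-model settings; every name here is
# an illustration of "straightforward and minimalistic", nothing more.
@dataclass
class LocalModelSettings:
    model_path: str              # path to a quantized llama-2 model file on disk
    context_length: int = 2048   # token budget for prompt + completion
    gpu_layers: int = 1          # > 0 offloads inference to Metal on Apple Silicon
    temperature: float = 0.8     # sampling temperature passed through to llama.cpp
```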
A very approximate and unreliable release date would be around the end of Q3 2023, at best.