avitos opened this issue 3 months ago
Hi, yea I need to update to the last llama.cpp to support current models. The issue is that llama.cpp has changed a lot (they update 3-5 times per day), and the underlying structure is now way different. The way I was able to compile these sources is no longer valid, sigh. I need to sort it all out so I can get it updated.
Got it, but I hope you can make the necessary changes. Because LMEngine is a great library 👍
Thanks. ✊🏿 If you need to use that model NOW, feel free to use GenAI in Spark Game Toolkit. It represents the new approach I will have to take for LMEngine to compile and get llama.cpp working in Delphi going forward. Let me know how it works. Also try the Gemma 2 2B it model; it's super fast for me, I get around 50+ t/s consistently. See the GenAI01 example.
Thanks, I'll give it a try! 👍 And I'll wait for LMEngine, because Llama-3-8b shows good results on analyzing documents with 16-32k token context. I'll try the same with Gemma now, but Llama-3.1 is awesome as I saw in the tests. I will also try phi3 and other models on my real documents.
Thanks a lot! LMEngine is a cool library! 👍
yea, it should support LLaMA 3.1, Phi 3.1, etc. Let me know if everything works, this is the testbed for how I will have to use llama.cpp with Delphi going forward. Not Phi 3.5 as of yet though.
Thanks! ❤️
Everything works perfectly, including Llama 3.1.
I will use SGT for now; it doesn't need a DLL (which is great).
SGT is a cool library too! 👍 I didn't know much about Gemma before.
I'll write below for those who may also want to start using SGT now.
Gemma is a good model, by the way. For document analysis it may not be as strong, but it impresses with roughly a fourfold average speed increase over Llama 3/3.1! I plan to use Gemma for text preprocessing (keeping, for example, only the numerical values and their names) and then run Llama 3.1 on the result. That gives speed while keeping the quality of analysis high.
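For anyone wanting to try the same two-stage idea, here is a minimal sketch (in Python, purely illustrative; the actual engines discussed here are Delphi-based): a fast small-model pass reduces a document to its numeric facts, then the stronger model analyzes only that reduced text. `run_model`, `extract_numeric_facts`, and the model name are hypothetical placeholders I made up for the sketch, and the regex merely stands in for what the Gemma preprocessing prompt would produce.

```python
import re

def run_model(model: str, prompt: str) -> str:
    """Hypothetical placeholder for a real inference call
    (e.g. into llama.cpp via whatever bindings you use)."""
    raise NotImplementedError

def extract_numeric_facts(doc: str) -> str:
    """Stand-in for the fast Gemma preprocessing pass: keep only
    'name: number' pairs, one per line, and drop everything else."""
    pairs = re.findall(r"([A-Za-z][\w ]*?):\s*([-+]?\d(?:[\d,.]*\d)?)", doc)
    return "\n".join(f"{name.strip()}: {value}" for name, value in pairs)

def analyze(doc: str) -> str:
    facts = extract_numeric_facts(doc)  # cheap first pass (Gemma-style)
    # The expensive second pass runs only on the reduced text, which is
    # where the speedup of the pipeline comes from.
    return run_model("llama-3.1-8b", f"Analyze these figures:\n{facts}")
```

For example, `extract_numeric_facts("Revenue: 120,5 in Q1. Staff count: 42. No news today.")` keeps just the two `name: value` lines and discards the rest.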
@avitos here is an early build of the new inference engine. I will put it up on its own repo soon.
It does not start with the Llama 3.1 model. Is it possible to make changes so it works with Llama 3.1? It is now the model with the largest token context and will potentially be used everywhere.