tinyBigGAMES / LMEngine

Local LLM Inference
BSD 3-Clause "New" or "Revised" License

Doesn't work with Llama 3.1 #8

Open · avitos opened this issue 3 months ago

avitos commented 3 months ago

It does not start with the Llama 3.1 model. Is it possible to make changes so it works with Llama 3.1? It is now the model with the largest token context and will likely be used everywhere.

jarroddavis68 commented 3 months ago

Hi, yes, I need to update to the latest llama.cpp to support current models. The issue is that llama.cpp has changed a lot (they update 3-5 times per day), and the underlying structure is now very different. The way I was able to compile these sources is no longer valid, sigh. I need to sort it all out so I can get it updated.

avitos commented 3 months ago

Got it, but I hope you can make the necessary changes, because LMEngine is a great library 👍

jarroddavis68 commented 3 months ago

> Got it, but I hope you can make the necessary changes. Because LMEngine is a great library 👍

Thanks. ✊🏿 If you need to use that model now, feel free to use GenAI in Spark Game Toolkit. It represents the new method I will have to adopt for LMEngine to be able to compile and get llama.cpp working in Delphi going forward. Let me know how it works. Also try the Gemma 2 2B IT model; it's super fast for me, I get around 50+ t/s consistently. See the GenAI01 example.

avitos commented 3 months ago

Thanks, I'll give it a try! 👍 And I'll wait for LMEngine, because Llama-3-8B shows good results analyzing documents with a 16-32k token context. I'll try the same with Gemma now, but Llama 3.1 is awesome, as I saw in the tests. I will also try Phi-3 and other models on my real documents.

Thanks a lot! LMEngine is a cool library! 👍

jarroddavis68 commented 3 months ago

Yes, it should support Llama 3.1, Phi 3.1, etc. Let me know if everything works; this is the testbed for how I will have to use llama.cpp with Delphi going forward. It doesn't support Phi 3.5 yet, though.

> Thanks a lot! LMEngine is a cool library! 👍

Thanks! ❤️

avitos commented 2 months ago

Everything works perfectly, including Llama 3.1.

I will use SGT for now; it doesn't need a DLL (which is great).

SGT is a cool library too! 👍 I didn't know much about Gemma before.

I'll write below for those who may also want to start using SGT now.

Gemma is a good model, by the way. For documents it may not be as strong, but it impresses with an average speed increase of four times over Llama 3/3.1! My plan is to use Gemma for text preprocessing (keeping, for example, only numerical values and their names) and then run Llama 3.1 on the result. That gives speed, and the quality of analysis stays high.
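The two-stage idea above (a fast small model condenses the document, a larger model analyzes the condensed text) can be sketched as follows. This is a minimal, hypothetical illustration: `extract_numeric_facts` uses a toy regex as a stand-in for prompting the small model, and `prompt_fn` stands in for whatever inference call your engine (SGT/LMEngine or any other) actually exposes; none of these names come from either library.

```python
import re

def extract_numeric_facts(document: str) -> str:
    """First pass: condense a document to 'name: value' lines.

    Toy stand-in for the fast small-model pass (e.g. Gemma 2 2B);
    a real pipeline would prompt the model to do this extraction.
    """
    pairs = re.findall(r"([A-Za-z][A-Za-z ]*):\s*(\d+(?:[.,]\d+)*)", document)
    return "\n".join(f"{name.strip()}: {value}" for name, value in pairs)

def analyze_with_large_model(condensed: str, prompt_fn) -> str:
    """Second pass: hand only the condensed facts to the slower,
    higher-quality model (e.g. Llama 3.1). `prompt_fn` is a
    placeholder for your engine's inference call."""
    return prompt_fn(f"Analyze these figures:\n{condensed}")

doc = "Quarterly report. Revenue: 1200, Expenses: 800. Outlook is positive."
condensed = extract_numeric_facts(doc)
print(condensed)  # Revenue: 1200 / Expenses: 800, one per line
```

The speed win comes from the large model seeing only the short condensed text instead of the full 16-32k token document.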

jarroddavis68 commented 1 month ago

@avitos https://github.com/tinyBigGAMES/LMEngine/discussions/7#discussioncomment-10976532

jarroddavis68 commented 1 month ago

@avitos here is an early build of the new inference engine. I will put it up in its own repo soon.

Lumina_v0.1.0.zip