undreamai / LLMUnity

Create characters in Unity with LLMs!
https://undream.ai
MIT License

llama.cpp integration with DLL #141

Closed · amakropoulos closed this 4 months ago

amakropoulos commented 7 months ago

Describe the feature

LLM for Unity uses the llamafile server for the LLM functionality. This approach can't be used for mobile integrations due to security limitations. The purpose of this feature is to instead integrate llama.cpp directly as a DLL. This doesn't necessarily mean replacing llamafile for PCs, as that would need quite some testing and optimisation across different OS+CPU combinations.
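
As a rough illustration of the idea, here is a minimal sketch of driving a llama.cpp-style DLL from Unity via P/Invoke. The library name and the `llm_construct` / `llm_completion` / `llm_delete` entry points are hypothetical placeholders, not the actual llama.cpp or LLMUnity API:

```csharp
using System;
using System.Runtime.InteropServices;
using UnityEngine;

public class LlamaDllBridge : MonoBehaviour
{
    // Hypothetical native library and entry points, for illustration only.
    const string LibName = "llama";

    [DllImport(LibName)] static extern IntPtr llm_construct(string args);
    [DllImport(LibName)] static extern IntPtr llm_completion(IntPtr llm, string prompt);
    [DllImport(LibName)] static extern void llm_delete(IntPtr llm);

    IntPtr llm = IntPtr.Zero;

    void Start()
    {
        // The model runs inside the app process; no separate server is spawned,
        // which is what would make this approach viable on mobile.
        llm = llm_construct("-m model.gguf");
        IntPtr reply = llm_completion(llm, "Hello!");
        Debug.Log(Marshal.PtrToStringAnsi(reply));
    }

    void OnDestroy()
    {
        if (llm != IntPtr.Zero) llm_delete(llm);
    }
}
```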

otdavies commented 6 months ago

How is this task going?

amakropoulos commented 6 months ago

It's progressing well; it should be done in the next 1-2 weeks.

amakropoulos commented 6 months ago

Is there something blocking you at the moment?

otdavies commented 6 months ago

I was just curious; I prefer integrated DLLs when possible. The tooling is impressive as is; I tested it with Llama 3 this evening. I'll keep an eye on this task. Let me know if you need any assistance with testing, etc.

amakropoulos commented 6 months ago

I would love some testing, thank you! I'll ping you when I'm at that stage.

SubatomicPlanets commented 5 months ago

Will this make any difference other than adding Android support? What about performance and file size?

amakropoulos commented 5 months ago

Sorry for the delayed reply. The main advantages of the DLL-based implementation are:

In terms of speed they should be similar, but this still needs testing. The file size for PC deployments will increase if CUDA is included, which boosts performance hugely; I'll provide it as an option, though, in case someone only needs the CPU.
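
As a rough sketch of what that option could look like in code (the names here are hypothetical, not LLMUnity's actual settings):

```csharp
// Hypothetical sketch of a backend option: the CUDA native library adds
// a lot to the player size, so it would only be bundled when opted in.
public enum LlmBackend { Cpu, Cuda }

public static class LlmLibrarySelector
{
    // Illustrative library names; the real file names may differ.
    public static string LibraryFor(LlmBackend backend) =>
        backend == LlmBackend.Cuda ? "llama-cuda" : "llama-cpu";
}
```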

amakropoulos commented 4 months ago

After almost 2 months of development, the DLL functionality has finally entered the beta phase 🚀!!

The feature is implemented in the release/v2.0.0 branch. Since the backend has been entirely rewritten, I would love some testing before merging.

You could do the following 🤗:

otdavies commented 4 months ago

Fantastic work!! Very exciting; I'll carve out some time to play with this in the coming days.

My use case would likely include CUDA, choosing on the fly between GPU and CPU, so some level of automatic detection would probably be a good idea in the long run (see the sketch below).

Exciting!!
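
A minimal sketch of that automatic detection, using Unity's `SystemInfo` (the returned library names are illustrative placeholders, and the NVIDIA check is a simplifying assumption, not LLMUnity's actual logic):

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Sketch of on-the-fly backend detection: pick the CUDA library only when
// an NVIDIA GPU is reported, otherwise fall back to the CPU-only library.
public static class LlmBackendDetector
{
    public static string DetectLibrary()
    {
        bool hasGpu = SystemInfo.graphicsDeviceType != GraphicsDeviceType.Null;
        bool isNvidia = SystemInfo.graphicsDeviceVendor
            .ToLowerInvariant().Contains("nvidia");
        return hasGpu && isNvidia ? "llama-cuda" : "llama-cpu";
    }
}
```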

tempstudio commented 4 months ago

Is AMD GPU still supported after the DLL switch? It works with llamafile.

amakropoulos commented 4 months ago

@tempstudio Which GPU do you have? Does llamafile use HIP for you? You can find this in the debug messages.

I'm trying to make it work, but I don't have access to an AMD GPU and can't test it :disappointed:.

llama.cpp provides 3 options for AMD:

Unless I find someone to test the Vulkan option, I'm not sure I can include support.
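
For illustration, a sketch of how a backend fallback chain could behave when a native library or its drivers are missing (the probe function and library names are hypothetical, not LLMUnity's actual implementation):

```csharp
using System;
using UnityEngine;

// Hypothetical fallback chain for AMD: try HIP first, then Vulkan, then CPU.
// A P/Invoke call into a missing native library throws DllNotFoundException,
// which is what this sketch catches to move on to the next candidate.
public static class AmdBackendFallback
{
    static readonly string[] Candidates = { "llama-hip", "llama-vulkan", "llama-cpu" };

    public static string Select(Func<string, bool> tryLoad)
    {
        foreach (string lib in Candidates)
        {
            try
            {
                if (tryLoad(lib)) return lib;  // first backend that loads wins
            }
            catch (DllNotFoundException e)
            {
                Debug.LogWarning($"{lib} unavailable: {e.Message}");
            }
        }
        return null;  // no usable backend found
    }
}
```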

tempstudio commented 4 months ago

I had to go back to llamafile 0.8.4 to use the "precompiled Windows DLL" that worked with AMD. I think this uses HIP. ROCm on Windows supports most of AMD's graphics cards in the current and prior generation: https://rocm.docs.amd.com/projects/install-on-windows/en/docs-6.0.2/reference/system-requirements.html

amakropoulos commented 4 months ago

I managed to build and add HIP support for AMD, as well as Vulkan :tada:! It is included in the new release (v2.0.1). I can't test either on an AMD GPU since I don't have one, but because the build is based on a llama.cpp workflow I expect it to work. @tempstudio please let me know if it works!