Closed: amakropoulos closed this issue 4 months ago
How is this task going?
It's progressing well; it should be done in the next 1-2 weeks.
Is there something blocking you at the moment?
I was just curious, and I prefer integrated DLLs when possible. The tooling is impressive as is; I tested it with Llama 3 this evening. I will keep an eye on this task. Let me know if you need any assistance with testing, etc.
I would love some testing, thank you! I'll let you know when I'm at that stage.
Is there any other difference this will make besides adding Android support? What about performance and file size?
Sorry for the delayed reply. The main advantages of the DLL-based implementation are:
- It will allow Android and iOS deployment to be developed. On mobile platforms it's not possible to create the server-client infrastructure used with llamafile (see the sketch below).
- It will prevent issues between llamafile and some antivirus systems. Some antivirus programs flag llamafile, most probably because llamafile builds llama.cpp the first time it runs.
- It will make it easier to add the latest developments of llama.cpp down the line, e.g. support for the latest models. Right now I have to wait for llamafile to pull the upstream llama.cpp.
In terms of speed I think they should be similar, but this still needs testing. The file size for PC deployments will increase if CUDA is wanted, which boosts performance hugely; I'll provide it as an option though, in case someone only needs the CPU.
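To make the first point more concrete, the idea is to call the llama.cpp C API in-process instead of talking to a local llamafile server. The sketch below is only illustrative: it follows the llama.cpp C API of that period (e.g. `llama_load_model_from_file`, which may have changed since), and `model.gguf` is a placeholder path, not a file shipped with the package.

```cpp
// Minimal sketch of in-process use of the llama.cpp C API (no local server),
// which is what makes mobile deployment possible.
#include "llama.h"
#include <cstdio>

int main() {
    llama_backend_init();  // recent versions take no arguments

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_load_model_from_file("model.gguf", mparams);
    if (model == NULL) {
        std::fprintf(stderr, "failed to load model\n");
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    llama_context * ctx = llama_new_context_with_model(model, cparams);

    // ... tokenize the prompt and call llama_decode() here ...

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```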
After almost 2 months of development, the DLL functionality has finally entered the beta phase 🚀!!
The feature is implemented in the release/v2.0.0 branch. Since the backend is entirely rewritten, I would love some testing before merging.
You could do the following 🤗:
Fantastic work!! Very exciting. I'll carve out some time to play with this in the coming days.
My use case would likely include CUDA, with the ability to choose on the fly between GPU and CPU. So some level of automatic detection would probably be a good idea in the long run (see the sketch below).
Exciting!!
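One way such detection could work, purely as a sketch and not how the package actually does it: ship both a CUDA and a CPU build of the native library and pick one at startup. The DLL file names below are hypothetical placeholders.

```cpp
// Illustrative sketch: try the CUDA build of the native library first and
// fall back to the CPU build if it cannot be loaded (e.g. no CUDA runtime).
// "llama_cuda.dll" and "llama_cpu.dll" are made-up names for this example.
#include <windows.h>
#include <cstdio>

HMODULE load_llama_backend() {
    // LoadLibraryA returns NULL if the DLL or one of its dependencies
    // (such as the CUDA runtime) is missing.
    HMODULE lib = LoadLibraryA("llama_cuda.dll");
    if (lib != NULL) {
        std::printf("Using CUDA backend\n");
        return lib;
    }
    lib = LoadLibraryA("llama_cpu.dll");
    if (lib != NULL) {
        std::printf("Falling back to CPU backend\n");
        return lib;
    }
    return NULL; // neither backend could be loaded
}
```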
Are AMD GPUs still supported with the DLL switch? They work with llamafile.
@tempstudio which GPU do you have? Does llamafile use HIP for you? You can find this in the debug messages.
I'm trying to make it work, but I don't have access to an AMD GPU and can't test it :disappointed:.
llama.cpp provides 3 options for AMD:
Unless I find someone to test the Vulkan option, I'm not sure I can include support for it.
I had to go back to llamafile 0.8.4 to use the "precompiled Windows DLL" that worked with AMD. I think this is using HIP. ROCm on Windows supports most of AMD's graphics cards in the current and prior generation: https://rocm.docs.amd.com/projects/install-on-windows/en/docs-6.0.2/reference/system-requirements.html
I managed to build and add HIP support for AMD, as well as Vulkan :tada:! It is incorporated in the new release (v2.0.1). I can't test either on an AMD GPU since I don't have one, but because the build is based on a llama.cpp workflow I expect it to work. @tempstudio please let me know if it works!
Describe the feature
LLM for Unity uses the llamafile server for the LLM functionality. This approach can't be used for mobile integrations due to security limitations. The purpose of this feature is to instead integrate llama.cpp directly as a DLL. This feature doesn't necessarily mean replacing llamafile for PCs, as that would need quite some testing and optimisation across different OS and CPU combinations.
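As a rough illustration of the "llama.cpp as a DLL" idea (not the actual implementation in the package), the native library can be loaded at runtime and its C entry points resolved directly, with no local server involved. The DLL name below is a placeholder, and the `llama_backend_init` signature shown is the one used by recent llama.cpp versions; older versions took a boolean NUMA flag.

```cpp
// Illustrative sketch on Windows: load the native llama.cpp library and
// resolve one of its exported C functions. "llama.dll" is a placeholder.
#include <windows.h>
#include <cstdio>

// Assumed signature of llama_backend_init() in recent llama.cpp versions.
typedef void (*llama_backend_init_fn)(void);

int main() {
    HMODULE lib = LoadLibraryA("llama.dll");
    if (lib == NULL) {
        std::fprintf(stderr, "could not load llama.dll\n");
        return 1;
    }
    auto init = (llama_backend_init_fn) GetProcAddress(lib, "llama_backend_init");
    if (init == NULL) {
        std::fprintf(stderr, "symbol llama_backend_init not found\n");
        return 1;
    }
    init(); // initialise the backend in-process, no server needed
    FreeLibrary(lib);
    return 0;
}
```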