Closed: amakropoulos closed this issue 4 months ago
How is this task going?
It's progressing well; it should be done in the next 1-2 weeks.
Is there something blocking you at the moment?
I was just curious, and I prefer integrated DLLs when possible. The tooling is impressive as is; I tested it with Llama 3 this evening. I will keep an eye on this task. Let me know if you need any assistance with testing, etc.
I would love some testing, thank you! I'll let you know when I'm at that stage.
Is there any other difference this will make besides adding Android support? What about performance and file size?
Sorry for the delayed reply. The main advantages of the DLL-based implementation are:
- It will allow Android and iOS deployment to be developed. On mobile platforms it's not possible to create the server-client infrastructure used with llamafile (see the sketch below).
- It will prevent issues between llamafile and some antivirus systems. Some antivirus programs flag llamafile, most probably because llamafile builds llama.cpp the first time it runs.
- It will make it easier to add the latest developments of llama.cpp down the line, e.g. support for the latest models. Right now I have to wait for llamafile to pull the upstream llama.cpp.
In terms of speed I think they should be similar, but this still needs testing. The file size for PC deployments will increase if CUDA is wanted, which boosts performance hugely; I'll provide it as an option though, in case someone only needs the CPU.
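To make the first point more concrete, the idea is to call the llama.cpp C API in-process instead of talking to a local llamafile server. The sketch below is only illustrative: it follows the llama.cpp C API of that period (e.g. `llama_load_model_from_file`, which may have changed since), and `model.gguf` is a placeholder path, not a file shipped with the package.

```cpp
// Minimal sketch of in-process use of the llama.cpp C API (no local server),
// which is what makes mobile deployment possible.
#include "llama.h"
#include <cstdio>

int main() {
    llama_backend_init();  // recent versions take no arguments

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_load_model_from_file("model.gguf", mparams);
    if (model == NULL) {
        std::fprintf(stderr, "failed to load model\n");
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    llama_context * ctx = llama_new_context_with_model(model, cparams);

    // ... tokenize the prompt and call llama_decode() here ...

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```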
After almost 2 months of development, the DLL functionality has finally entered the beta phase 🚀!!
The feature is implemented in the release/v2.0.0 branch. Since the backend is entirely rewritten, I would love some testing before merging.
You could do the following 🤗:
Fantastic work!! Very exciting. I'll carve out some time to play with this in the coming days.
My use case would likely include CUDA, with the ability to choose on the fly between GPU and CPU. So some level of automatic detection would probably be a good idea in the long run (see the sketch below).
Exciting!!
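One way such detection could work, purely as a sketch and not how the package actually does it: ship both a CUDA and a CPU build of the native library and pick one at startup. The DLL file names below are hypothetical placeholders.

```cpp
// Illustrative sketch: try the CUDA build of the native library first and
// fall back to the CPU build if it cannot be loaded (e.g. no CUDA runtime).
// "llama_cuda.dll" and "llama_cpu.dll" are made-up names for this example.
#include <windows.h>
#include <cstdio>

HMODULE load_llama_backend() {
    // LoadLibraryA returns NULL if the DLL or one of its dependencies
    // (such as the CUDA runtime) is missing.
    HMODULE lib = LoadLibraryA("llama_cuda.dll");
    if (lib != NULL) {
        std::printf("Using CUDA backend\n");
        return lib;
    }
    lib = LoadLibraryA("llama_cpu.dll");
    if (lib != NULL) {
        std::printf("Falling back to CPU backend\n");
        return lib;
    }
    return NULL; // neither backend could be loaded
}
```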
Are AMD GPUs still supported with the DLL switch? They work with llamafile.
@tempstudio which GPU do you have? Does llamafile use HIP for you? You can find this in the debug messages.
I'm trying to make it work, but I don't have access to an AMD GPU and can't test it :disappointed:.
llama.cpp provides 3 options for AMD:
Unless I find someone to test the Vulkan option, I'm not sure I can include support for it.
I had to go back to llamafile 0.8.4 to use the "precompiled Windows DLL" that worked with AMD. I think this is using HIP. ROCm on Windows supports most of AMD's graphics cards in the current and prior generation: https://rocm.docs.amd.com/projects/install-on-windows/en/docs-6.0.2/reference/system-requirements.html
I managed to build and add HIP support for AMD, as well as Vulkan :tada:! It is incorporated in the new release (v2.0.1). I can't test either on an AMD GPU since I don't have one, but because the build is based on a llama.cpp workflow I expect it to work. @tempstudio please let me know if it works!
Describe the feature
LLM for Unity uses the llamafile server for the LLM functionality. This approach can't be used for mobile integrations due to security limitations. The purpose of this feature is to instead integrate llama.cpp directly as a DLL. This feature doesn't necessarily mean replacing llamafile for PCs, as that would need quite some testing and optimisation across different OS and CPU combinations.
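As a rough illustration of the "llama.cpp as a DLL" idea (not the actual implementation in the package), the native library can be loaded at runtime and its C entry points resolved directly, with no local server involved. The DLL name below is a placeholder, and the `llama_backend_init` signature shown is the one used by recent llama.cpp versions; older versions took a boolean NUMA flag.

```cpp
// Illustrative sketch on Windows: load the native llama.cpp library and
// resolve one of its exported C functions. "llama.dll" is a placeholder.
#include <windows.h>
#include <cstdio>

// Assumed signature of llama_backend_init() in recent llama.cpp versions.
typedef void (*llama_backend_init_fn)(void);

int main() {
    HMODULE lib = LoadLibraryA("llama.dll");
    if (lib == NULL) {
        std::fprintf(stderr, "could not load llama.dll\n");
        return 1;
    }
    auto init = (llama_backend_init_fn) GetProcAddress(lib, "llama_backend_init");
    if (init == NULL) {
        std::fprintf(stderr, "symbol llama_backend_init not found\n");
        return 1;
    }
    init(); // initialise the backend in-process, no server needed
    FreeLibrary(lib);
    return 0;
}
```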