kherud / java-llama.cpp

Java Bindings for llama.cpp - A Port of Facebook's LLaMA model in C/C++
MIT License

[Feature Request] Bundle compiled CPU-only libs into jar #2

Closed: AutonomicPerfectionist closed this issue 12 months ago

AutonomicPerfectionist commented 1 year ago

Is it possible to use GitHub Actions to compile the llama.cpp libraries for CPU-only inference (to avoid needing multiple different variants per architecture and to avoid build dependency issues in Actions) and include them in the built jar? This project has a workflow that can be used to build off of. They use JNI instead of JNA, but the same methods should be usable.

The resource folders containing the architecture-specific libraries need to be named a certain way for JNA to find them: {os-name}-{arch}. The docs aren't super helpful on that; they imply that on Linux it should be linux-amd64, but I could only get it to work by naming the resource folder linux-x86-64.
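For reference, JNA exposes the prefix it searches for as Platform.RESOURCE_PREFIX, so a quick check like this (assuming JNA is on the classpath) prints the exact folder name it expects on the current machine:

```java
import com.sun.jna.Platform;

public class PrintResourcePrefix {
    public static void main(String[] args) {
        // JNA normalizes os.arch "amd64" to its own canonical name "x86-64",
        // which is why "linux-x86-64" works and "linux-amd64" does not.
        System.out.println(Platform.RESOURCE_PREFIX); // e.g. "linux-x86-64"
    }
}
```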

I can help with this if you would like, but I can't test the workflow myself for obvious reasons.

kherud commented 1 year ago

I have used a similar approach in the past using JNI (inspired by sqlite-jdbc back then), but decided against it for this project, since there are just too many ways to compile the library.

I am no C/C++ developer, but I think the bare minimum just for CPU inference is one for each combination of:

That's why I didn't want to inflate the jar unnecessarily, since the risk of still not supporting some particular system is too high anyway.

On the other hand, llama.cpp is really easy to build, so it shouldn't be too hard to set up a GitHub action. It would also greatly simplify the use of the library. The best option would be if there were a way to deliver platform-specific artifacts via Maven. But I don't think that's possible.

The resource folders containing the architecture-specific libraries need to be named a certain way though for JNA to find them: {os-name}-{arch}. The docs aren't super helpful on that, they imply on linux it should be linux-amd64 but I could only get it to work by naming the resource folder linux-x86-64

I think the approach is to use a helper class which decides which library to load. For whisper-jni this can be found here:

JNA even has a class built in for this as far as I know.
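Just to illustrate the idea, a bare-bones helper could look roughly like this (a sketch only, not whisper-jni's actual code; the folder and file names are placeholders):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, only to illustrate the approach; whisper-jni's real
// implementation differs in the details.
final class NativeLoader {

    static void loadLlama() throws IOException {
        String os = System.getProperty("os.name").toLowerCase();
        boolean windows = os.contains("win");
        boolean mac = os.contains("mac");

        // Placeholder folder/file names; a real helper would also check os.arch.
        String folder = windows ? "win32-x86-64" : mac ? "darwin-x86-64" : "linux-x86-64";
        String file = windows ? "llama.dll" : mac ? "libllama.dylib" : "libllama.so";

        // Extract the bundled library from the jar to a temp file and load it.
        Path tmp = Files.createTempFile("llama-", file);
        try (InputStream in = NativeLoader.class.getResourceAsStream("/" + folder + "/" + file)) {
            if (in == null) {
                throw new IOException("No bundled library for " + folder);
            }
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        System.load(tmp.toAbsolutePath().toString());
    }
}
```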

AutonomicPerfectionist commented 1 year ago

The best option would be if there were a way to deliver platform-specific artifacts via Maven. But I don't think that's possible.

The JavaCPP project uses Maven classifiers to differentiate between platform-specific artifacts, so by default Maven will pull in all platform artifacts, but users are free to select specific ones to pull in. That does add significant complexity to the build process, though.
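Roughly, a consumer could then pull in a single platform artifact like this (the coordinates below are made up, just to show the classifier mechanism):

```xml
<!-- Hypothetical coordinates, only to show how a classifier selects one platform -->
<dependency>
    <groupId>example.group</groupId>
    <artifactId>llama-native</artifactId>
    <version>1.0.0</version>
    <classifier>linux-x86-64</classifier>
</dependency>
```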

JNA even has a class built in for this as far as I know.

Yep, it automatically searches the library and class paths for the requested library, either at the root of one of those paths or under a platform-specific directory. So to get the platform detection working on Linux, all I had to do was put libllama.so in a resource folder called linux-x86-64; JNA found it automatically and loaded it, with no need for a helper class like you would need with JNI.
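On the binding side, something like this sketch should then be all that's needed (assuming JNA 5.x; the interface name and the mapped function below are hypothetical):

```java
import com.sun.jna.Library;
import com.sun.jna.Native;

// Sketch only: assumes the jar ships e.g. /linux-x86-64/libllama.so,
// /darwin-x86-64/libllama.dylib, /win32-x86-64/llama.dll on the class path.
public interface LlamaLibrary extends Library {

    // JNA looks up "llama" under the platform-specific resource folder,
    // extracts it to a temp location, and loads it automatically.
    LlamaLibrary INSTANCE = Native.load("llama", LlamaLibrary.class);

    // Hypothetical mapping; the real llama.h signatures may differ.
    void llama_backend_init();
}
```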

kherud commented 1 year ago

OK, it is probably a good compromise to at least provide libraries for CPU inference on all major platforms, with a note in the readme for further GPU support.

I've never set up a GitHub action for this before. If you know about it, I'd appreciate the help, otherwise I'll take a look over the next few days.

AutonomicPerfectionist commented 1 year ago

I can try to help, yeah. I think starting with a base workflow from the JNI bindings for whisper.cpp would be a good idea:

https://github.com/GiviMAD/whisper-jni/blob/7f2a2532fc21cc6b4887977c14e7fafd9d655f08/.github/workflows/main.yml

I see you already have build scripts for Linux and Mac; we'll need one for Windows as well. From there, we need to modify the build scripts to place the built libraries in the proper resource directories and then make the workflow use those scripts. I'll do a bit of work on my end and see if I can come up with anything.
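As a very rough starting point, I'm imagining something along these lines (a sketch only; library paths, CMake flags, and folder names are placeholders that will need adjusting per platform):

```yaml
# Rough sketch, not a working workflow yet; paths and flags are placeholders.
name: build-cpu-natives
on: [push]
jobs:
  build:
    strategy:
      matrix:
        include:
          - os: ubuntu-latest
            target: linux-x86-64
            lib: build/libllama.so
          - os: macos-latest
            target: darwin-x86-64
            lib: build/libllama.dylib
          - os: windows-latest
            target: win32-x86-64
            lib: build/Release/llama.dll
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v3
      - name: Build llama.cpp (CPU only, shared library)
        run: |
          cmake -B build -DBUILD_SHARED_LIBS=ON
          cmake --build build --config Release
      - name: Copy library into the JNA resource folder
        run: |
          cmake -E make_directory src/main/resources/${{ matrix.target }}
          cmake -E copy ${{ matrix.lib }} src/main/resources/${{ matrix.target }}
      - uses: actions/upload-artifact@v3
        with:
          name: ${{ matrix.target }}
          path: src/main/resources/${{ matrix.target }}
```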