oracle / tribuo

Tribuo - A Java machine learning library
https://tribuo.org
Apache License 2.0

Llama APIs #348

Closed: asad-awadia closed this issue 10 months ago

asad-awadia commented 10 months ago

Is your feature request related to a problem? Please describe.

Meta released some amazing models, specifically llama-2-7b and codellama-7b, and I am looking for a way to use them on the JVM.

Tribuo might be a great place to provide APIs for such models.

Describe the solution you'd like

APIs to easily load a model, plus methods for inference/generation.

Describe alternatives you've considered

I tried jllama and lama4j, but could not get a successful run with either.

Additional context

Models can be found at: https://huggingface.co/meta-llama

Craigacp commented 10 months ago

Tribuo's focus is on predictive systems, and our API doesn't have any good way of supporting generative tasks like language modelling, so we won't add support for LLaMA to Tribuo.

However, we are working on expanded tokenization support, as that is widely useful, so at some point we'll have pure Java sentencepiece and GPT tokenizers in addition to our existing wordpiece/BERT tokenizer.

To use an autoregressive language model on the JVM at the moment, I'd recommend you look at ONNX Runtime or DJL, both of which can run the models on GPUs. ONNX Runtime is lower level, but has examples of using LLaMA in Python which could be ported to Java. I'm working on some API improvements for ONNX Runtime in Java which will reduce the copying and speed things up, and the next release will have fp16 support. DJL is maintained by Amazon and has PyTorch and ONNX Runtime backends, both of which should support inference on a language model, and I think they have a worked GPT example.
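For readers unfamiliar with what porting such an example involves, here is a minimal sketch of the greedy autoregressive decoding loop that sits on top of whichever runtime is chosen. It is plain Java with no dependencies and is not taken from ONNX Runtime, DJL, or Tribuo. The class `GreedyDecoder`, its methods, and the toy model are hypothetical names for illustration. In a real port the `model` function would wrap the runtime's forward pass (for ONNX Runtime in Java, a call to `OrtSession.run` with input and output names that depend on how the model was exported) and return the logits for the next token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch: greedy autoregressive decoding over a generic logits function. */
public final class GreedyDecoder {

    private GreedyDecoder() {}

    /** Returns the index of the largest logit (the first one wins on ties). */
    public static int argmax(float[] logits) {
        int best = 0;
        for (int i = 1; i < logits.length; i++) {
            if (logits[i] > logits[best]) {
                best = i;
            }
        }
        return best;
    }

    /**
     * Repeatedly asks the model for next-token logits given all tokens so far,
     * appends the most likely token, and stops at eosId or after maxNewTokens.
     *
     * @param model assumed to map the full token sequence to logits over the vocabulary
     */
    public static List<Integer> generate(Function<List<Integer>, float[]> model,
                                         List<Integer> prompt,
                                         int maxNewTokens,
                                         int eosId) {
        List<Integer> tokens = new ArrayList<>(prompt);
        for (int step = 0; step < maxNewTokens; step++) {
            float[] logits = model.apply(tokens);
            int next = argmax(logits);
            tokens.add(next);
            if (next == eosId) {
                break;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Toy stand-in for a real model: a 5-token vocabulary where the
        // most likely next token is always (last token + 1) mod 5.
        Function<List<Integer>, float[]> toy = ctx -> {
            float[] logits = new float[5];
            logits[(ctx.get(ctx.size() - 1) + 1) % 5] = 1.0f;
            return logits;
        };
        // Token 4 plays the role of end-of-sequence here.
        System.out.println(generate(toy, List.of(0), 10, 4)); // prints [0, 1, 2, 3, 4]
    }
}
```

This sketch re-feeds the whole sequence on every step, which keeps it short but is slow for real models. Production loops usually reuse the attention key/value cache between steps and often sample from the logits rather than always taking the argmax.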