asad-awadia closed this issue 10 months ago
Tribuo's focus is on predictive systems, and our API doesn't have any good way of supporting generative tasks like language modelling, so we won't add support for LLaMA to Tribuo.
However, we are working on expanded tokenization support, as that is widely useful, so at some point we'll have pure Java SentencePiece and GPT tokenizers in addition to the existing wordpiece/BERT tokenizer.
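For context on what a wordpiece tokenizer does, here is a minimal self-contained sketch of the greedy longest-match-first algorithm wordpiece-style tokenizers use. This is an illustration of the technique, not Tribuo's API; the toy vocabulary and the `wordpiece` method name are my own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class WordpieceSketch {
    // Greedy longest-match-first wordpiece tokenization of a single word.
    // Continuation pieces carry the conventional "##" prefix; a word with
    // no valid segmentation maps to the unknown token.
    static List<String> wordpiece(String word, Set<String> vocab) {
        List<String> pieces = new ArrayList<>();
        int start = 0;
        while (start < word.length()) {
            int end = word.length();
            String match = null;
            // Shrink the candidate from the right until it is in the vocab.
            while (end > start) {
                String sub = word.substring(start, end);
                if (start > 0) sub = "##" + sub;
                if (vocab.contains(sub)) { match = sub; break; }
                end--;
            }
            if (match == null) return List.of("[UNK]");
            pieces.add(match);
            start = end;
        }
        return pieces;
    }

    public static void main(String[] args) {
        Set<String> vocab = Set.of("un", "##aff", "##able", "play", "##ing");
        System.out.println(wordpiece("unaffable", vocab)); // [un, ##aff, ##able]
        System.out.println(wordpiece("playing", vocab));   // [play, ##ing]
    }
}
```

A real BERT tokenizer additionally does basic whitespace/punctuation splitting and maps pieces to integer ids, but the segmentation loop above is the core of it.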
To use an autoregressive language model on the JVM at the moment, I'd recommend you look at ONNX Runtime or DJL, both of which can run the models on GPUs. ONNX Runtime is lower level, but has Python examples of using LLaMA which could be ported to Java. I'm working on some API improvements for ONNX Runtime in Java which will reduce the copying and speed things up, and the next release will have fp16 support. DJL is maintained by Amazon and has PyTorch and ONNX Runtime backends, both of which should support inference on a language model; I think they also have a worked GPT example.
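Whichever backend you pick, the code you write around it is the same greedy autoregressive loop: run the model on the token ids so far, pick the highest-scoring next token, append it, and repeat. Here is a minimal self-contained sketch of that loop; `nextTokenLogits` is a stand-in for the real model call (e.g. an ONNX Runtime `OrtSession.run` or a DJL `Predictor.predict`), using a toy 4-token vocabulary so the example runs without model weights.

```java
import java.util.ArrayList;
import java.util.List;

public class GreedyDecode {
    // Stand-in for a real model call; returns logits over a toy
    // 4-token vocabulary. A toy rule: the "model" always prefers
    // (last token + 1) mod 4.
    static float[] nextTokenLogits(List<Long> ids) {
        float[] logits = new float[4];
        long last = ids.get(ids.size() - 1);
        logits[(int) ((last + 1) % 4)] = 1.0f;
        return logits;
    }

    // Index of the highest-scoring entry.
    static int argmax(float[] v) {
        int best = 0;
        for (int i = 1; i < v.length; i++) {
            if (v[i] > v[best]) best = i;
        }
        return best;
    }

    // Greedy autoregressive decoding: feed the growing sequence back in,
    // append the highest-scoring token, stop at maxNewTokens or an EOS id.
    static List<Long> generate(List<Long> prompt, int maxNewTokens, long eosId) {
        List<Long> ids = new ArrayList<>(prompt);
        for (int i = 0; i < maxNewTokens; i++) {
            long next = argmax(nextTokenLogits(ids));
            if (next == eosId) break;
            ids.add(next);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(generate(List.of(0L), 5, 99L)); // [0, 1, 2, 3, 0, 1]
    }
}
```

A production loop would add sampling (temperature/top-p) and reuse the model's KV cache instead of re-running the full sequence each step, but the control flow is the same.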
Is your feature request related to a problem? Please describe. Meta released some amazing models, specifically llama-2-7b and codellama-7b, and I am looking for a way to use them in the JVM.
Tribuo might be a great place to provide APIs for such models.
Describe the solution you'd like
APIs provided to easily load the model and provide inference/generate methods
Describe alternatives you've considered
I tried using jllama and lama4j, but had no successful runs.
Additional context
Models can be found at: https://huggingface.co/meta-llama