asad-awadia closed this issue 10 months ago
Tribuo's focus is on predictive systems, and our API doesn't have any good way of supporting generative tasks like language modelling, so we won't add support for LLaMA to Tribuo.
However, we are working on expanded tokenization support, as that is widely useful, so at some point we'll have pure Java SentencePiece and GPT tokenizers in addition to the existing wordpiece/BERT tokenizer.
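For context on what a wordpiece tokenizer does, here is a minimal self-contained sketch of the greedy longest-match-first algorithm wordpiece-style tokenizers use. This is an illustration of the technique, not Tribuo's API; the toy vocabulary and the `wordpiece` method name are my own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class WordpieceSketch {
    // Greedy longest-match-first wordpiece tokenization of a single word.
    // Continuation pieces carry the conventional "##" prefix; a word with
    // no valid segmentation maps to the unknown token.
    static List<String> wordpiece(String word, Set<String> vocab) {
        List<String> pieces = new ArrayList<>();
        int start = 0;
        while (start < word.length()) {
            int end = word.length();
            String match = null;
            // Shrink the candidate from the right until it is in the vocab.
            while (end > start) {
                String sub = word.substring(start, end);
                if (start > 0) sub = "##" + sub;
                if (vocab.contains(sub)) { match = sub; break; }
                end--;
            }
            if (match == null) return List.of("[UNK]");
            pieces.add(match);
            start = end;
        }
        return pieces;
    }

    public static void main(String[] args) {
        Set<String> vocab = Set.of("un", "##aff", "##able", "play", "##ing");
        System.out.println(wordpiece("unaffable", vocab)); // [un, ##aff, ##able]
        System.out.println(wordpiece("playing", vocab));   // [play, ##ing]
    }
}
```

A real BERT tokenizer additionally does basic whitespace/punctuation splitting and maps pieces to integer ids, but the segmentation loop above is the core of it.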
To use an autoregressive language model on the JVM at the moment, I'd recommend you look at ONNX Runtime or DJL, both of which can run the models on GPUs. ONNX Runtime is lower level, but has Python examples of using LLaMA which could be ported to Java. I'm working on some API improvements for ONNX Runtime in Java which will reduce the copying and speed things up, and the next release will have fp16 support. DJL is maintained by Amazon and has PyTorch and ONNX Runtime backends, both of which should support inference on a language model; I think they also have a worked GPT example.
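Whichever backend you pick, the code you write around it is the same greedy autoregressive loop: run the model on the token ids so far, pick the highest-scoring next token, append it, and repeat. Here is a minimal self-contained sketch of that loop; `nextTokenLogits` is a stand-in for the real model call (e.g. an ONNX Runtime `OrtSession.run` or a DJL `Predictor.predict`), using a toy 4-token vocabulary so the example runs without model weights.

```java
import java.util.ArrayList;
import java.util.List;

public class GreedyDecode {
    // Stand-in for a real model call; returns logits over a toy
    // 4-token vocabulary. A toy rule: the "model" always prefers
    // (last token + 1) mod 4.
    static float[] nextTokenLogits(List<Long> ids) {
        float[] logits = new float[4];
        long last = ids.get(ids.size() - 1);
        logits[(int) ((last + 1) % 4)] = 1.0f;
        return logits;
    }

    // Index of the highest-scoring entry.
    static int argmax(float[] v) {
        int best = 0;
        for (int i = 1; i < v.length; i++) {
            if (v[i] > v[best]) best = i;
        }
        return best;
    }

    // Greedy autoregressive decoding: feed the growing sequence back in,
    // append the highest-scoring token, stop at maxNewTokens or an EOS id.
    static List<Long> generate(List<Long> prompt, int maxNewTokens, long eosId) {
        List<Long> ids = new ArrayList<>(prompt);
        for (int i = 0; i < maxNewTokens; i++) {
            long next = argmax(nextTokenLogits(ids));
            if (next == eosId) break;
            ids.add(next);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(generate(List.of(0L), 5, 99L)); // [0, 1, 2, 3, 0, 1]
    }
}
```

A production loop would add sampling (temperature/top-p) and reuse the model's KV cache instead of re-running the full sequence each step, but the control flow is the same.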
Is your feature request related to a problem? Please describe. Meta released some amazing models, specifically llama-2-7b and codellama-7b, and I am looking for a way to use them in the JVM.
Tribuo might be a great place to provide APIs for such models.
Describe the solution you'd like
APIs provided to easily load the model and provide inference/generate methods
Describe alternatives you've considered
I tried using jllama and lama4j, but had no successful runs.
Additional context
Models can be found at: https://huggingface.co/meta-llama