laelhalawani / gguf_llama

Wrapper for simplified use of Llama2 GGUF quantized models.
https://pypi.org/project/gguf_llama

Rename the current infer implementation for LlamaAI and create a new implementation based only on "max_tokens" that includes prompt+generation #6

Closed laelhalawani closed 10 months ago

laelhalawani commented 10 months ago

The current method was designed to optimize the model for batched work. While those guardrails can be useful in specific cases, they don't add simplicity to everyday use of the model.
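The proposed contract can be sketched as follows: a single `max_tokens` value caps the combined prompt plus generation length, so the generation budget is whatever remains after tokenizing the prompt. This is an illustrative sketch only; the names (`count_tokens`, `infer`) and the word-based tokenizer are stand-ins, not the actual gguf_llama API.

```python
def count_tokens(text: str) -> int:
    # Stand-in tokenizer: one token per whitespace-separated word.
    # A real implementation would use the model's own tokenizer.
    return len(text.split())


def infer(prompt: str, max_tokens: int) -> str:
    """Generate text such that prompt + generation stays within max_tokens."""
    prompt_tokens = count_tokens(prompt)
    if prompt_tokens >= max_tokens:
        raise ValueError(
            f"Prompt uses {prompt_tokens} tokens; no budget left out of {max_tokens}."
        )
    # The remaining budget is what the model may generate.
    generation_budget = max_tokens - prompt_tokens
    # Placeholder "generation": emit up to generation_budget dummy tokens.
    return " ".join(["token"] * generation_budget)
```

With this shape, a caller passes one number and never has to reason separately about prompt length versus generation length, which is the simplification the issue asks for.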

laelhalawani commented 10 months ago

done