Closed Maknee closed 1 year ago
Thanks! Left a couple comments. To fix the code style, run `black .` from the project root; it'll autoformat the code.
Thanks! Formatted the code using black.
Merging this into a working branch; I spent a bit of time looking more into the LLaMA tokenizer and I think some of the non-strict tokenization logic can be cleaned up and moved to a common file.
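For context, the prompt-building side of that logic can be sketched roughly like this. This is a hypothetical standalone helper for illustration only, not kani's actual implementation; the real engines also deal with tokenizer-level details (BOS/EOS token ids, etc.):

```python
def build_llama2_prompt(messages, system=None):
    """Build a Llama 2 chat prompt string from (role, content) pairs.

    Hypothetical helper for illustration; kani's engines implement this
    differently and handle special tokens at the tokenizer level.
    """
    prompt = ""
    first_user_turn = True
    for role, content in messages:
        if role == "user":
            text = content
            # Llama 2 embeds the system prompt inside the first [INST] block.
            if first_user_turn and system is not None:
                text = f"<<SYS>>\n{system}\n<</SYS>>\n\n{content}"
            prompt += f"<s>[INST] {text} [/INST]"
            first_user_turn = False
        elif role == "assistant":
            prompt += f" {content} </s>"
    return prompt
```

For example, `build_llama2_prompt([("user", "Hi"), ("assistant", "Hello!"), ("user", "Bye")])` produces the alternating `<s>[INST] ... [/INST] ... </s>` structure that the Llama 2 chat models expect; pulling this into a common file would let any Llama-based engine share it.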
This PR adds:

- `kani/engines/ctransformers`, which includes two classes: `CTransformersEngine` and `LlamaCTransformersEngine`, an implementation of `CTransformersEngine` for the Llama 2 models
- `ctransformers` as an optional installation to `pyproject.toml`
- updates to `/docs/shared/engine_table.rst` and `/docs/engine_reference.rst`
- `4_engines_ggml_llama.py` in the `examples` folder, which provides an example of using the Llama model with a chat interface

Slightly off topic: using the free resources generously provided by HuggingFace, I created a Gradio chat demo using kani's `LlamaCTransformersEngine` on HuggingFace Spaces. However, it is somewhat slow, since at this time the HuggingFace CPU Basic (free) tier provides only 2 vCPUs and limited memory.