Closed Maknee closed 1 year ago
Thanks! Left a couple comments. To fix the code style, run `black .` from the project root; it'll autoformat the code.
Thanks! Formatted the code using black.
Merging this into a working branch; I spent a bit of time looking more into the LLaMA tokenizer and I think some of the non-strict tokenization logic can be cleaned up and moved to a common file.
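For context, the prompt-building side of that logic can be sketched roughly like this. This is a hypothetical standalone helper for illustration only, not kani's actual implementation; the real engines also deal with tokenizer-level details (BOS/EOS token ids, etc.):

```python
def build_llama2_prompt(messages, system=None):
    """Build a Llama 2 chat prompt string from (role, content) pairs.

    Hypothetical helper for illustration; kani's engines implement this
    differently and handle special tokens at the tokenizer level.
    """
    prompt = ""
    first_user_turn = True
    for role, content in messages:
        if role == "user":
            text = content
            # Llama 2 embeds the system prompt inside the first [INST] block.
            if first_user_turn and system is not None:
                text = f"<<SYS>>\n{system}\n<</SYS>>\n\n{content}"
            prompt += f"<s>[INST] {text} [/INST]"
            first_user_turn = False
        elif role == "assistant":
            prompt += f" {content} </s>"
    return prompt
```

For example, `build_llama2_prompt([("user", "Hi"), ("assistant", "Hello!"), ("user", "Bye")])` produces the alternating `<s>[INST] ... [/INST] ... </s>` structure that the Llama 2 chat models expect; pulling this into a common file would let any Llama-based engine share it.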
This PR adds:

- `kani/engines/ctransformers`, which includes two classes: `CTransformersEngine` and `LlamaCTransformersEngine`, an implementation of `CTransformersEngine` for the Llama 2 models
- `ctransformers` as an optional installation to `pyproject.toml`
- updates to `/docs/shared/engine_table.rst` and `/docs/engine_reference.rst`
- `4_engines_ggml_llama.py` in the `examples` folder, which provides an example of using the Llama model with a chat interface

Slightly off topic: using the free resources generously provided by HuggingFace, I created a Gradio chat demo using kani's `LlamaCTransformersEngine` on HuggingFace Spaces. However, it is somewhat slow, since at this time the HuggingFace CPU Basic (free) tier provides only 2 vCPUs and limited memory.