Latest news :fire:
continuous batching
This library provides reimplemented LLM blocks that make models functional and highly performant on Qualcomm Cloud AI 100. Several models can be transformed directly from their original pre-trained form into an optimized, deployment-ready form. For other models, comprehensive documentation and how-to guides describe the changes needed.
ONNX Graph.
It is mandatory for each Pull Request to include tests such as:
# Create a Python virtual environment and activate it. (Requires Python 3.8)
python3.8 -m venv qeff_env
source qeff_env/bin/activate
pip install -U pip
# Install QEfficient directly from the GitHub repo.
pip install git+https://github.com/quic/efficient-transformers --extra-index-url https://download.pytorch.org/whl/cpu
# Or build a wheel package from a local checkout using the commands below.
pip install build wheel
python -m build --wheel --outdir dist
pip install dist/QEfficient-0.0.1.dev0-py3-none-any.whl --extra-index-url https://download.pytorch.org/whl/cpu
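Because the steps above require Python 3.8, it can help to confirm the interpreter version before creating the virtual environment. The following is a minimal sketch, not part of QEfficient: the helper names `is_python38` and `check_interpreter` are hypothetical and exist only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of QEfficient): succeeds only if the given
# version string is a Python 3.8 release, e.g. "3.8.10".
is_python38() {
  case "$1" in
    3.8|3.8.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: reports whether an interpreter (default: python3.8,
# the one used in the steps above) satisfies the version requirement.
check_interpreter() {
  local py="${1:-python3.8}"
  local version
  # Ask the interpreter for its own version; fail cleanly if it is absent.
  version="$("$py" -c 'import platform; print(platform.python_version())' 2>/dev/null)" || {
    echo "missing: $py not found"
    return 1
  }
  if is_python38 "$version"; then
    echo "ok: $py is Python $version"
  else
    echo "mismatch: $py is Python $version, expected 3.8.x"
    return 1
  fi
}
```

One possible use is to gate the environment setup on the check, for example `check_interpreter python3.8 && python3.8 -m venv qeff_env`.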
For more details about using QEfficient via the Cloud AI 100 Apps SDK, visit the Linux Installation Guide.
Note: More details are here: https://quic.github.io/cloud-ai-sdk-pages/latest/Getting-Started/Model-Architecture-Support/Large-Language-Models/llm/
Thanks to:
If you run into any problems with the code, please file GitHub issues directly in this repo.
This project welcomes contributions and suggestions. Please check the License. Integration with a CLA Bot is underway.