
Local Llama 3 Chat 🦙

A simple chat application with Llama 3, using the OpenVINO Runtime for inference and the Hugging Face Transformers library for tokenization.


Model Export

Download the INT4-quantized Meta-Llama-3.1-8B-Instruct model, already converted to the OpenVINO IR format, from HuggingFace using huggingface-cli:

huggingface-cli download rajatkrishna/Meta-Llama-3.1-8b-Instruct-OpenVINO-INT4 --local-dir models/llama-3.1-instruct-8b
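
Once downloaded, you can optionally sanity-check the model before running the app. Below is a minimal sketch, assuming the optimum-intel package is installed (pip install optimum[openvino]); note that the application itself drives the OpenVINO Runtime and transformers directly:

    from optimum.intel import OVModelForCausalLM
    from transformers import AutoTokenizer

    # Path matches the --local-dir used in the download command above
    model_dir = "models/llama-3.1-instruct-8b"

    # Load the pre-converted OpenVINO IR model and its tokenizer
    model = OVModelForCausalLM.from_pretrained(model_dir)
    tokenizer = AutoTokenizer.from_pretrained(model_dir)

    # Generate a short completion to confirm inference works
    inputs = tokenizer("What is OpenVINO?", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))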

Quickstart with Docker
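
The exact container commands aren't reproduced here. A minimal sketch, assuming a Dockerfile at the project root; the image name chat-llama3 and the container-side model path are illustrative placeholders:

    # Build the image from the Dockerfile at the project root
    docker build -t chat-llama3 .

    # Publish Flask's default port and mount the downloaded model directory
    # (the /app/models path inside the container is an assumption)
    docker run --rm -p 5000:5000 -v "$(pwd)/models:/app/models" chat-llama3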

Requirements
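
Docker for the containerized route above; Python 3 and pip for the local setup below.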

Getting Started

  1. Clone the repository:

    git clone https://github.com/rajatkrishna/llama3-openvino
    cd llama3-openvino
  2. Create a new virtual environment to avoid dependency conflicts:

    python3 -m venv .env
    source .env/bin/activate
  3. Install the dependencies from requirements.txt:

    pip install -r requirements.txt
  4. Start the Flask server from the project root:

    python3 -m flask run
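
Flask's development server listens on http://127.0.0.1:5000 by default; open that address in a browser to reach the chat interface.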

Export from HuggingFace