pinecone-io / canopy

Retrieval Augmented Generation (RAG) framework and context engine powered by Pinecone
https://www.pinecone.io/
Apache License 2.0

Add dockerfile #234

Closed izellevy closed 11 months ago

izellevy commented 11 months ago

Problem

Build/deploy of canopy is not streamlined.

Solution

Add a dockerfile to host canopy.

Explanation

This Dockerfile defines a multi-stage build for a Python application that uses Poetry for dependency management. Each stage is described below.

Stage 1: Python Base

This stage starts from the official Python 3.11.7-slim image and sets the environment variables shared by all later stages: settings for Python, pip, and Poetry, plus PYTHONPATH. It also creates the virtual environment at /venv.

Stage 2: Builder Base

This stage extends the Python Base stage and adds the tools needed to build dependencies. It uses apt-get to install system packages, including build essentials, Git, and Vim, and then installs Poetry. It sets the working directory to /app, copies pyproject.toml and poetry.lock, and installs the dependencies, using the build cache to speed up repeated builds.

Stage 3: Development

This stage is used during development/testing. It extends the Builder Base stage and copies the project files. It installs development dependencies, and the CMD is set to run a Bash shell.

Stage 4: Production

This is the final stage, used at runtime. It extends the Python Base stage and copies the Poetry installation and the virtual environment from the Builder Base stage. It installs the runtime dependencies, copies the project files, and sets up the environment for production. The application is exposed on port 8000, and Gunicorn is configured as the entry point that runs it.
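The four stages described above can be sketched as the skeleton below. This is a minimal illustration under stated assumptions, not the Dockerfile added by this PR: the stage names, the /opt/poetry location, the way Poetry is installed, the environment variable values, and the Gunicorn target `your_package.app:app` are all placeholders chosen for the example.

```dockerfile
# syntax=docker/dockerfile:1
# Illustrative skeleton only. Stage names, paths other than /venv and /app,
# and the Gunicorn target are assumptions, not taken from the PR.

# Stage 1: shared environment and virtual environment
FROM python:3.11.7-slim AS python-base
ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=on \
    POETRY_NO_INTERACTION=1 \
    POETRY_HOME=/opt/poetry \
    VIRTUAL_ENV=/venv
# Separate ENV so the variables defined above are already visible here.
# /venv/bin comes first so that "python" resolves to the project's venv.
ENV PATH="$VIRTUAL_ENV/bin:$POETRY_HOME/bin:$PATH"
RUN python -m venv "$VIRTUAL_ENV"

# Stage 2: build tools, Poetry, and main dependencies
FROM python-base AS builder-base
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential git vim \
    && rm -rf /var/lib/apt/lists/*
# Poetry gets its own venv so it stays separate from the project's packages
RUN --mount=type=cache,target=/root/.cache \
    python -m venv "$POETRY_HOME" \
    && "$POETRY_HOME/bin/pip" install poetry
WORKDIR /app
# Only the manifests are copied before installing dependencies
COPY pyproject.toml poetry.lock ./
RUN --mount=type=cache,target=/root/.cache \
    poetry install --no-root --all-extras --only main

# Stage 3: development and testing
FROM builder-base AS development
COPY . .
RUN --mount=type=cache,target=/root/.cache \
    poetry install --no-root --all-extras --with dev
CMD ["bash"]

# Stage 4: production runtime (no apt build tools in this image)
FROM python-base AS production
COPY --from=builder-base /opt/poetry /opt/poetry
COPY --from=builder-base /venv /venv
WORKDIR /app
COPY . .
# Installs the project itself on top of the copied dependencies
RUN poetry install --all-extras --only main
EXPOSE 8000
# Placeholder module path; assumes gunicorn is among the project's dependencies
CMD ["gunicorn", "--bind", "0.0.0.0:8000", "your_package.app:app"]
```

Because VIRTUAL_ENV is set, Poetry installs into /venv instead of creating its own environment, which is what makes the single `COPY --from=builder-base /venv /venv` sufficient in the last stage.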

_Optimizations and Considerations:_

Multi-stage Build:

Multi-stage builds keep the final image small: the production image contains only the artifacts needed at runtime, and the build-only dependencies from the Builder Base stage are left behind.

Build Cache:

The dependency installation steps mount a build cache (--mount=type=cache,target=/root/.cache). The Poetry and pip caches live under that directory, so packages downloaded in one build are reused in the next.
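A minimal sketch of how the two caches end up inside the mounted directory, assuming the build runs as root with BuildKit enabled. The explicit PIP_CACHE_DIR and POETRY_CACHE_DIR values are added here for illustration only; they match each tool's default location for root on Linux, so the PR's Dockerfile may not set them at all.

```dockerfile
# syntax=docker/dockerfile:1
# Illustrative only: cache mounts require BuildKit.
FROM python:3.11.7-slim

# Both directories sit under /root/.cache, the path mounted below.
# These are the tools' defaults for root; they are spelled out for clarity.
ENV PIP_CACHE_DIR=/root/.cache/pip \
    POETRY_CACHE_DIR=/root/.cache/pypoetry

# The mount persists across builds but is not written into the image layer,
# so the cache speeds up rebuilds without increasing the image size.
RUN --mount=type=cache,target=/root/.cache \
    pip install poetry
```

Setting PIP_NO_CACHE_DIR in the same image would defeat the pip half of this, since pip would then skip its cache entirely.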

Virtual Environment:

A virtual environment is used for dependency isolation. The virtual environment is created in a separate stage to ensure a clean environment and is then copied to the production image.

Dependency Caching Optimization

Dependencies are installed with Poetry's --no-root option in both the Builder Base and Development stages. The --no-root option tells Poetry to install only the project's dependencies and to skip installing the project package itself. Because the Builder Base stage copies only pyproject.toml and poetry.lock before this step, the resulting image layer depends on the dependency manifests rather than on the application code, so Docker can reuse it whenever only the code changes. This can significantly speed up subsequent builds.

Here's the specific part in the Dockerfile where this optimization is implemented:

Builder Base Stage

RUN --mount=type=cache,target=/root/.cache \
    poetry install --no-root --all-extras --only main

Development Stage

RUN --mount=type=cache,target=/root/.cache \
    poetry install --no-root --all-extras --with dev
