janhq / cortex.cpp

Local AI API Platform
https://cortex.so
Apache License 2.0

epic: Cortex.cpp to support Python? #1353

Open dan-homebrew opened 1 month ago

dan-homebrew commented 1 month ago

Goal

Tasklist

Previous Discussions

nguyenhoangthuan99 commented 1 month ago

cortex.python Integration Architecture

Overview

This document outlines the architecture for integrating Python functionality into a C++ application, specifically for running machine learning models. The system uses a proxy approach to connect the C++ application (cortex.cpp) with Python processes, allowing for isolated environments for different models.

Architecture Diagram

(architecture diagram: cortex.cpp connects to per-model Python processes through the cortex.python proxy engine)

Key Components

  1. cortex.cpp: The main C++ application.
  2. cortex.python: A proxy engine that connects cortex.cpp with Python processes (a rough interface sketch follows this list).
  3. Python Processes: Separate processes spawned for each model execution.
  4. Virtual Environments: Isolated Python environments for each model.
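To make the proxy role concrete, below is a minimal sketch of what the engine interface exposed by libengine.so might look like. The class name and method signatures here are hypothetical illustrations, not the actual cortex.cpp engine API:

// Hypothetical interface that cortex.cpp could call through the proxy engine.
#include <functional>
#include <string>

class PythonEngineI {
 public:
  virtual ~PythonEngineI() = default;

  // Create the venv, pull Python, code, and model, and install dependencies.
  virtual void PullModel(const std::string& model_id) = 0;

  // Spawn a Python process running main.py inside the model's venv.
  virtual void LoadModel(const std::string& model_id) = 0;

  // Forward a chat request to the Python process (e.g. over a socket) and
  // stream responses back through the callback.
  virtual void HandleChatCompletion(
      const std::string& model_id, const std::string& request_json,
      std::function<void(const std::string&)> on_response) = 0;
};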

Folder Structure

cortexcpp/
├── models/
│   └── cortexso/
│       └── python/
│           └── whisper/
│               ├── model-binary.pth
│               ├── whisper.py
│               ├── main.py
│               └── requirements.txt
└── engines/
    ├── cortex.llamacpp/
    └── cortex.python/
        ├── libengine.so  # proxy interface between cortex.cpp and Python models
        └── venv/         # virtual environments, one per model
            ├── whisper/
            │   ├── lib/  # Python libraries and dependencies for whisper
            │   └── bin/
            │       └── python3.12  # Python executable for whisper
            ├── fish-speech/
            └── vision/

Processes

Model Pulling

  1. Request goes from cortex.cpp to cortex.python.
  2. Create a virtual environment for the model.
  3. Pull a Python runtime for the created virtual environment.
  4. Pull the code and model from cortexso.
  5. Install dependencies into the virtual environment: /path/to/venv/bin/python -m pip install -r requirements.txt

The model pulling step also needs to install the engine for running the Python model; the engine (or backend) for a Python model is simply all of the libraries and dependencies inside its virtual environment.
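As a rough illustration of steps 2 and 5, the proxy engine could drive the pull flow with ordinary subprocess invocations. This is a minimal sketch, assuming a POSIX system and that a standalone Python runtime has already been downloaded to python_dir; the function and path names are hypothetical and error handling is omitted:

// Hypothetical sketch of the venv-creation and dependency-install steps.
#include <cstdlib>
#include <string>

void PullModel(const std::string& python_dir, const std::string& venv_dir,
               const std::string& model_dir) {
  // Step 2: create an isolated virtual environment for this model.
  std::system((python_dir + "/bin/python3 -m venv " + venv_dir).c_str());

  // Step 5: install the model's dependencies into that venv only, using the
  // venv's own interpreter so nothing leaks into the system Python.
  std::system((venv_dir + "/bin/python -m pip install -r " + model_dir +
               "/requirements.txt").c_str());
}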

Model Execution

  1. Request goes from cortex.cpp to cortex.python.
  2. cortex.python spawns a new process.
  3. Run main.py in the appropriate virtual environment (i.e. the engine/backend for that model).
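A minimal sketch of steps 2 and 3, assuming POSIX fork/exec and the folder layout shown earlier; the helper name is hypothetical:

// Spawn main.py under the model's own interpreter so imports resolve against
// the venv's site-packages rather than the system Python.
#include <sys/types.h>
#include <unistd.h>
#include <string>

pid_t SpawnModelProcess(const std::string& venv_dir,
                        const std::string& model_dir) {
  pid_t pid = fork();
  if (pid == 0) {
    const std::string python = venv_dir + "/bin/python3.12";
    const std::string script = model_dir + "/main.py";
    execl(python.c_str(), python.c_str(), script.c_str(),
          static_cast<char*>(nullptr));
    _exit(127);  // only reached if exec failed
  }
  return pid;  // parent keeps the pid so the process can be managed later
}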

Chat Functionality

  1. Request goes from cortex.cpp to cortex.python.
  2. cortex.python communicates with the Python process via WebSocket, Unix domain socket, or a similar IPC mechanism.
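For illustration, a single request/response round trip over a Unix domain socket might look like the sketch below. The socket path and wire format are assumptions, since the issue leaves the exact transport open:

// Hypothetical blocking exchange with a Python process that is assumed to be
// listening on a Unix domain socket.
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <cstring>
#include <string>

std::string SendChatRequest(const std::string& socket_path,
                            const std::string& request_json) {
  int fd = socket(AF_UNIX, SOCK_STREAM, 0);
  if (fd < 0) return {};

  sockaddr_un addr{};
  addr.sun_family = AF_UNIX;
  std::strncpy(addr.sun_path, socket_path.c_str(), sizeof(addr.sun_path) - 1);
  if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
    close(fd);
    return {};
  }

  // Send the request, then read a single (non-streaming) response.
  write(fd, request_json.data(), request_json.size());
  char buf[4096];
  ssize_t n = read(fd, buf, sizeof(buf));
  close(fd);
  return n > 0 ? std::string(buf, n) : std::string();
}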

Implementation Details

Python Interface

Virtual Environments

Packaged Python

Model Execution

dan-homebrew commented 1 month ago

@nguyenhoangthuan99 @vansangpfiev @namchuai I would like to raise a concern here, and propose a (possibly incorrect) alternative:

Engines as First-Class Citizens of Cortex

This has the following benefits:

How this would work

prabirshrestha commented 1 week ago

I would suggest using uv. It is an extremely fast Python package and project manager, written in Rust.

Then you can even do something like this:

uv run --with mlx-lm \
  mlx_lm.generate \
  --model mlx-community/Qwen2.5-Coder-32B-Instruct-8bit \
  --max-tokens 4000 \
  --prompt 'write me a python function that renders a mandelbrot fractal as wide as the current terminal'

There are a lot of Python projects related to LLMs, so being able to use those packages directly would be a big help.

These might be of interest: