Jockey

Jockey is a conversational video agent designed for complex video workflows. It combines the following technologies:

This allows Jockey to perform accurate video operations based on natural language instructions.

NOTE: Jockey is currently in alpha development. It may be unstable or behave unexpectedly. Use caution when implementing Jockey in production environments.

Key Features

Use Cases

Use cases include but are not limited to the following:

Prerequisites

Ensure the following prerequisites are met before installing and using Jockey.

System Prerequisites

Software Prerequisites

Additional Prerequisites

Installation

This section guides you through the process of installing Jockey on your system. Please ensure all the prerequisites are met before proceeding with the installation. If you encounter any issues, please refer to the Troubleshooting page or reach out on the Multimodal Minds Discord server for assistance.

Clone the Repository

Open a terminal, navigate to the directory where you want to install Jockey, and enter the following command:

git clone https://github.com/twelvelabs-io/tl-jockey.git

Set Up a Python Virtual Environment

  1. Create a new virtual environment:
    cd tl-jockey && python3 -m venv venv
  2. Activate your virtual environment:
    source venv/bin/activate
  3. (Optional) Verify that your virtual environment is activated:
    echo $VIRTUAL_ENV

    The output should display the path to your virtual environment directory, as shown in the example below:

    /Users/tl/jockey/tl-jockey/venv

    This indicates that your virtual environment is activated. If you see an empty line instead, the environment is not activated; activate it with the source venv/bin/activate command.
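
The same check can be made from inside Python, which is useful when a script should refuse to run outside the environment. The sketch below is not part of Jockey; the function name in_virtualenv is chosen here for illustration. It relies on the standard behavior that sys.prefix differs from sys.base_prefix inside an environment created with python3 -m venv.

```python
import os
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        # VIRTUAL_ENV is exported by the activate script, so it is normally unset here.
        print("No virtual environment detected; run: source venv/bin/activate")
        print(f"VIRTUAL_ENV={os.environ.get('VIRTUAL_ENV', '')!r}")
```

Unlike echo $VIRTUAL_ENV, this check also works when the environment's interpreter is invoked directly without running the activate script.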

Install Python Dependencies

Install the required Python packages:

pip3 install -r requirements.txt

Configuration

Jockey uses environment variables for configuration, and comes with an example.env file to help you get started.

  1. In the tl-jockey directory, copy the example.env file to a new file named .env:
    cp example.env .env
  2. Open the newly created .env file in a text editor.
  3. Replace the placeholders with your actual values. See the tables below for details.

Common variables

| Variable | Description | Example |
| --- | --- | --- |
| LANGSMITH_API_KEY | Your LangSmith API key (used by the LangGraph SDK). | lsv2_... |
| TWELVE_LABS_API_KEY | Your Twelve Labs API key. | tlk_987654321 |
| LLM_PROVIDER | The LLM provider you wish to use. Possible values are AZURE and OPENAI. | AZURE |
| HOST_PUBLIC_DIR | Directory for storing rendered videos. | ./output |
| HOST_VECTOR_DB_DIR | Directory for vector database storage. | ./vector_db |

LLM provider-specific variables

For Azure OpenAI:

| Variable | Description | Example |
| --- | --- | --- |
| AZURE_OPENAI_ENDPOINT | Your Azure OpenAI endpoint URL. | https://your-resource-name.openai.azure.com/ |
| AZURE_OPENAI_API_KEY | Your Azure OpenAI API key. | 987654321 |
| AZURE_OPENAI_API_VERSION | The API version you're using. | 2023-12-01-preview |

For OpenAI:

| Variable | Description | Example |
| --- | --- | --- |
| OPENAI_API_KEY | Your OpenAI API key. | 987654321 |
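
Before starting Jockey, it can help to confirm that the .env values are in place. The following is a minimal sketch, not Jockey's own startup validation: the function name missing_variables is hypothetical, and treating every common variable as required is an assumption based on the tables above. It takes a mapping such as os.environ and reports which variables are unset for the selected LLM_PROVIDER.

```python
import os
from typing import List, Mapping

# Variable names taken from the tables above; treating all of them as
# required is an assumption made for this sketch.
COMMON_VARS = [
    "LANGSMITH_API_KEY",
    "TWELVE_LABS_API_KEY",
    "LLM_PROVIDER",
    "HOST_PUBLIC_DIR",
    "HOST_VECTOR_DB_DIR",
]
PROVIDER_VARS = {
    "AZURE": ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY", "AZURE_OPENAI_API_VERSION"],
    "OPENAI": ["OPENAI_API_KEY"],
}


def missing_variables(env: Mapping[str, str]) -> List[str]:
    """Return the names of variables that are unset or empty in env."""
    missing = [name for name in COMMON_VARS if not env.get(name)]
    provider = env.get("LLM_PROVIDER", "")
    if provider and provider not in PROVIDER_VARS:
        raise ValueError(f"LLM_PROVIDER must be AZURE or OPENAI, got {provider!r}")
    # Only the variables for the selected provider are checked.
    missing += [name for name in PROVIDER_VARS.get(provider, []) if not env.get(name)]
    return missing


if __name__ == "__main__":
    # os.environ only reflects the .env file if it was loaded into the process.
    print("Missing variables:", missing_variables(os.environ))
```

Passing the environment in as a mapping keeps the check easy to test with a plain dictionary.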

Usage

This section provides instructions on how to deploy and use Jockey. Note that Jockey supports the following deployment options:

This document covers the terminal-based deployment. If you're a developer looking to integrate Jockey into your application, see the Deploy and Use Jockey with the LangGraph API Server page.

Deploy and use Jockey in the terminal

The terminal deployment is ideal for quick testing, development work, and debugging. It provides immediate feedback and allows for easy interaction with Jockey.

[Figure: Terminal Example Jockey Video Walkthrough]

  1. Activate your virtual environment:
    source venv/bin/activate
  2. Run the following command:
    python3 -m jockey terminal

    [Figure: Jockey Terminal Startup]

  3. Jockey will initialize and display a startup message. Wait for the prompt indicating it's ready for input.
  4. Once Jockey is ready, you can start interacting with it using natural language commands. Begin by providing an index ID in your initial prompt, as shown in the example below:
    Use index 65f747a50db0463b8996bde2. I'm trying to create a funny video focusing on Gordon Ramsay. Can you find 3 clips of Gordon yelling at his chefs about scrambled eggs and then a final clip where Gordon bangs his head on a table. After you find all those clips, lets edit them together into one video.

    Note that in some cases, such as summarizing videos or generating chapters and highlights, you must also provide a video ID. You can continue the conversation by providing new instructions or asking questions, as shown in the following example:

    This is awesome but the last clip is too long. Lets shorten the last clip where Gordon hits his head on the table by making it start one second later. Then combine all the clips into a single video again.
  5. When you've finished, exit terminal mode using the Ctrl+C keyboard shortcut.
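
If you assemble first prompts in a script before pasting them into the terminal, a small helper can make sure the index ID is never left out. This is a sketch only: build_initial_prompt is a hypothetical name, and the "Use video ..." phrasing is an assumption modeled on the "Use index ..." wording in the example above, not a format that Jockey requires.

```python
from typing import Optional


def build_initial_prompt(index_id: str, instruction: str, video_id: Optional[str] = None) -> str:
    """Compose a first prompt that starts with the index ID."""
    if not index_id.strip():
        raise ValueError("an index ID is required in the initial prompt")
    parts = [f"Use index {index_id.strip()}."]
    if video_id:
        # Needed for tasks such as summaries, chapters, and highlights.
        # ASSUMPTION: this wording mirrors "Use index ..." and is not prescribed by Jockey.
        parts.append(f"Use video {video_id.strip()}.")
    parts.append(instruction.strip())
    return " ".join(parts)


if __name__ == "__main__":
    print(build_initial_prompt(
        "65f747a50db0463b8996bde2",
        "Find 3 clips of Gordon yelling at his chefs about scrambled eggs.",
    ))
```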

Debug in the Terminal

The terminal version of Jockey provides verbose output for debugging purposes:

[Figure: Jockey Terminal Debugging Example]

To adjust the verbosity of the output, modify the parse_langchain_events_terminal() function in jockey/util.py.

Note that the tags for the individual components are set in app.py.

Integrate Jockey Into Your Application

To integrate Jockey into your application, use an HTTP client library or the LangGraph Python SDK.

For a basic example of how to interact with Jockey programmatically, refer to the client.ipynb Jupyter notebook in the project repository. For more detailed information, see the LangGraph Examples page.
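
As a rough illustration of the HTTP client route, the sketch below builds (but does not send) a streaming run request using only the Python standard library. Several details are assumptions rather than facts from this README: the /threads/{thread_id}/runs/stream path and the assistant_id, input, and stream_mode fields follow the general LangGraph API server conventions, the chat_history input shape is a placeholder, and the function name build_stream_request is hypothetical. Check client.ipynb for the input Jockey actually expects.

```python
import json
import urllib.request


def build_stream_request(base_url: str, thread_id: str, assistant_id: str, prompt: str) -> urllib.request.Request:
    """Build a POST request that would start a streaming run on an existing thread."""
    payload = {
        "assistant_id": assistant_id,
        # ASSUMPTION: placeholder input shape; see client.ipynb for the real one.
        "input": {"chat_history": [{"type": "human", "content": prompt}]},
        "stream_mode": ["values"],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/threads/{thread_id}/runs/stream",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # The server address, thread ID, and assistant ID are hypothetical placeholders.
    request = build_stream_request(
        "http://localhost:8123", "my-thread-id", "jockey",
        "Use index 65f747a50db0463b8996bde2. Find 3 clips of Gordon yelling.",
    )
    print(request.get_method(), request.full_url)
    # To send it against a running server: urllib.request.urlopen(request)
```

Building the request separately from sending it makes the payload easy to inspect before pointing it at a live server.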

Additional Documentation