mozilla-ai / lumigator

Source code for Mozilla.ai's Lumigator platform
Apache License 2.0

Mozilla.ai Lumigator

Lumigator is an open-source platform built by Mozilla.ai that guides users through selecting the right language model for their needs. It currently supports evaluating summarization tasks using sequence-to-sequence models such as BART and T5 and causal architectures such as GPT and Mistral, and will expand to other machine learning tasks and use cases.

See the example notebook for a platform API walkthrough.

Note: Lumigator is in the early stages of development. It is missing important features and documentation. You should expect breaking changes in the core interfaces and configuration structures as development continues.

Docs

Available Machine Learning Tasks

Summarization

Models for Online Ground Truth Generation

| Model Type | Model | via HuggingFace | via API |
|---|---|---|---|
| seq2seq | facebook/bart-large-cnn | X | |
| causal | gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo-0125 | | X |
| causal | open-mistral-7b | | X |

Models for Offline Evaluation

| Model Type | Model | via HuggingFace | via API |
|---|---|---|---|
| seq2seq | facebook/bart-large-cnn | X | |
| seq2seq | longformer-qmsum-meeting-summarization | X | |
| seq2seq | mrm8488/t5-base-finetuned-summarize-news | X | |
| seq2seq | Falconsai/text_summarization | X | |
| causal | gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo-0125 | | X |
| causal | open-mistral-7b | | X |

Metrics

Check this link for the pros and cons of each metric and an example of how each one works.

Technical Overview

Lumigator is a Python-based FastAPI web app. Its REST API endpoints provide access to services for serving and evaluating large language models available as safetensors artifacts hosted on HuggingFace or in local stores (HuggingFace access is the first focus), and for tracking the lifecycle of a model in the backend database (Postgres). It consists of:

Get Started

You can build the project locally using Pants and docker-compose on Mac or Linux, or deploy it into a distributed environment using Kubernetes Helm charts.

Local Requirements

Local Development Setup (either Mac or Linux)

  1. git clone git@github.com:mozilla-ai/lumigator.git
  2. Install Pants using the official instructions for your system. For more on using Pants, read the Pants guide.
  3. Run make bootstrap-dev-environment, then source mzaivenv/bin/activate to activate the virtualenv.
  4. Run make local-up. For more on docker-compose, see the local install documentation.
  5. To shut down the app, run make local-down, then deactivate to deactivate the virtualenv.

Dev Environment Details

The dev environment includes a standalone Python interpreter, a venv (mzaivenv), pre-commit configs, and more. Python setup is handled by uv; Pants maintains lockfiles for different platforms. Currently, only Python 3.11.9 is valid for this project; if a compatible interpreter is already installed, uv will not download a standalone one for you.
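As a quick sanity check before bootstrapping, the version constraint above can be expressed in a few lines of Python. This is only an illustrative sketch: the helper name `is_supported_python` is hypothetical and not part of Lumigator, and the pinned version is simply copied from the paragraph above.

```python
import sys

# Version pinned by this README at the time of writing; treat it as an
# assumption that may change as the project evolves.
REQUIRED_PYTHON = (3, 11, 9)


def is_supported_python(version_info=None):
    """Return True if the (major, minor, micro) prefix matches the pin."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:3]) == REQUIRED_PYTHON


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    wanted = ".".join(str(part) for part in REQUIRED_PYTHON)
    status = "ok" if is_supported_python() else "mismatch"
    print(f"Python {found} (required {wanted}): {status}")
```

Running the script with the interpreter you intend to use shows whether uv is likely to reuse it or fetch a standalone one.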

For VSCode users, activate the venv before opening your IDE; the .env file will be recognized automatically.

make bootstrap-dev-environment
source mzaivenv/bin/activate

Show targets:

make show-pants-targets

Run the app locally via docker compose:

make local-up
make local-logs # gets the logs from docker compose
make local-down # shuts it down

Compile targets manually:

pants package <target>
# backend app
pants package lumigator/python/mzai/backend --no-local-cache
# backend docker image
pants package lumigator/python/mzai/backend:backend_image

Environment variable reference

The docker-compose setup described in the corresponding README needs several environment variables to work properly.

If the S3 storage service is used, the endpoint, key and secret are needed. The LocalStack implementation used also requires an authentication token.

| Environment variable name | Default value | Description |
|---|---|---|
| LOCAL_FSSPEC_S3_ENDPOINT_URL | "" | Endpoint URL for the S3 data storage service |
| LOCAL_FSSPEC_S3_KEY | "" | Key for the S3 data storage service |
| LOCAL_FSSPEC_S3_SECRET | "" | Secret for the S3 data storage service |
| LOCALSTACK_AUTH_TOKEN | "" | Authentication token for the LocalStack service |
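To illustrate how the S3 variables above might be consumed, the sketch below gathers them into an options dictionary. The variable names come from the table; the helper `s3_storage_options` is hypothetical, and the dictionary layout follows the keyword arguments accepted by `s3fs.S3FileSystem` (`key`, `secret`, `client_kwargs`). Whether Lumigator wires them up this way is an assumption.

```python
import os


def s3_storage_options(environ=None):
    """Collect the S3 settings from the environment into an options dict.

    Empty or unset variables are skipped, mirroring the "" defaults in the
    table. The key names follow s3fs.S3FileSystem keyword arguments; this
    mapping is an assumption, not Lumigator's confirmed behaviour.
    """
    if environ is None:
        environ = os.environ
    options = {}
    key = environ.get("LOCAL_FSSPEC_S3_KEY", "")
    secret = environ.get("LOCAL_FSSPEC_S3_SECRET", "")
    endpoint = environ.get("LOCAL_FSSPEC_S3_ENDPOINT_URL", "")
    if key:
        options["key"] = key
    if secret:
        options["secret"] = secret
    if endpoint:
        # s3fs forwards client_kwargs to the underlying S3 client.
        options["client_kwargs"] = {"endpoint_url": endpoint}
    return options
```

With fsspec and s3fs installed, the result could be passed along as `fsspec.filesystem("s3", **s3_storage_options())`.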

Models from Mistral or OpenAI can be used via API instead of instantiating them within Lumigator. In this case, the corresponding key is needed.

| Environment variable name | Default value | Description |
|---|---|---|
| MISTRAL_API_KEY | "" | Key for Mistral API models |
| OPENAI_API_KEY | "" | Key for OpenAI API models |
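A small check along the following lines can flag a missing key before a job is submitted. It is a sketch only: the prefix mapping is inferred from the model names listed in this README, and the helpers `required_api_key` and `missing_api_key` are hypothetical, not part of Lumigator.

```python
import os

# Prefix-to-variable mapping inferred from the model tables in this README;
# an illustration, not Lumigator's actual routing logic.
API_KEY_BY_PREFIX = {
    "gpt-": "OPENAI_API_KEY",
    "open-mistral-": "MISTRAL_API_KEY",
}


def required_api_key(model_name):
    """Return the env var an API-served model needs, or None otherwise."""
    for prefix, variable in API_KEY_BY_PREFIX.items():
        if model_name.startswith(prefix):
            return variable
    return None


def missing_api_key(model_name, environ=None):
    """Return the name of the unset key for this model, or None if fine."""
    if environ is None:
        environ = os.environ
    variable = required_api_key(model_name)
    # An empty string counts as unset, matching the "" defaults above.
    if variable is not None and not environ.get(variable):
        return variable
    return None
```

For example, `missing_api_key("gpt-4o-mini")` returns `"OPENAI_API_KEY"` when that variable is empty or unset, while a HuggingFace model such as `facebook/bart-large-cnn` never requires a key under this mapping.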

Rebuilding dependencies

You may need to manually regenerate the lockfiles if you update dependencies. To do so:

  1. Add your new dependency to 3rdparty/python/pyproject.toml. This file respects system platform markers, and only very special cases need to be added as explicit python_requirement targets.
  2. Run pants generate-lockfiles. This can take a while (5-10 minutes in some cases) and requires access to PyPI.

Make sure to add the new lockfiles to the repo with your PR. You'll have to rebuild your dev environment if you haven't already.