marimo-team / marimo

A reactive notebook for Python — run reproducible experiments, execute as a script, deploy as an app, and version with git.
https://marimo.io
Apache License 2.0
8.18k stars 298 forks

Support stateless for multi container scaling and deployment #1831

Open sherodtaylor opened 4 months ago

sherodtaylor commented 4 months ago

Description

I have a custom Kubernetes deployment that requires stateless applications, which have been a standard for a long time.

Suggested solution

I'd like to be able to deploy the app with any necessary state offloaded to Redis, or to see marimo refactored to support stateless applications, which have been a standard for a long time (e.g., 12-factor apps).

Alternative

Our internal Kubernetes system doesn't support sticky sessions.

Additional context

error received:

  File "/bb/libexec/workflow-metrics-notebooks/python/lib/python3.10/site-packages/marimo/_server/api/deps.py", line 135, in require_current_session
    raise ValueError(f"Invalid session id: {session_id}")
ValueError: Invalid session id: s_p41hf3

import argparse
import logging
from typing import Any

import marimo
from fastapi import FastAPI
import uvicorn

LOG_LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]

def parse_args(args: list[str] | None = None) -> dict[str, Any]:
    parser = argparse.ArgumentParser()
    parser.add_argument("--log-file", "-l")
    parser.add_argument("--host", default="0.0.0.0")
    parser.add_argument("--log-level", default="INFO", choices=LOG_LEVELS)
    parser.add_argument("--dir", default="./notebooks")
    parsed_args, unknown = parser.parse_known_args(args=args)
    logging.info(f'unknown args - {unknown}')
    return vars(parsed_args)

def start_server(host: str, dir: str) -> None:

    server = (
        marimo.create_asgi_app()
        .with_app(path="", root=f'{dir}/workflow-metrics.py')
    )

    # Create a FastAPI app; the host is passed to uvicorn.run below, not to FastAPI
    app = FastAPI()
    app.mount("/", server.build())

    uvicorn.run(app, host=host, port=8080)

def main() -> None:
    args = parse_args()
    start_server(args.pop("host"), args.pop("dir"))

if __name__ == "__main__":
    main()

akshayka commented 4 months ago

Copying a response from Discord, in case others are also interested:

This will be difficult: not everything is serializable, so marimo can’t be fully stateless. These are running programs that inherently have state. Even if we made marimo stateless, your code may not be (e.g., threads, DB connections, etc.).

I think a better request would be a load balancer that can manage multiple instances.
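
To make the load-balancer idea concrete, here is a minimal sketch of session-affinity routing: every request carrying the same session id is sent to the same instance, so the in-memory session is always found. This is an illustration only, not something marimo ships. The function name `pick_replica` and the replica addresses are hypothetical, and it assumes the balancer can read the session id from each request.

```python
import hashlib


def pick_replica(session_id: str, replicas: list[str]) -> str:
    """Map a session id to one replica, deterministically.

    Uses a stable hash (not Python's per-process randomized hash())
    so every balancer process makes the same choice.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


# Hypothetical replica addresses, for illustration only.
replicas = ["marimo-0:8080", "marimo-1:8080", "marimo-2:8080"]

# The same session id always lands on the same replica.
first = pick_replica("s_p41hf3", replicas)
second = pick_replica("s_p41hf3", replicas)
```

One caveat with this simple modulo scheme: changing the number of replicas remaps most sessions, so a production setup would more likely use consistent hashing or an explicit session-to-instance table.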

Based on what you know about marimo, if you have suggestions on how marimo might one day support stateless execution, we're open to hearing them.

sherodtaylor commented 2 months ago

@akshayka one model you could follow is how Airflow serializes its objects. It would be useful for deployed programs, since deployed programs are built to be stateless.

https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/serializers.html

akshayka commented 2 months ago

Thanks for the link. We'll likely look into this one day (perhaps we could patch globals() to hit an external cache) — I appreciate how this would make horizontal scaling very easy — but it's not on our short-term roadmap.
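
As a rough illustration of the "external cache for globals" idea, the sketch below runs code against a namespace whose values are pickled into a shared store. Everything here is an assumption for illustration: `ExternalNamespace` is a hypothetical class, a plain dict stands in for Redis, and the mapping is passed as the `locals` argument of `exec` (which accepts any mapping) rather than by patching `globals()`. It is not how marimo works today. It also shows the limitation raised earlier in the thread: a value such as a thread lock cannot be serialized.

```python
import pickle
import threading
from collections.abc import MutableMapping


class ExternalNamespace(MutableMapping):
    """Namespace whose values live in an external store as pickled bytes."""

    def __init__(self, store):
        # `store` is a stand-in for an external cache: name -> bytes.
        self._store = store

    def __getitem__(self, name):
        # Raises KeyError when missing, so exec falls back to globals/builtins.
        return pickle.loads(self._store[name])

    def __setitem__(self, name, value):
        self._store[name] = pickle.dumps(value)

    def __delitem__(self, name):
        del self._store[name]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)


shared_store = {}  # pretend this dict is the shared cache

# "Replica A" runs one cell.
exec("x = 21", {}, ExternalNamespace(shared_store))

# "Replica B" runs a dependent cell using only the shared store.
namespace_b = ExternalNamespace(shared_store)
exec("y = x * 2", {}, namespace_b)
result = namespace_b["y"]

# Non-serializable state cannot be offloaded this way.
try:
    exec("lock = make_lock()", {"make_lock": threading.Lock},
         ExternalNamespace(shared_store))
    lock_failed = False
except (TypeError, pickle.PicklingError):
    lock_failed = True
```

Every variable read or write becomes a serialization round trip in this design, so even where it works it trades latency for the ability to run on any replica.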