wildlife-dynamics / ecoscope-workflows

An extensible task specification and compiler for local and distributed workflows.
BSD 3-Clause "New" or "Revised" License

Research: Storage and materialization of user-defined DataConnection secrets on Cloud Run #48

Open cisaacstern opened 5 days ago

cisaacstern commented 5 days ago

Per @atmorling's comment https://github.com/wildlife-dynamics/ecoscope-workflows/pull/31#issuecomment-2186022263, we should research the best way to store and materialize data connection config (inclusive of secrets) inside Cloud Run invocations.

The Google Secret Manager (GSM) reference implementation in #31 is one possible path, but questions remain about how to ensure per-user materialization of config in Cloud Run, and whether (and how) managing 100s or 1000s of user-specific secrets in GSM will scale compared with some other approach (an encrypted database, etc.).

So to summarize, the main questions are:

  1. What storage backend on GCP will scale well enough to eventually manage 1000s of per-user configs/secrets?
  2. How do we materialize those inside a Cloud Run invocation for a specific user? The service account that calls the Cloud Run function will presumably be our operational service account, not a user-specific account, so how do we identify the user who requested a given invocation in a robust and secure way?
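For question 2, one possible pattern (not proposed in this thread) is for the Ecoscope server, which already knows the authenticated user, to issue a short-lived signed claim naming the user and data connection and pass it along with the run request; the Cloud Run service verifies that claim before reading any per-user secret. This sketch assumes a shared HMAC key available to both sides (for example, a single service-level secret exposed to the Cloud Run service). The claim names `sub`, `conn`, `exp` and the function names are hypothetical, and a production system would more likely use an established signed-token format such as JWT through a maintained library rather than this hand-rolled encoding.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    # Restore the padding stripped by _b64 before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_invocation(key: bytes, user_id: str, connection_id: str,
                    ttl_s: int = 300, now: Optional[float] = None) -> str:
    """Server side: bind one run request to one user and one data connection."""
    issued = int(time.time() if now is None else now)
    claims = {"sub": user_id, "conn": connection_id, "exp": issued + ttl_s}
    body = _b64(json.dumps(claims, sort_keys=True).encode("utf-8"))
    sig = _b64(hmac.new(key, body.encode("ascii"), hashlib.sha256).digest())
    return f"{body}.{sig}"


def verify_invocation(key: bytes, token: str, now: Optional[float] = None) -> dict:
    """Cloud Run side: return the claims only if the signature and expiry check out."""
    body, _, sig = token.partition(".")
    expected = _b64(hmac.new(key, body.encode("ascii"), hashlib.sha256).digest())
    # Constant-time comparison on bytes (tolerates arbitrary input in `sig`).
    if not hmac.compare_digest(sig.encode("utf-8"), expected.encode("ascii")):
        raise PermissionError("bad signature")
    claims = json.loads(_unb64(body))
    if (time.time() if now is None else now) >= claims["exp"]:
        raise PermissionError("token expired")
    return claims
```

The operational service account still authenticates the HTTP call itself; the signed claim only says which user the run is for, so intermediaries that merely forward the request (such as a workflow executor) cannot change that user without the signing key.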
atmorling commented 5 days ago

I've thrown the diagram below together, mostly just to help me think this through:

```mermaid
sequenceDiagram
    actor User
    participant WebUI as Web UI
    participant EcoscopeServer as Ecoscope Server
    participant EcoscopeWorkflows as Ecoscope Workflows CLI Service (Process, Serverless)
    participant SecureStore as GSM?
    participant WorkflowExecutor as Workflows Executor API (Airflow, Serverless)
    participant ThirdPartyDataService as 3rd Party Data Service (EarthRanger)

    User->>WebUI: Fill Form for First Config Block
    WebUI->>EcoscopeServer: Post First Config Block
    EcoscopeServer-->>WebUI: Show Next Config Block
    WebUI-->>User: Show Next Config Block

    User->>WebUI: Fill Form for Data Connection
    WebUI->>EcoscopeServer: Post Config

    EcoscopeServer->>EcoscopeWorkflows: Create Data Connection
    EcoscopeWorkflows->>SecureStore: Store data connection config
    SecureStore-->>EcoscopeWorkflows: Config stored
    EcoscopeWorkflows-->>EcoscopeServer: Data Connection created

    EcoscopeWorkflows->>ThirdPartyDataService: Get Table Schema
    ThirdPartyDataService-->>EcoscopeWorkflows: Table Schema
    EcoscopeWorkflows-->>EcoscopeServer: Table Schema

    EcoscopeServer-->>WebUI: Show Next Config Block
    WebUI-->>User: Show Next Config Block

    loop Continue config
        User->>WebUI: Config continues
        WebUI-->>User: Show Next Config Block
    end

    User->>WebUI: Run "Patrols Example"
    WebUI->>EcoscopeServer: Run "Patrols Example" (knows executor)
    EcoscopeServer->>WorkflowExecutor: Run "Patrols Example"
    WorkflowExecutor-->>EcoscopeServer: Run started
    EcoscopeServer-->>WebUI: Status "Pending"
    WebUI-->>User: Status "Pending"
    WorkflowExecutor->>WorkflowExecutor: Execute tasks
    WorkflowExecutor->>EcoscopeServer: Get Data Connection config
    EcoscopeServer-->>WorkflowExecutor: Return data connection with location of credentials
    WorkflowExecutor->>SecureStore: Get Data Connection credentials
    SecureStore-->>WorkflowExecutor: credentials
    WorkflowExecutor->>ThirdPartyDataService: get_patrol_observations
    ThirdPartyDataService-->>WorkflowExecutor: patrol observations
```
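The run-time half of the diagram (get the data connection config from the Ecoscope server, which returns the location of the credentials, then fetch the credentials from the secure store) can be sketched as follows. Everything here is hypothetical: `DataConnection`, its fields, the two `Protocol`s, and the in-memory stand-ins are illustrative names, and the secret payload is assumed to be a JSON object. The ownership check assumes `user_id` has already been established through whatever trusted mechanism answers question 2 in the issue.

```python
import json
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class DataConnection:
    """Server-side config: connection details plus the *location* of its credentials."""

    connection_id: str
    owner_user_id: str
    server_url: str
    secret_ref: str  # e.g. a Secret Manager resource name, never the secret value


class ConnectionRegistry(Protocol):
    """Stands in for the Ecoscope Server's "Get Data Connection config" call."""

    def get_connection(self, connection_id: str) -> DataConnection: ...


class SecretStore(Protocol):
    """Stands in for the secure store's "Get Data Connection credentials" call."""

    def access(self, secret_ref: str) -> bytes: ...


def materialize_credentials(connection_id: str, user_id: str,
                            registry: ConnectionRegistry,
                            store: SecretStore) -> Dict[str, str]:
    """Resolve one user's data connection into usable settings for a single run."""
    conn = registry.get_connection(connection_id)
    # Refuse to materialize a connection that belongs to a different user.
    if conn.owner_user_id != user_id:
        raise PermissionError(f"{user_id!r} does not own connection {connection_id!r}")
    # The stored payload is assumed to be JSON, e.g. {"token": "..."}.
    credentials = json.loads(store.access(conn.secret_ref).decode("utf-8"))
    return {"server": conn.server_url, **credentials}


class InMemoryRegistry:
    """Toy registry for local testing."""

    def __init__(self, connections: Dict[str, DataConnection]) -> None:
        self._connections = connections

    def get_connection(self, connection_id: str) -> DataConnection:
        return self._connections[connection_id]


class InMemorySecretStore:
    """Toy secret store for local testing."""

    def __init__(self, payloads: Dict[str, bytes]) -> None:
        self._payloads = payloads

    def access(self, secret_ref: str) -> bytes:
        return self._payloads[secret_ref]
```

Returning only a `secret_ref` from the server keeps raw credentials out of its database and API responses. A GSM-backed `SecretStore` would wrap Secret Manager's `access_secret_version` call, while an encrypted-database backend could implement the same interface, which leaves question 1 open without changing the executor code.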