streamingfast / substreams

Powerful Blockchain streaming data engine, based on StreamingFast Firehose technology.
Apache License 2.0

Deployer interface for Substreams Services or Sinks #341

Closed: abourget closed this issue 10 months ago

abourget commented 12 months ago

Add to substreams:proto/sf/substreams/sink/service/v1/service.proto:

message DeployRequest {
  sf.substreams.v1.Package substreams_package = 1;

  bool development_mode = 2;

  repeated EnvironmentVariable environ = 3;
}

message EnvironmentVariable {
  string key = 1;
  string value = 2;
}
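A minimal Go sketch of how a client could fill in the proposed message. The structs are hand-written stand-ins, not protoc-generated code; holding the package as raw .spkg bytes and the newDeployRequest helper are assumptions for illustration. Sorting the keys keeps the repeated environ field deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// EnvironmentVariable is a hand-written stand-in for the proposed message
// (not protoc-generated code).
type EnvironmentVariable struct {
	Key   string
	Value string
}

// DeployRequest mirrors the proposed message. Holding the package as raw
// .spkg bytes instead of sf.substreams.v1.Package is an assumption.
type DeployRequest struct {
	SubstreamsPackage []byte
	DevelopmentMode   bool
	Environ           []EnvironmentVariable
}

// newDeployRequest is a hypothetical helper that turns a map of variables
// into the repeated environ field, sorted by key for deterministic output.
func newDeployRequest(spkg []byte, dev bool, vars map[string]string) DeployRequest {
	keys := make([]string, 0, len(vars))
	for k := range vars {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	environ := make([]EnvironmentVariable, 0, len(keys))
	for _, k := range keys {
		environ = append(environ, EnvironmentVariable{Key: k, Value: vars[k]})
	}
	return DeployRequest{SubstreamsPackage: spkg, DevelopmentMode: dev, Environ: environ}
}

func main() {
	req := newDeployRequest(nil, true, map[string]string{
		"DBT_ENABLED":           "false",
		"CLICKHOUSE_OPEN_PORTS": "false",
	})
	fmt.Println("development_mode:", req.DevelopmentMode)
	for _, e := range req.Environ {
		fmt.Printf("%s=%s\n", e.Key, e.Value)
	}
}
```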

Change the alpha commands to:

substreams alpha service deploy
# and eventually:
substreams service deploy
substreams service info
substreams service list

From the SinkConfig objects, we remove everything that could be related to development. The goal of those Config objects is to define the strict service we want in PRODUCTION environments, which is the lowest common denominator that all service providers MUST serve in order to be compliant with the service definition.

The development features are to be passed out-of-band with the development_mode flag and other environment variables.

Prod vs dev

If a provider doesn't support a given service, it simply says so:

$ substreams service deploy -e deploy.pinax.network ./substreams.sql.yaml
PINAX: Sorry, the service `sf.substreams.sql.v1.Service` is not supported here.

When development_mode == False:

When development_mode == True:

substreams.clickhouse+dbt.yaml is reduced to its bare essentials:

sink:
  config:
    dbt:
      engine: clickhouse
      rest_frontend:
        enabled: true
      dbt:
        files: ./dbt
        run_interval_seconds: 3600

Here is an example of a local deployment.

By DEFAULT, we deploy with --prod=false, which sets development_mode = True:

$ substreams service deploy ./substreams.clickhouse+dbt.yaml
LOCAL DOCKER: 
- Spinning up `clickhouse`
  - Exposing port 8125 (http clickhouse protocol)
  - Exposing port 5432 (pgsql compatible endpoint)
  - Exposing port 9000 (admin interface)
- Spinning up `rest_frontend`
- Spinning up `sql-sink`
- Spinning up `dbt`
  - Automatic DBT disabled. Set the OOB var DBT_RUN=true if you still want it to run automatically, or run it yourself in one of two ways:
    1. Run `dbt` via Docker:

      export SET_PROFILE=bayc_dbt=clickhouse://asldkfj:alsdkfj@localhost:8135
      alias my_dbt='docker run -v "$PWD/dbt:/dbt/files" -ti -e SET_PROFILE="$SET_PROFILE" dbt-runner:latest run'

    2. Or run it locally:

      pip install dbt-core dbt-clickhouse
      dbt setprofile bayc_dbt=clickhouse://asldkfj:alsdkfj@localhost:8135
      dbt run  # iterate here
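The default described above amounts to development_mode = !prod. This Go sketch uses the standard library flag package purely for illustration (the real CLI may parse flags differently); developmentModeFromArgs is a hypothetical helper.

```go
package main

import (
	"flag"
	"fmt"
)

// developmentModeFromArgs parses a --prod flag (default false) and derives
// development_mode as its negation.
func developmentModeFromArgs(args []string) (bool, error) {
	fs := flag.NewFlagSet("deploy", flag.ContinueOnError)
	prod := fs.Bool("prod", false, "deploy in production mode")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return !*prod, nil
}

func main() {
	// No --prod flag: development mode is on.
	dev, err := developmentModeFromArgs([]string{"./substreams.clickhouse+dbt.yaml"})
	fmt.Println("development_mode =", dev, err)

	// Explicit --prod: development mode is off.
	dev, err = developmentModeFromArgs([]string{"--prod", "./substreams.clickhouse+dbt.yaml"})
	fmt.Println("development_mode =", dev, err)
}
```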

With an explicit deployment to production:

$ substreams service deploy --prod ./substreams.clickhouse+dbt.yaml
LOCAL DOCKER: 
- Spinning up `clickhouse`
- Spinning up `rest_frontend`
- Spinning up `sql-sink`
- Spinning up `dbt`
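Both outputs start the same four components; only their configuration differs. The Go sketch below captures that. The component type and plan function are hypothetical, the port numbers are copied from the development-mode output, and the assumption that production runs dbt automatically is inferred from the missing "disabled" notice and from run_interval_seconds in the config.

```go
package main

import "fmt"

// component is one container a hypothetical deployer starts.
type component struct {
	Name       string
	Ports      []int // host ports exposed; none outside development mode
	AutoRunDBT bool  // only meaningful for the dbt component
}

// plan returns the same four components in both modes; development mode only
// exposes ClickHouse ports and turns off the automatic dbt run (assumption).
func plan(developmentMode bool) []component {
	clickhouse := component{Name: "clickhouse"}
	if developmentMode {
		// Port numbers copied from the development-mode output above.
		clickhouse.Ports = []int{8125, 5432, 9000}
	}
	dbt := component{Name: "dbt", AutoRunDBT: !developmentMode}
	return []component{clickhouse, {Name: "rest_frontend"}, {Name: "sql-sink"}, dbt}
}

func main() {
	for _, dev := range []bool{true, false} {
		fmt.Println("development_mode =", dev)
		for _, c := range plan(dev) {
			fmt.Printf("- Spinning up `%s` ports=%v auto_dbt=%v\n", c.Name, c.Ports, c.AutoRunDBT)
		}
	}
}
```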

Some deployment services could honor other variables to further refine the experience, either in dev or production:

$ substreams service deploy -e deploy.pinax.network -H CLICKHOUSE_OPEN_PORTS=false -H DBT_ENABLED=false ./substreams.clickhouse+dbt.yaml
PINAX: 
- Spinning up `clickhouse`
  - Not exposing clickhouse ports, as per `CLICKHOUSE_OPEN_PORTS`
- Spinning up `rest_frontend`
- Spinning up `sql-sink`
- Spinning up `dbt`
  - Automatic DBT disabled (DBT_ENABLED=false)
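A sketch, in Go with the standard flag package, of how a deployer could collect repeated -H KEY=VALUE flags and read them with defaults. The params type, parseDeployArgs and boolParam are hypothetical, as are the defaults of true for CLICKHOUSE_OPEN_PORTS and DBT_ENABLED.

```go
package main

import (
	"flag"
	"fmt"
	"strconv"
	"strings"
)

// params collects repeated -H KEY=VALUE flags; it implements flag.Value.
type params map[string]string

func (p params) String() string { return fmt.Sprint(map[string]string(p)) }

func (p params) Set(s string) error {
	key, value, ok := strings.Cut(s, "=")
	if !ok || key == "" {
		return fmt.Errorf("expected KEY=VALUE, got %q", s)
	}
	p[key] = value
	return nil
}

// boolParam reads a boolean parameter, falling back to def when it is absent.
func boolParam(p params, key string, def bool) (bool, error) {
	v, ok := p[key]
	if !ok {
		return def, nil
	}
	return strconv.ParseBool(v)
}

// parseDeployArgs returns the collected parameters and the positional args.
func parseDeployArgs(args []string) (params, []string, error) {
	p := params{}
	fs := flag.NewFlagSet("deploy", flag.ContinueOnError)
	fs.Var(p, "H", "deployment parameter as KEY=VALUE (repeatable)")
	if err := fs.Parse(args); err != nil {
		return nil, nil, err
	}
	return p, fs.Args(), nil
}

func main() {
	p, rest, err := parseDeployArgs([]string{
		"-H", "CLICKHOUSE_OPEN_PORTS=false",
		"-H", "DBT_ENABLED=false",
		"./substreams.clickhouse+dbt.yaml",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	openPorts, err := boolParam(p, "CLICKHOUSE_OPEN_PORTS", true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	dbtEnabled, err := boolParam(p, "DBT_ENABLED", true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("manifest:", rest[0])
	fmt.Println("expose ClickHouse ports:", openPorts)
	fmt.Println("automatic dbt:", dbtEnabled)
}
```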
colindickson commented 11 months ago

Tasks so far:

sduchesneau commented 11 months ago

@abourget a few things to specify and possibly discuss:

The only non-production features in our definition are: 1) wire_protocol and 2) pgweb.

I'm certain that some users WILL WANT wire_protocol access on their production environment, but we would either a) tell them no (usually a bad idea), or b) do it "out of band": they'll probably ask for it after the fact. It could be a special magic "streamingfast..." environment variable that enables it; it wouldn't be part of the spkg definition. If it's an environment variable, we'll need to support an "update" method on the deployment endpoint. An "update" on production may be refused if the modules or schema change, or because of other business rules of that kind.

Comments ?

sduchesneau commented 11 months ago

Also, maybe some users will want DBT to run automatically in development. That would require a more complex system, so I suggest we don't do any of that now: development means running your own (damn) dbt (from your workstation).

sduchesneau commented 11 months ago

@abourget

Another thing:

I don't think the environment_variables name makes sense anymore.

What about custom_parameters? Or just parameters?

They are not environment variables and are not used as such; they would be a bit like substreams params.

sduchesneau commented 11 months ago

chosen: deployment_parameters

it will contain either

todo:

colindickson commented 11 months ago