warpstreamlabs / bento

Fancy stream processing made operationally mundane. This repository is a fork of the original project before the license was changed.
https://warpstreamlabs.github.io/bento/

Read bento config from environment variables #60

Open gregfurman opened 4 months ago

gregfurman commented 4 months ago

Currently, only the bento-lambda distribution allows for reading of bento config from the environment via the BENTO_CONFIG and BENTO_CONFIG_PATH variables.

https://github.com/warpstreamlabs/bento/blob/1d22b5d9e7f0214e3a074e191eec21ee6b7d9a35/internal/serverless/lambda/lambda.go#L31-L41

We should extend bento to include this functionality.
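As a rough illustration of the behaviour being requested, the lookup could be sketched as below. This is not the code in the linked `lambda.go`. The helper name `resolveConfig` and the precedence (inline `BENTO_CONFIG` first, then `BENTO_CONFIG_PATH`) are assumptions made for this example; only the two variable names come from the issue.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// errNoConfig is returned when neither variable is set.
var errNoConfig = errors.New("neither BENTO_CONFIG nor BENTO_CONFIG_PATH is set")

// resolveConfig is a hypothetical helper. It returns the raw config bytes,
// preferring an inline config in BENTO_CONFIG over a file path in
// BENTO_CONFIG_PATH. That precedence is an assumption of this sketch.
// getenv and readFile are injected so the logic can be tested in isolation.
func resolveConfig(getenv func(string) string, readFile func(string) ([]byte, error)) ([]byte, error) {
	if inline := getenv("BENTO_CONFIG"); inline != "" {
		return []byte(inline), nil
	}
	if path := getenv("BENTO_CONFIG_PATH"); path != "" {
		return readFile(path)
	}
	return nil, errNoConfig
}

func main() {
	// Demo only: set the variable in-process so the example runs anywhere.
	os.Setenv("BENTO_CONFIG", "input:\n  stdin: {}\noutput:\n  stdout: {}\n")

	conf, err := resolveConfig(os.Getenv, os.ReadFile)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("loaded %d bytes of config\n", len(conf))
}
```

Injecting `getenv` and `readFile` keeps the lookup free of global state, so the same function could sit in front of any distribution's startup path.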


From the "Bento Lambda Setup | Passing in config from env" Discord thread.

jem-davies commented 4 months ago

Added comment to discord thread:

What is your motivation for wanting to specify a bento config via an Environment Variable? - What is your use case and how come it is more convenient than a yaml file?

gregfurman commented 4 months ago

I'm no Lambda expert, but I can imagine it's easier to define and pass in config via an environment variable as opposed to packaging your config.yaml alongside that zip file

Also, that this is possible only with the lambda distro seems inconsistent IMO

jem-davies commented 4 months ago

Yeah I can see why it is in the lambda distribution and it perhaps makes sense to add it to the normal distribution too.

I was more curious about the circumstances in which they wanted to do that. Presumably it is for something other than a Lambda.

There might be other serverless frameworks where it would make sense to do so.

danielemery commented 4 months ago

Just bringing some context in here from Discord:

Let me know if I can provide any more details about our use-case!

jem-davies commented 4 months ago

@danielemery have you considered streams mode?

There is a rest api that can create streams dynamically: https://warpstreamlabs.github.io/bento/docs/guides/streams_mode/using_rest_api

not saying we shouldn't action the issue, just wondering is all.
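For anyone weighing the streams-mode route against an env var, a minimal sketch of pushing a config held in an environment variable to the REST API might look like this. It assumes a Bento instance running in streams mode on `http://localhost:4195` and the `POST /streams/{id}` endpoint from the guide linked above. The helper names, the stream id `flow-a`, and the `application/yaml` content type are illustrative assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"os"
	"strings"
)

// streamEndpoint builds the URL for a single stream, escaping the id.
func streamEndpoint(baseURL, id string) string {
	return strings.TrimRight(baseURL, "/") + "/streams/" + url.PathEscape(id)
}

// createStream posts a YAML stream config to a streams-mode instance.
// The content type is an assumption of this sketch.
func createStream(client *http.Client, baseURL, id, yamlConf string) error {
	resp, err := client.Post(streamEndpoint(baseURL, id), "application/yaml", strings.NewReader(yamlConf))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return fmt.Errorf("create stream %q: unexpected status %s", id, resp.Status)
	}
	return nil
}

func main() {
	// The config is taken from BENTO_CONFIG purely for illustration.
	conf := os.Getenv("BENTO_CONFIG")
	if conf == "" {
		fmt.Println("BENTO_CONFIG is empty; nothing to post")
		return
	}
	if err := createStream(http.DefaultClient, "http://localhost:4195", "flow-a", conf); err != nil {
		fmt.Println("error:", err)
	}
}
```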

jem-davies commented 4 months ago

OR can you use a volume mount to get the config file in there?

danielemery commented 4 months ago

> @danielemery have you considered streams mode?

We hadn't considered it, and it looks really neat! However, I'm not sure it would fully suit us, since we want to be able to horizontally scale individual flows, and we might have a hard time keeping track of which flow is deployed in which container. We will definitely have a look into it, though.

> OR can you use a volume mount to get the config file in there?

This is how we're doing things locally and it works great. I think on ECS it would be a bit cumbersome though, since we'd have to get the configs onto all the EC2 hosts when they start up to mount (or write to EFS volumes etc).

jem-davies commented 4 months ago

Yeah just scanning the docs for ECS and it seems that mounting files is a bit cumbersome.

I am aware that recently AWS introduced Mountpoint for Amazon S3

...

To be fair, I think the option of supplying a config via an ENV VAR could be implemented easily; it just seems a bit weird to do that, IMO. We could instead add a new serverless distribution for ECS. That is probably more work, but I feel it might make more sense for other users in the future.

danielemery commented 4 months ago

> I am aware that recently AWS introduced Mountpoint for Amazon S3

Thanks for the tip, we weren't aware of this, and it doesn't seem too much of a stretch to have our coordinator write to S3 before creating a task definition. It still wouldn't be quite as clean as just providing the configuration inside the task definition 😉

> To be fair, I think the option of supplying a config via an ENV VAR could be implemented easily; it just seems a bit weird to do that, IMO. We could instead add a new serverless distribution for ECS. That is probably more work, but I feel it might make more sense for other users in the future.

Would there be any other differences between a normal containerized Bento and the ECS serverless distribution? I think it would really only be the ability to read the config from env?

jem-davies commented 4 months ago

Not sure - but if you want an ECS serverless distribution, then you could raise a separate issue and I will pick it up in the next couple of days.

Just from a semantic perspective, adding bento stream/pipeline config via an ENV VAR seems a bit weird to me.

I would normally only expect ENV VARs to hold small config strings, so from the perspective of a new user, this might seem weird outside the context of Lambda / ECS.

Because the ability to add config via an ENV VAR currently seems to be a workaround for Lambdas, it makes more sense to add an ECS distribution than to add this feature to the base distribution of bento, in my opinion.

chokosabe commented 3 months ago

Reading from ENV variables really should be the way to go on this. The arguments against (above) seem a bit contrived. It's a small change that simplifies deployment, and there is no good argument against it.

mbneimann-at-work commented 2 months ago

> What is your motivation for wanting to specify a bento config via an Environment Variable? - What is your use case and how come it is more convenient than a yaml file?

I want to run bento in streams mode on edge nodes with the NATS Execution Engine called Nex. For that I would very much like to be able to provide the initial config via environment variables.

vordimous commented 3 weeks ago

I found what could be a decent workaround for people who need this feature.

I was having a similar issue with configuring containers with a single config file and came across this example of using an init container to simply write an env var value to a file and then mount it to the dependent container. The solution would depend heavily on the deployment config limitations for setting env variable contents. However, the idea should be easily replicated in whichever system you are working with.

        ContainerDefinitions:
        - Name: nginx
          Image: nginx
          Essential: true
          DependsOn:
          - Condition: COMPLETE
            ContainerName: nginx-config
          PortMappings:
            - ContainerPort: 80
          MountPoints:
            - ContainerPath: /etc/nginx
              SourceVolume: nginx-conf-vol
        - Name: nginx-config
          Image: bash
          Essential: false
          Command:
            - -c
            - echo $DATA | base64 -d - | tee /etc/nginx/nginx.conf
          Environment:
            - Name: DATA
              Value:
                Fn::Base64: |
                  events {
                    worker_connections  1024;
                  }

                  http {
                    server {
                      listen 80;
                      location / {
                        proxy_pass https://kichik.com;
                      }
                    }
                  }
          MountPoints:
            - ContainerPath: /etc/nginx
              SourceVolume: nginx-conf-vol

https://kichik.com/2020/09/10/mounting-configuration-files-in-fargate/