Open ppavlov39 opened 2 years ago
Hello! We use Benthos in streams mode to process several data streams. Is there a way to set a variable (or something similar) that we can use to configure input parameters within a stream? We can set a value in metadata, but it can't be used in an input before we receive the first message from that input.
As an example, with the mongodb input most of the config is common between streams, but the collection name must be specific to each stream.
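To illustrate the duplication (a minimal sketch with made-up connection details; exact mongodb input fields may vary by version), two stream configs end up identical except for a single field:

```yaml
# streams/stream_a.yaml (identical to stream_b.yaml except for `collection`)
input:
  mongodb:
    url: mongodb://localhost:27017
    database: analytics
    collection: stream_a_events
```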
Would environment variables help you? The config supports interpolation from env vars for fields.
Thanks for the answer.
Unfortunately no. We are already using environment variables to set some parameters, and they do that perfectly. But now we need to choose which input Benthos should use, and if we have multiple streams we must prepare two input component configs for each stream. If we have differences in the output section as well, we also need two configurations for each case, because Benthos initialises the input and output before it processes any mappings or variables. Such a config becomes very confusing.
If we could do some variable handling, driven by environment variables, before the input and output components are initialised, that would be great.
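For context, a sketch of what env var interpolation already covers (variable names are made up): field values interpolate fine, including defaults via `${VAR:default}`, but there is no way to swap out a whole input component this way.

```yaml
input:
  mqtt:
    urls: [ "${MQTT_URL:tcp://localhost:1883}" ]
    topics: [ "${MQTT_TOPIC}" ]
```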
Oops, I somehow missed your reply. I wonder if YAML anchors and aliases would help you.
Otherwise, one hack that comes to mind is to use dynamic inputs and outputs in your streams and then have Benthos craft configs for them in a manager stream and post them to the appropriate worker stream via the embedded REST API using the http_server output. That might be a bit convoluted, so not sure you want to go down that path. Alternatively, if you're only interested in a few fields from certain inputs, we could enhance them to support interpolation and then you could use the bloblang env function to produce their value.
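A rough sketch of the dynamic-input half of that idea (assuming the default API port of 4195; check the dynamic input docs for the exact endpoint layout):

```yaml
# Worker stream: starts with no inputs; they are injected at runtime.
input:
  dynamic:
    inputs: {}
```

```sh
# The manager (or an operator) posts an input config to the worker's REST API:
curl -X POST http://localhost:4195/inputs/stream_a --data-binary '
mongodb:
  url: mongodb://localhost:27017
  database: analytics
  collection: stream_a_events
'
```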
What do you think of this idea for config variables? This isn't based on any personal needs; just musings…
```yaml
variables:
  version: |
    root = env("GIT_SHA") || "unknown"
  topic: |
    root = "%s_myconfig".format(env("BASE_TOPIC_NAME"))

inputs:
  gcp_pubsub:
    topic: ${! var("topic") }
  processors:
    - mapping: |
        meta pipeline_version = var("version")
```
The benefit is that you have access to full bloblang mappings that yield a string, rather than noisy interpolated strings. You would still use interpolated strings to reference variables, and you could also refer to them in other mappings/mutations.
It will also be possible to define variables in resource files so they're shareable between multiple configs.
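Sharing might then look something like this (hypothetical, since the variables block doesn't exist yet; the -r resources flag does):

```yaml
# variables.yaml (shared resource file, hypothetical syntax)
variables:
  topic: |
    root = "%s_myconfig".format(env("BASE_TOPIC_NAME"))
```

```sh
benthos -r variables.yaml -c stream_a.yaml
```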
Can't one use the cache for a global variable context?
@mannharleen not if you want those variables to configure inputs or outputs. Technically messages can carry config over to outputs (in metadata, for example), but that's messy if the config is unrelated to the message in any way, i.e. you're side-loading config onto messages.
Also, getting config from caches will require messy branch processors.
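For reference, that workaround looks roughly like this (a sketch; config_cache and topic_name are made-up names). Note that it only exposes the value to processors and outputs via metadata; it cannot configure an input:

```yaml
pipeline:
  processors:
    - branch:
        processors:
          - cache:
              resource: config_cache
              operator: get
              key: topic_name
        result_map: meta topic = content().string()
```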
That's true. However, I was alluding to having a cache function available in bloblang, which would then make it viable to configure inputs or outputs without messy branching.
I am in favour of having a global var context; I just jumped to an alternative solution in the meantime.
Sorry for not replying for so long. Thanks for the answers.
I thought about YAML anchors, but they can't solve the main problem: the config would still be too confusing, just shorter. Dynamic configuration is not applicable in my situation because the service runs in K8s and should be controlled only via configs and environment variables.
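For reference, the anchors variant looks roughly like this (illustrative names; the extra top-level key may also need relaxed linting), which is shorter but still one document per stream:

```yaml
mongo_common: &mongo_common
  url: mongodb://localhost:27017
  database: analytics

input:
  mongodb:
    <<: *mongo_common
    collection: stream_a_events
```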
I think it is a great idea to implement variable parsing as the first step of config processing. That would solve the problem.
When will global variables be supported? The current configuration is too cumbersome. I am not a data-processing user; I am a home-automation user, mostly using mqtt, http (REST API) and redis. I really like parts of the Benthos design, and the input and output plugins are almost perfect, but the configuration is too painful: very inflexible, with lots of repetitive typing.
I recommend two structured templating libraries for YAML that can be used to normalise YAML templates: https://github.com/mandelsoft/spiff and https://github.com/vmware-tanzu/carvel-ytt. Personally, I think Benthos processing its configs through the spiff library would be better than the current practice.
I think YAML configs could be structured with spiff. If output in other text formats is needed, you can use YAML-templated output, for example https://github.com/subchen/frep or https://github.com/mmalcek/bafi.
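To illustrate the templating approach, a minimal ytt sketch (file names and values are made up):

```yaml
# values.yaml
#@data/values
---
collection: stream_a_events
```

```yaml
# config.yaml
#@ load("@ytt:data", "data")
input:
  mongodb:
    url: mongodb://localhost:27017
    database: analytics
    collection: #@ data.values.collection
```

```sh
# Render one config per stream by overriding the data value:
ytt -f config.yaml -f values.yaml --data-value collection=stream_b_events > stream_b.yaml
```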
@darcyg we haven't prioritised this issue yet. If you want something more convenient to work with than YAML, consider using CUE, which Benthos supports. With CUE you'll have type-safe configs and the ability to reuse values to cut down your config.
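A minimal sketch of that reuse pattern (assuming the cue CLI; names are illustrative):

```cue
// config.cue: a hidden shared base plus one per-stream override.
_base: {
	input: mongodb: {
		url:      "mongodb://localhost:27017"
		database: "analytics"
	}
}

// Embed the base, then set the per-stream field.
_base
input: mongodb: collection: "stream_a_events"
```

```sh
# Hidden fields (the _base definition) are dropped on export:
cue export config.cue --out yaml > stream_a.yaml
```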