opensearch-project / data-prepper

Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.
https://opensearch.org/docs/latest/clients/data-prepper/index/
Apache License 2.0

Support JSON configuration files #1020

Open graytaylor0 opened 2 years ago

graytaylor0 commented 2 years ago

Is your feature request related to a problem? Please describe.

As a user that prefers JSON over YAML, I would like the option to configure Data Prepper using a JSON configuration file.

Describe the solution you'd like

Data Prepper supports both YAML and JSON configuration files.

Additional context

This is a relatively minor change, as Jackson can easily convert JSON to YAML internally. It may be necessary to provide JSON configuration examples alongside the YAML examples in the documentation, although since YAML is the main way to configure Data Prepper, this may not be strictly necessary.

Is there a concern that having 50% of users on YAML and 50% on JSON would cause more confusion than necessary? Or does the option to use JSON outweigh this concern?

dlvenable commented 2 years ago

My main concern is that this could cause additional confusion for pipeline authors. They now have an initial decision to make about which format to use, and they may wonder whether one has advantages over the other.

I hope to hear feedback from other pipeline authors on how important this is. This is the kind of issue I'd like to evaluate based on feedback, perhaps up-votes.

I don't think this improves the ease of use for Data Prepper either way since it adds some small friction while setting up. So I'm going to remove the ease-of-use label. Feel free to add it back if you believe that it does make getting started with Data Prepper easier.

moltar commented 1 year ago

We would like to use JSON instead of YAML, simply because it is the default serialization format in JS-land. We use CDK for infra, which uses JSON, so providing a JSON config would be a piece of cake. For YAML, we would need to install additional deps.

Having said that, JSON is valid YAML.

Jackson can easily convert JSON to YAML internally

Does that mean that JSON is already supported, just not documented?

miekassu commented 1 year ago

We tested this by giving the instance a JSON-formatted pipeline definition in a file named pipelines.yaml, and it worked well.

dlvenable commented 1 year ago

@miekassu, thanks for clarifying that. This makes sense, since we use Jackson to deserialize the pipeline configuration file.

Right now, Data Prepper scans the pipelines/ directory for .yaml files. I think we could update this to scan for .json files too. We'd probably also want an integration test to ensure that it works.

Feel free to contribute a PR to that end to improve the experience!

dlvenable commented 9 months ago

@moltar,

Does that mean that JSON is already supported, just not documented?

I believe that JSON is supported because, under the hood, Data Prepper uses Jackson to parse the YAML. For this issue to be considered complete, I think we'd need the following:

  1. Add tests to ensure the behavior doesn't break.
  2. Look for .json files when scanning the pipelines/ directory.
  3. Avoid combining pipeline files and scan each file independently. This would help with YAML, and is necessary for JSON.
  4. Update the documentation to show what a valid JSON configuration looks like.
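
As a rough illustration of items 2 and 3, here is a minimal sketch using only the Java standard library. It is not Data Prepper's actual implementation: the class name `PipelineFileScanner` and its methods are hypothetical, and the accepted extensions are assumed to be just `.yaml` and `.json`. Each matching file is read on its own, so the contents are never concatenated before parsing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Data Prepper.
public class PipelineFileScanner {
    // Assumption: only these two extensions are treated as pipeline files.
    private static final List<String> EXTENSIONS = List.of(".yaml", ".json");

    // True when the file name ends with one of the accepted extensions.
    public static boolean isPipelineFile(Path path) {
        String name = path.getFileName().toString().toLowerCase(Locale.ROOT);
        return EXTENSIONS.stream().anyMatch(name::endsWith);
    }

    // Lists regular files directly inside the directory, sorted by path.
    public static List<Path> findPipelineFiles(Path directory) throws IOException {
        try (Stream<Path> entries = Files.list(directory)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(PipelineFileScanner::isPipelineFile)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Reads every file separately so each one can be parsed independently.
    public static Map<Path, String> readEach(List<Path> files) throws IOException {
        Map<Path, String> contents = new LinkedHashMap<>();
        for (Path file : files) {
            contents.put(file, Files.readString(file));
        }
        return contents;
    }

    public static void main(String[] args) throws IOException {
        Path directory = Path.of(args.length > 0 ? args[0] : "pipelines");
        if (!Files.isDirectory(directory)) {
            System.err.println("Not a directory: " + directory);
            return;
        }
        // Each entry would be handed to the parser on its own.
        readEach(findPipelineFiles(directory))
                .forEach((file, text) -> System.out.println(file + ": " + text.length() + " chars"));
    }
}
```

Keeping the files separate matters for JSON in particular: two JSON documents joined end to end do not form a single valid JSON document, so each file has to be parsed by itself.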

If you are interested in contributing, we are happy to accept a PR and help provide any guidance on the process.