Motivation
Data Prepper has an upcoming directory structure change (#305) which is conducive to having multiple JAR files. Along with this, the data-prepper-core project can be split into multiple JAR files, and these JAR files could even be deployed to Maven Central.
Also, many of the projects have the data-prepper- prefix. We may want to consider renaming the artifacts and projects.
Proposed Solution
Re-organize the project structure along the lines of the following:
Common Components
These components are used by both the Data Prepper core components and plugins. These projects are located in the common/ directory.
data-prepper-api - The existing project as-is. It will retain the Maven groupId org.opensearch.dataprepper.
data-prepper-configuration-converter-api - This project can hold interfaces from data-prepper-logstash-configuration so that they are available to plugins. It will be in the Maven groupId org.opensearch.dataprepper.converter.
Core Components
Most components here have the Maven groupId org.opensearch.dataprepper.core. They are located in the core/ directory in the project structure.
data-prepper-expression - The existing project as-is
data-prepper-configuration-converter - The existing project mostly as-is, though interfaces would be moved to data-prepper-configuration-converter-api. This project will have a different Maven groupId than the other projects here: org.opensearch.dataprepper.converter.
data-prepper-pipeline - Code responsible for validating and parsing pipeline configurations
data-prepper-server - Manages the server which runs on port 4900 by default (list pipelines, metrics, etc.)
data-prepper-plugin-framework - The framework for loading plugins
data-prepper-core - Contains everything needed to run Data Prepper, though without a Java main method.
data-prepper-main - The Java main method is here, and that is all. It uses data-prepper. By splitting this, it will be possible to run Data Prepper with a different main method, or even to run it programmatically.
There may be more ways to break down the core project, and this can happen iteratively. With this structure in place, the project will be in a better position to split it further.
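As a rough illustration of the data-prepper-main split described above, the sketch below shows a main class that only reads its arguments and delegates. The names DataPrepperRunner, CoreDataPrepper, and launch are hypothetical stand-ins; this proposal does not define the API that data-prepper-core would expose.

```java
/** Hypothetical contract for what data-prepper-core could expose (an assumption, not an existing API). */
interface DataPrepperRunner {
    void start();
    void shutdown();
    boolean isRunning();
}

/** Hypothetical core implementation: everything needed to run, but no main method. */
final class CoreDataPrepper implements DataPrepperRunner {
    private final String pipelineConfigPath;
    private boolean running;

    CoreDataPrepper(String pipelineConfigPath) {
        this.pipelineConfigPath = pipelineConfigPath;
    }

    @Override
    public void start() {
        // A real implementation would parse pipelines and start the server here.
        running = true;
        System.out.println("Starting pipelines from " + pipelineConfigPath);
    }

    @Override
    public void shutdown() {
        running = false;
    }

    @Override
    public boolean isRunning() {
        return running;
    }
}

/** Sketch of data-prepper-main: argument handling and delegation, nothing else. */
final class DataPrepperMain {
    /** Entry point that other code can call to run Data Prepper programmatically. */
    static DataPrepperRunner launch(String pipelineConfigPath) {
        DataPrepperRunner runner = new CoreDataPrepper(pipelineConfigPath);
        runner.start();
        return runner;
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("Usage: DataPrepperMain <pipeline-config-file>");
            return;
        }
        launch(args[0]);
    }
}
```

With a shape like this, an embedding application or an integration test could call launch directly instead of going through main.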
Plugins
This will be the same directory currently named data-prepper-plugins, but renamed to just plugins. These projects will not be deployed to Maven Central.
Test
Any projects related to common testing libraries. There is only one library now, but we may find value in adding more to make integration testing easier. The Maven groupId is org.opensearch.dataprepper.test.
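The groupId assignments from the Common Components and Core Components sections can be summarized in a small lookup, sketched below. It assumes that each artifactId equals the project name and that every core project other than data-prepper-configuration-converter uses org.opensearch.dataprepper.core; the proposal only says "most" do. The class and method names are illustrative. The test library is left out because its artifact name is not given here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative summary of the proposed project locations and Maven groupIds. */
final class ProposedCoordinates {
    static final String BASE_GROUP = "org.opensearch.dataprepper";

    /** Maps "directory/project" to the proposed Maven groupId. */
    static Map<String, String> groupIdsByProject() {
        Map<String, String> groups = new LinkedHashMap<>();
        // Common components, shared by core and plugins.
        groups.put("common/data-prepper-api", BASE_GROUP);
        groups.put("common/data-prepper-configuration-converter-api", BASE_GROUP + ".converter");
        // Core components; the converter is the stated exception to the .core groupId.
        groups.put("core/data-prepper-expression", BASE_GROUP + ".core");
        groups.put("core/data-prepper-configuration-converter", BASE_GROUP + ".converter");
        groups.put("core/data-prepper-pipeline", BASE_GROUP + ".core");
        groups.put("core/data-prepper-server", BASE_GROUP + ".core");
        groups.put("core/data-prepper-plugin-framework", BASE_GROUP + ".core");
        groups.put("core/data-prepper-core", BASE_GROUP + ".core");
        groups.put("core/data-prepper-main", BASE_GROUP + ".core");
        return groups;
    }

    /** Builds "groupId:artifactId", assuming the artifactId equals the project name. */
    static String coordinate(String projectPath) {
        String groupId = groupIdsByProject().get(projectPath);
        if (groupId == null) {
            throw new IllegalArgumentException("Unknown project: " + projectPath);
        }
        String artifactId = projectPath.substring(projectPath.lastIndexOf('/') + 1);
        return groupId + ":" + artifactId;
    }

    public static void main(String[] args) {
        for (String project : groupIdsByProject().keySet()) {
            System.out.println(project + " -> " + coordinate(project));
        }
    }
}
```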
Other Projects
This proposal currently does not modify: release, e2e-test, performance-test.
Out of Scope
This restructure does not include any work toward having bundled/core plugins versus optional plugins. Thus, the plugins project is mostly left as-is.
This does not include the location of the scripts to start Data Prepper in the new directory structure. This issue is focused mainly on the Gradle project structure.
Questions
~Is there a better name for the core/data-prepper project?~
Should the Maven artifact names (and project names) start with data-prepper-? This is somewhat redundant, but many projects use this convention, so we may wish to continue to follow it.
Do we want to do anything with the e2e-test and performance-test directories?
What about data-prepper-benchmarks? Do we even need this still?
Can we move the shared-configs to a better location?
This change is not a breaking change for users downloading Data Prepper, so it doesn't have to come in 2.0. I'm pushing this out to after our 2.0 release.