opensearch-project / data-prepper

Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.
https://opensearch.org/docs/latest/clients/data-prepper/index/
Apache License 2.0

[RFC] OpenSearch Ingestion: A New Name for the Next Steps of Data Prepper #4309

Open dlvenable opened 3 months ago

dlvenable commented 3 months ago

A lot has changed since Data Prepper was initially introduced to the public in December 2020.

First, when Data Prepper launched there was no OpenSearch project. Shortly after the release of Data Prepper, the OpenSearch project launched with OpenSearch Core and OpenSearch Dashboards. Data Prepper joined the project shortly after.

Second, Data Prepper has become a key component of the OpenSearch Toolbox. Along with OpenSearch Core and OpenSearch Dashboards, Data Prepper is the third part of the whole platform, and is the recommended way to ingest data into OpenSearch.

Third, Data Prepper itself has grown as a product. When Data Prepper first launched, it was focused on supporting trace analytics. It quickly grew to support log analytics from a variety of sources. Data Prepper has also grown to support search use-cases through sources such as S3 and DynamoDB. And with the addition of OpenSearch as a source, users can migrate data between OpenSearch clusters.

With these changes, Data Prepper should clearly be part of the OpenSearch ecosystem. To that end, I propose that we rename Data Prepper to OpenSearch Ingestion.

This name reflects two important aspects of this project. 1) Data Prepper is part of the OpenSearch Toolbox. The new name conveys this to users, clarifying that this is part of OpenSearch. 2) This name conveys the goal of Data Prepper to provide ingestion into OpenSearch.

One concern that may arise with this name is that it may indicate that the product only sends data to OpenSearch. The maintainers add sinks that help complement the primary product use-case of ingesting data into OpenSearch. For example, writing to S3 can reduce the volume of data going to OpenSearch. These continue to be important offerings of the product. But they are complementary to the primary goal of ingesting data into OpenSearch.

Process for Renaming

An important principle in the OpenSearch project is to support semver and avoid breaking changes. The maintainers of Data Prepper follow this principle and will continue to do so with this change. Renaming a product can be a disruptive change. But we will take care to follow semver and reduce friction. Here is a sketch of the process for renaming.

  1. We will make text changes to the repository, product pages, and documentation with the new name. This change just updates the names that readers see.
  2. Update the URLs for the product and documentation pages with the name opensearch-ingestion and add redirects from the existing data-prepper URLs.
  3. Data Prepper currently deploys artifacts for Docker images and archive files. For the remainder of the 2.x versions, we can retain the Data Prepper name to avoid breaking any automation. Additionally, we could add the new artifact names in parallel. This way, users can update their automation to use the new name at their convenience. When 3.0 releases, we would remove any artifacts named Data Prepper.
  4. Data Prepper itself has no APIs with the name Data Prepper. So there are no API changes needed.
  5. Code changes can come in over time as they don’t have as much impact on users. We would create a new root package - org.opensearch.ingestion. New plugins can start to use this package. We can migrate existing code to this package over time.
  6. Rename the project in GitHub to OpenSearch-Ingestion. GitHub redirects the old URL after a project is renamed.

travisbenedict commented 3 months ago

An additional concern that I see with the proposed name is that it could lead to confusion with the ingest pipelines that can be run on an OpenSearch node: https://opensearch.org/docs/latest/ingest-pipelines/.

Are there any plans for mitigating that confusion?

dlvenable commented 3 months ago

@travisbenedict, thank you for noting this. I failed to mention this, but yes. First of all, I have already seen that users are confused about these. I have discussed renaming the OpenSearch ingest pipelines to "index pipelines" with a few colleagues. That name helps convey that this is happening on the indexing side (OpenSearch). We'd need to make this proposal in the OpenSearch project. And it would require more of a migration because there are APIs that have this name in them.

Relatedly, another idea we've had some interest in is tighter integration with OpenSearch in general. Perhaps this would mean that OpenSearch itself has some endpoints to get information on Data Prepper pipelines. It would be nice to call those "ingestion" in the OpenSearch API and model.

oeyh commented 2 months ago

One reason I prefer the name OpenSearch Data Prepper is that it better conveys the extensive transformation, filtering, and enrichment capabilities of the software. And we are working to broaden these capabilities even further. In that sense, "Data Prepper" communicates our goal beyond transferring data to OpenSearch.

On the other hand, I do like that OpenSearch Ingestion is better connected with the OpenSearch project and the managed service that uses it, especially for new users.

kkondaka commented 2 months ago

Are the two name changes going to happen at the same time? I think it is better if they change the name to "index pipelines" at least a few months before our name change.

epugh commented 1 month ago

"OpenSearch ETL"? I'm still fairly new to the OpenSearch ecosystem, and reading the RFC I was intrigued to learn that it started outside of OpenSearch. My perception has been that it is the OpenSearch-focused ETL tool, and not a general-purpose ETL solution. I think you reach for Data Prepper if your core data platform is OpenSearch, not if your core platform is something else.