opensearch-project / opensearch-migrations

Migrate, upgrade, compare, and replicate OpenSearch clusters with ease.
https://aws.amazon.com/solutions/implementations/migration-assistant-for-amazon-opensearch-service/
Apache License 2.0

Allow users to preserve document IDs between source and target with live capture #1087

Open sumobrian opened 1 month ago

Is your feature request related to a problem?

Live capture (using capture and replay) has a limitation: if a user indexes a document on the source without specifying an ID, the source cluster auto-generates one, and when the same request is replayed against the target, the target generates a different ID. In some cases this does not matter, but it is a problem when application code that uses Elasticsearch/OpenSearch expects the IDs to be the same on both clusters. Updates are one example: to update a document, you typically supply its document ID to target that specific record, so an update that carries the source's ID will not reach the matching document on the target.
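To make the affected cases concrete, the sketch below flags captured requests that rely on an auto-generated ID. It assumes the replayer can see each request's HTTP method, path, and body, and it only covers the single-document `POST /<index>/_doc` form and `_bulk` requests. `relies_on_auto_id` is a hypothetical helper for illustration, not part of opensearch-migrations.

```python
import json
import re

# Matches a single-document index call with no ID segment, e.g. "/my-index/_doc".
# Index names cannot start with "_", so "/_bulk" and similar endpoints never match.
_AUTO_ID_PATH = re.compile(r"^/[^/_][^/]*/_doc/?$")


def relies_on_auto_id(method: str, path: str, body: str = "") -> bool:
    """Return True if the source cluster would generate the ID for at least one document.

    Hypothetical helper for illustration; not part of opensearch-migrations.
    """
    path = path.split("?", 1)[0]  # ignore query parameters such as ?refresh=true
    if method.upper() == "POST" and _AUTO_ID_PATH.match(path):
        return True
    if path.endswith("/_bulk"):
        # Bulk bodies are NDJSON: an action line, followed by a second line
        # for every operation except "delete".
        lines = [ln for ln in body.splitlines() if ln.strip()]
        i = 0
        while i < len(lines):
            op, meta = next(iter(json.loads(lines[i]).items()))
            if op in ("index", "create") and "_id" not in meta:
                return True
            i += 1 if op == "delete" else 2
    return False
```

For example, `POST /my-index/_doc` is flagged, while `PUT /my-index/_doc/1` is not, because the client already chose the ID and replaying it yields the same ID on the target.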

What solution would you like?

The ideal solution would add logic to capture and replay that keeps document IDs consistent between the source and target clusters. This could be done either by capturing and reusing auto-generated IDs or by providing a mechanism to assign IDs explicitly. Either way, IDs should be preserved even when the user did not supply one in the original request. Specifically, the replayer could take the document ID that the source generated, as returned in the source's response, and reuse it when submitting the request to the target.
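A minimal sketch of the response-based approach, assuming the replayer has both the captured request and the source cluster's response to it. `rewrite_single` and `rewrite_bulk` are hypothetical helpers, not opensearch-migrations APIs: the first turns `POST /<index>/_doc` into `PUT /<index>/_doc/<id>` using the `_id` from the source response, and the second copies the `_id` of each bulk response item into the matching `index`/`create` action line that lacked one. It relies on the bulk response listing one item per action, in request order.

```python
import json
from urllib.parse import quote


def rewrite_single(path: str, source_response: str) -> tuple[str, str]:
    """Rewrite "POST /<index>/_doc" as "PUT /<index>/_doc/<id>" using the source's ID.

    Hypothetical helper for illustration; not an opensearch-migrations API.
    """
    base, sep, query = path.partition("?")
    doc_id = json.loads(source_response)["_id"]
    # Percent-encode the ID so it is safe as a single path segment.
    new_path = base.rstrip("/") + "/" + quote(doc_id, safe="") + sep + query
    return "PUT", new_path


def rewrite_bulk(body: str, source_response: str) -> str:
    """Copy each auto-generated _id from a bulk response into its action line.

    Assumes one response item per action, in request order. Hypothetical helper.
    """
    items = json.loads(source_response)["items"]
    lines = [ln for ln in body.splitlines() if ln.strip()]
    out, i = [], 0
    for item in items:
        action = json.loads(lines[i])
        op, meta = next(iter(action.items()))  # meta is the dict inside action
        generated_id = item.get(op, {}).get("_id")
        if op in ("index", "create") and "_id" not in meta and generated_id is not None:
            meta["_id"] = generated_id  # mutates action in place
        out.append(json.dumps(action, separators=(",", ":")))
        if op == "delete":
            i += 1
        else:
            # index, create and update carry a second line (document or partial doc).
            out.append(lines[i + 1])
            i += 2
    return "\n".join(out) + "\n"
```

If a source response item carries no `_id`, the sketch leaves that action unchanged; a real implementation would also need a policy for replaying operations that the source rejected.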

What alternatives have you considered?

Do you have any additional context?

In use cases where ID consistency is crucial (e.g., updates or deletions), mismatched IDs between source and target can cause unintended results or errors in application logic. This enhancement would ensure that the capture and replay process keeps IDs consistent, reducing risk and making migrations more reliable. Applications that depend on stable IDs would also hit fewer edge cases during migration, since each record on the target would carry the same ID as its counterpart on the source.