Allow users to preserve document IDs between source and target with live capture

Is your feature request related to a problem?

Live capture (capture and replay) has a limitation: if a user does not specify an ID when indexing a document on the source, the source auto-generates one, and the replayed request produces a different auto-generated ID on the target. In some cases this does not matter, but it is a problem when application code that uses Elasticsearch/OpenSearch expects the IDs to match. Updates are one example: an update typically targets a specific record by its document ID, so an update captured on the source would reference an ID that does not exist on the target.

What solution would you like?

The ideal solution would add logic to capture and replay that keeps IDs consistent between the source and target clusters, either by capturing and reusing auto-generated IDs or by providing a mechanism to assign IDs explicitly. IDs should be preserved even when the user did not provide one. Specifically, the replayer could read the document ID that the source generated from the captured response and reuse it when submitting the corresponding request to the target.
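
A minimal sketch of that rewrite step, in Python for illustration only. It is not the replayer's actual code or API: `CapturedExchange` and `rewrite_for_target` are hypothetical names, and the sketch assumes the captured source response is available as a JSON string containing the `_id` field that `POST /<index>/_doc` returns.

```python
import json
import re
from dataclasses import dataclass
from urllib.parse import quote

# Matches an index request that lets the cluster auto-generate the ID,
# e.g. "/my-index/_doc" or "/my-index/_doc/".
_AUTO_ID_PATH = re.compile(r"^/(?P<index>[^/]+)/_doc/?$")


@dataclass
class CapturedExchange:
    """Hypothetical shape of one captured source request/response pair."""
    method: str
    path: str
    request_body: str
    response_body: str


def rewrite_for_target(exchange: CapturedExchange) -> tuple[str, str, str]:
    """Return (method, path, body) to replay against the target.

    An auto-ID request (POST /<index>/_doc) becomes an explicit-ID request
    (PUT /<index>/_doc/<id>) using the ID from the source response.
    Anything else is returned unchanged.
    """
    unchanged = (exchange.method, exchange.path, exchange.request_body)

    # Keep any query string (e.g. "?refresh=true") so it can be re-attached.
    path, sep, query = exchange.path.partition("?")
    match = _AUTO_ID_PATH.match(path)
    if exchange.method != "POST" or match is None:
        return unchanged

    try:
        source_id = json.loads(exchange.response_body).get("_id")
    except (json.JSONDecodeError, AttributeError):
        return unchanged  # response was not a JSON object
    if not isinstance(source_id, str) or not source_id:
        return unchanged  # e.g. the source request failed

    new_path = f"/{match['index']}/_doc/{quote(source_id, safe='')}{sep}{query}"
    return ("PUT", new_path, exchange.request_body)
```

This sketch covers single-document index requests only. `_bulk` requests, where IDs come back per item in the response, would need the same substitution applied to each action line.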

What alternatives have you considered?

Generating and storing a mapping of source IDs to target IDs during the capture phase, so that later requests can be kept consistent during the replay phase. However, this approach could increase storage and lookup complexity.
Using custom logic in the application layer to track and align IDs between the source and target clusters. This introduces extra development effort and may not be feasible in all use cases.
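
To make the cost of the first alternative concrete, here is a rough Python sketch of an in-memory source-to-target ID mapping. The `IdMapping` class and its methods are hypothetical, and the sketch assumes an entry is recorded once both the source ID and the target ID of a document are known. A real implementation would also need durable storage, which is where the extra storage and lookup complexity comes from.

```python
import re
from typing import Dict, Tuple

# Matches single-document paths that carry an explicit ID,
# e.g. "/my-index/_doc/abc" or "/my-index/_update/abc".
_DOC_PATH = re.compile(r"^/(?P<index>[^/]+)/(?:_doc|_update)/(?P<id>[^/?]+)")


class IdMapping:
    """Hypothetical lookup table from (index, source ID) to target ID."""

    def __init__(self) -> None:
        self._map: Dict[Tuple[str, str], str] = {}

    def record(self, index: str, source_id: str, target_id: str) -> None:
        """Remember which target ID corresponds to a source ID."""
        self._map[(index, source_id)] = target_id

    def translate_path(self, path: str) -> str:
        """Swap a known source ID in a request path for its target ID."""
        match = _DOC_PATH.match(path)
        if match is None:
            return path
        target_id = self._map.get((match["index"], match["id"]))
        if target_id is None:
            return path  # no mapping recorded; leave the request as-is
        return path[: match.start("id")] + target_id + path[match.end("id"):]
```

Every replayed update or delete would pay for one lookup, and the table grows with the number of auto-ID documents indexed during the migration.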

Do you have any additional context?

In use cases where ID consistency is crucial (e.g., updates or deletions), mismatched IDs between source and target can cause unintended results or errors in application logic. This enhancement would ensure that capture and replay keeps IDs consistent, reducing risk and making migrations more reliable. Applications that depend on matching IDs would also hit fewer edge cases during a migration, because source and target records would line up without extra reconciliation.

opensearch-project / opensearch-migrations