opensearch-project / data-prepper

Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.
https://opensearch.org/docs/latest/clients/data-prepper/index/
Apache License 2.0

Alternative metadata naming for version #3630

Open christopheranderson opened 11 months ago

christopheranderson commented 11 months ago

Is your feature request related to a problem? Please describe.

There is a problem with the DynamoDB source metadata. We currently use dynamodb_item_version as the metadata key for a nanosecond-precision version of the record timestamp. It is the equivalent of the dynamodb_timestamp field, which currently has only second precision. This name has a potential conflict if DynamoDB ever supports an item version concept.

There is a similar field, opensearch_action, which remaps the dynamodb_event_name field to the recommended action on OpenSearch. However, following that pattern, opensearch_version would collide with the version of the OpenSearch engine, and customers might confuse the two. Just plain action could be confusing as well.

Describe the solution you'd like

Option 1: @<setting> convention (Recommended)

Use @<setting> convention for naming generated/remapped metadata fields. This would look like this for DynamoDB+OpenSearch.

action: "${getMetadata('@action')}"
document_version: "${getMetadata('@document_version')}"

This has the benefit of not needing to "read the docs" to understand the recommended defaults. The default is always the name of the setting with an @ prefix. The @ also always signals it is an artificial field (as in @rtificial).

However, this could set the expectation that we use @<setting> for every setting, such as @document_version_type, which is not currently the plan. I believe this is acceptable because, if we do get that feedback, it is probably fine to do that for most settings (e.g. @document_version_type would always just be external).
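As a rough illustration of the Option 1 rule (the default metadata key is the setting name with an @ prefix), here is a minimal Python sketch. The helpers `default_metadata_key` and `default_expression` are hypothetical and are not part of Data Prepper. Only the `getMetadata` expression syntax is taken from the example above.

```python
def default_metadata_key(setting: str) -> str:
    """Hypothetical helper: under Option 1 the default key is '@' + setting name."""
    return "@" + setting


def default_expression(setting: str) -> str:
    """Render the pipeline expression that reads the default key for a setting."""
    return "${getMetadata('" + default_metadata_key(setting) + "')}"


# Print the two sink settings discussed in this issue.
for setting in ("action", "document_version"):
    print(f'{setting}: "{default_expression(setting)}"')
```

The point of the sketch is that the key can be derived mechanically from the setting name, so no per-source lookup table is needed.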

Describe alternatives you've considered (Optional)

Option 2: <setting> convention

Use <setting> convention for naming remapped metadata fields. This would look like this for DynamoDB+OpenSearch.

action: "${getMetadata('action')}"
document_version: "${getMetadata('document_version')}"

This has the benefit of not needing to "read the docs" to understand the recommended defaults.

However, it can open the door to naming collisions where the convention couldn't be followed. For example, for a potential integration with DocumentDB, document_version may be a field we'd get from DocumentDB that needs to be remapped to be appropriate for OpenSearch's document_version. @document_version would still possibly cause some confusion in this case, but the @ makes it clear which one is autogenerated and which one comes from the source.
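The collision concern can be sketched in a few lines of Python. This is only an illustration under loud assumptions: metadata is modeled as a plain dict (not how Data Prepper stores it), `merge_metadata` is a hypothetical helper, and the DocumentDB `document_version` field and all values are made up for the example.

```python
def merge_metadata(source_fields: dict, generated_fields: dict, prefix: str = "") -> dict:
    """Hypothetical helper: add generated fields to source fields under an optional prefix."""
    merged = dict(source_fields)
    for name, value in generated_fields.items():
        merged[prefix + name] = value  # overwrites a source field with the same name
    return merged


# Assumed example values; neither comes from a real source.
source = {"document_version": "source-v7"}        # field supplied by the source
generated = {"document_version": 1700000000000}   # value remapped for OpenSearch

option2 = merge_metadata(source, generated)              # bare <setting> names
option1 = merge_metadata(source, generated, prefix="@")  # @<setting> names

print(option2)  # the source field has been overwritten
print(option1)  # both fields are present under distinct keys
```

With bare names the source value is lost, while the @ prefix keeps both values and marks which one was generated.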

Option 3: Only rename dynamodb_item_version to document_version

Rename dynamodb_item_version to document_version for now. For DynamoDB+OpenSearch, this would look like:

action: "${getMetadata('opensearch_action')}"
document_version: "${getMetadata('document_version')}"

This solves the bare minimum requirement of avoiding a name conflict with a possible future DynamoDB feature.

However, it is less clear which prefix to use when and whether the field is autogenerated or not.

Additional context

dlvenable commented 11 months ago

@graytaylor0, did you resolve this in the renaming done in #3634, or is there more work to do?

christopheranderson commented 11 months ago

I think #3634 implemented Option 3 to address the specific issue on the new plugin, but I'd advocate for keeping this issue open for resolving the general pattern.