opensearch-project / data-prepper

OpenSearch Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.
https://opensearch.org/docs/latest/clients/data-prepper/index/
Apache License 2.0

Add a fallback index in the opensearch index config for dynamic indexes #2234

Open kkondaka opened 1 year ago

kkondaka commented 1 year ago

Is your feature request related to a problem? Please describe. The OpenSearch sink allows dynamic index names of the form index: test-${propa}-${propb}, where the values for propa and propb are extracted from the event. If the event is missing any of the keys used in the index name, the event is dropped. This is undesirable.

Describe the solution you'd like This feature request proposes adding a new opensearch sink config option called fallback_index. It would apply only when the primary index is a dynamic index, and it would store the events for which the dynamic index name could not be resolved.

sink:
    - opensearch:
        index: test-${propa}-${propb}
        fallback_index: test-fallback-index
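
The proposed behavior can be sketched roughly as follows. This is only an illustration in Python, not Data Prepper's implementation: the function name resolve_index, the placeholder regex, and the convention of returning None for a dropped event are all assumptions made for this example.

```python
import re

# Matches ${key} placeholders. The exact pattern is an assumption for this sketch.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve_index(template, event, fallback_index=None):
    """Return the index name for an event.

    If every key in the template is present in the event, the keys are
    substituted. Otherwise the fallback index is returned, or None when no
    fallback is configured (meaning the event would be dropped).
    """
    keys = PLACEHOLDER.findall(template)
    if all(key in event for key in keys):
        return PLACEHOLDER.sub(lambda m: str(event[m.group(1)]), template)
    return fallback_index


if __name__ == "__main__":
    template = "test-${propa}-${propb}"
    print(resolve_index(template, {"propa": "a", "propb": "b"}, "test-fallback-index"))
    print(resolve_index(template, {"propa": "a"}, "test-fallback-index"))
    print(resolve_index(template, {"propa": "a"}))
```

In this sketch, a static index (one with no placeholders) always resolves to itself, so the fallback only ever matters for dynamic indexes.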

Describe alternatives you've considered (Optional) An alternative is to substitute an empty string when a field is missing from the event. In the example above, if the dynamic index is test-${propa}-${propb} and the event has no value for propa but has the value xxx for propb, the index name would be test--xxx. If propb is also missing, it would become test--. Likewise, if the dynamic index is ${propa}-${propb} and both fields are missing from the event, the index name would become -. In the absolute worst case, if the dynamic index contains no literal characters such as -, the index name would be the empty string and the event would fail to be stored.
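
To make the degenerate names concrete, here is a small Python sketch of the empty-string alternative. The function name substitute_empty and the placeholder regex are hypothetical and exist only for this illustration.

```python
import re

# Matches ${key} placeholders. The exact pattern is an assumption for this sketch.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def substitute_empty(template, event):
    """Replace each ${key} with the event value, or "" when the key is missing."""
    return PLACEHOLDER.sub(lambda m: str(event.get(m.group(1), "")), template)


if __name__ == "__main__":
    print(repr(substitute_empty("test-${propa}-${propb}", {"propb": "xxx"})))
    print(repr(substitute_empty("test-${propa}-${propb}", {})))
    print(repr(substitute_empty("${propa}-${propb}", {})))
    print(repr(substitute_empty("${propa}${propb}", {})))
```

The last call produces an empty index name, which is the case that cannot be stored.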


dlvenable commented 1 year ago

What if propa is present, but propb is not? All such events would end up in the same fallback index. I'm actually not quite sure how often users will use two different properties in their index names. But we are offering that as a feature now anyway.

Might it make more sense to allow default values in the expression syntax itself? Perhaps:

test-${propa:statica}-${propb:staticb}

Given an event: {"propa" : "turtles"}, the index name would be: test-turtles-staticb.
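
A rough Python sketch of how such inline defaults might be resolved. The ${key:default} grammar follows the example above, but the regex, the function name resolve_with_defaults, and the choice to return None when a key has neither a value nor a default are assumptions for illustration only.

```python
import re

# Matches ${key} or ${key:default}. This grammar is an assumption for the sketch.
PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def resolve_with_defaults(template, event):
    """Substitute event values, falling back to the inline default per key.

    Returns None if any key is missing from the event and has no default.
    """
    missing = []

    def replace(match):
        key, default = match.group(1), match.group(2)
        if key in event:
            return str(event[key])
        if default is not None:
            return default
        missing.append(key)
        return ""

    name = PLACEHOLDER.sub(replace, template)
    return None if missing else name


if __name__ == "__main__":
    template = "test-${propa:statica}-${propb:staticb}"
    print(resolve_with_defaults(template, {"propa": "turtles"}))
```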

dlvenable commented 1 year ago

Or maybe this would be a better syntax: ?:

test-${propa?:statica}-${propb?:staticb}

kkondaka commented 1 year ago

Is "?:" mandatory or is it optional? If it is going to be mandatory we should do it before we release 2.1

kkondaka commented 1 year ago

In fact, it should be mandatory; otherwise we are back to the original problem of how to handle a missing field.

kkondaka commented 1 year ago

Another option is to use the literal string "default" in place of the missing field. That way no configuration change is required.

dlvenable commented 1 year ago

> Is "?:" mandatory or is it optional? If it is going to be mandatory we should do it before we release 2.1

I think this is optional. I don't think we need to require a default value, but we could encourage it.

> In fact, it should be mandatory, otherwise we are back to the original problem of how to handle when a field is missing.

I think somebody could configure it if desired. Really, we probably need the final solution to be part of a more robust DLQ mechanism.

> Another option is to use default as the string for the missing field. That way configuration change is not required.

I don't think we should impose any naming on the user here. I'd rather they have the option to choose it.

We could actually provide users both options: using default properties and having a fallback index.

kkondaka commented 1 year ago

> I think this is optional. I don't think we need to require a default value, but we could encourage it.

What do you suggest we should do if it is not provided?

dlvenable commented 1 year ago

I suggest that we drop messages if the user does not specify the fallback index.
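
Putting the pieces of this thread together, one possible order of precedence is: event value, then the inline "?:" default, then the fallback index, and finally dropping the event. The Python sketch below illustrates that order. The function name route, the regex, and the use of None to mean "dropped" are hypothetical and are not taken from Data Prepper.

```python
import re

# Matches ${key} or ${key?:default}. This grammar is an assumption for the sketch.
PLACEHOLDER = re.compile(r"\$\{([^}?]+)(?:\?:([^}]*))?\}")


def route(template, event, fallback_index=None):
    """Pick an index for the event, or None if the event would be dropped."""
    unresolved = []

    def replace(match):
        key, default = match.group(1), match.group(2)
        if key in event:
            return str(event[key])
        if default is not None:
            return default
        unresolved.append(key)
        return ""

    name = PLACEHOLDER.sub(replace, template)
    if unresolved:
        # No value and no inline default: use the fallback index if configured.
        return fallback_index
    return name


if __name__ == "__main__":
    template = "test-${propa?:statica}-${propb}"
    print(route(template, {"propb": "x"}))
    print(route(template, {}, fallback_index="test-fallback-index"))
    print(route(template, {}))
```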

dlvenable commented 1 year ago

I do not see how we can predict what index they would want to use. Nor do we know if the user or role has permission to write to such an index. So we are quite likely to end up failing due to access controls.