open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

[pkg/ottl] Support community ID network flow #34062

Closed. mashhurs closed this issue 3 weeks ago.

mashhurs commented 1 month ago

Component(s)

pkg/ottl

Is your feature request related to a problem? Please describe.

What is a community ID and why do we need it?

The feature is widely used; here are some reference applications:

Describe the solution you'd like

Introduce a converter which calculates the community ID based on the specification.

Describe alternatives you've considered

This requires a discussion of either

Additional context

No response

github-actions[bot] commented 1 month ago

Pinging code owners:

evan-bradley commented 1 month ago

Thanks for the detailed description @mashhurs. Can you comment on any usage outside Elastic? I'm not familiar with community ID, and while I see a handful of implementations on the specification repository you linked, it's not clear to me whether this function would be useful to a significant portion of OTTL users.

I'd also welcome input from others in the community if they are using community IDs and would like to see this function added to OTTL.

evan-bradley commented 1 month ago

Based on the graphic you provided for computing a community ID, it also looks like OTTL could create these IDs if it had a base64 encoding function. Would that be sufficient for you?

mashhurs commented 1 month ago

Thank you @evan-bradley for the feedback.

Community ID is broadly used in networking solutions and services, especially in SIEM. Outside of Elastic, a number of vendors and solutions have adopted Community ID; some references:

By adding a Community ID function to OTTL, we could help downstream services correlate their datasets easily, avoiding multiple joins on tuples, and perform operations (creating alerts, setting up dashboards, etc.) on network flows of interest.

Community ID combines a flow tuple (addresses, ports, and protocol) into a unique ID for that network flow. A base64 encoding function would open a way to achieve the goal, but the ID still requires further computation (ordering the endpoints and hashing the tuple before encoding), so I don't think a single-line config could express it.
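To illustrate why base64 encoding is only the last step, here is a minimal Python sketch of Community ID v1 for TCP/UDP flows, following the published specification as I understand it: the endpoints are ordered so both directions of a flow hash identically; the seed, addresses, protocol, a padding byte, and the ports are packed in network byte order; the result is SHA-1 hashed; and only then is the digest base64-encoded with a `1:` version prefix. `community_id_v1` is a hypothetical helper for illustration, not an OTTL or Collector API, and the spec's special handling of ICMP type/code as ports is deliberately omitted.

```python
import base64
import hashlib
import ipaddress
import struct


def community_id_v1(src_ip, dst_ip, proto, src_port, dst_port, seed=0):
    """Hypothetical helper: Community ID v1 for a TCP/UDP flow (ICMP not handled)."""
    saddr = ipaddress.ip_address(src_ip).packed  # 4 bytes (IPv4) or 16 bytes (IPv6)
    daddr = ipaddress.ip_address(dst_ip).packed
    if len(saddr) != len(daddr):
        raise ValueError("source and destination must be the same IP version")

    # Order the endpoints so that A->B and B->A produce the same ID.
    if saddr > daddr or (saddr == daddr and src_port > dst_port):
        saddr, daddr = daddr, saddr
        src_port, dst_port = dst_port, src_port

    # seed (2 bytes) | src addr | dst addr | proto (1) | padding (1) | sport (2) | dport (2)
    data = (struct.pack("!H", seed) + saddr + daddr
            + struct.pack("!BxHH", proto, src_port, dst_port))

    # Hash first, then base64-encode and prefix with the version.
    digest = hashlib.sha1(data).digest()
    return "1:" + base64.b64encode(digest).decode("ascii")


forward = community_id_v1("128.232.110.120", "66.35.250.204", 6, 34855, 80)
reverse = community_id_v1("66.35.250.204", "128.232.110.120", 6, 80, 34855)
print(forward == reverse)  # both directions of the TCP flow map to one ID
```

Because the ordering and hashing happen before encoding, an OTTL expression would need binary packing, comparison, and SHA-1 primitives in addition to base64, which is why a dedicated converter (or processor) seems simpler.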

Since Community ID is a well-known concept in network analysis, I believe an OTTL function to generate it would considerably benefit downstream systems.

evan-bradley commented 1 month ago

Thank you for the additional details.

Community ID is version:hash-value-of-tuple (tuple: address, port and protocol)

Where do the address, port, and protocol come from? Are they attached to the data, or is it expected they come from context set by receivers? If they come from context, do you think this functionality may make sense as a separate processor instead of as an OTTL function?

mashhurs commented 1 month ago

I wonder whether an OTTL function would also be useful in other processors (for example, filtering by community ID, or dropping data when a network flow of interest is found).

Adding more thoughts: I am not very familiar with processors and their behavior, but with community ID we are only calculating a value. We are not acting on the data the way processors such as batch or memory_limiter do, and there is no need for reject or retry mechanisms.

I'd be curious about your opinion as well.

evan-bradley commented 1 month ago

I mostly mean through similar mechanisms like how the k8sattributes processor works: for a given data payload, it looks at the attached connection metadata for which IP, port, etc. sent the payload to the Collector, then uses that to enrich the payload. It sounds like community ID functions in a similar way.

mashhurs commented 3 weeks ago

Do you want me to close this issue and open a new one for a new component? Or are you able to update this issue (labels, required fields)?

mashhurs commented 3 weeks ago

I proposed a communityid processor; closing this issue.