Kafka instrumentation should record server attributes when possible

lmolkova commented 7 months ago

Is your feature request related to a problem? Please describe.

The existing Kafka instrumentation does not capture any server or network details.

So if my application talks to multiple Kafka clusters, I cannot differentiate between them. I also cannot tell which node an operation was performed against.

I think kafka instrumentation should record cluster-id as a server.address or node host if it can retrieve them. Will bring it up in messaging semconv to decide which one.

Describe the solution you'd like

Kafka instrumentation should do the best effort collecting server.* attributes and if it's not possible should collect some network.* attributes.

Publish/consume spans should (as they already are) be created on the public API surface and cover the duration of the logical operation (including all retries).

If multiple server.address (or network.peer.address) values are available (e.g. the client tried one node and fell back to a different one), we want to report only the last node contacted during the logical operation.
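
A minimal sketch of the "last node wins" rule, assuming the instrumentation has some hook that is notified about each node contacted (which, as discussed further down, may not be easy to get). `LastNodeTracker` and `onNodeContacted` are hypothetical names, not existing APIs; only the `server.address` and `server.port` attribute keys come from semconv.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of the instrumentation): remembers only the
 * most recent node contacted while one logical publish/consume operation runs.
 */
final class LastNodeTracker {
  private String host; // null until a node has been contacted
  private int port;

  /** Call on every attempt, including retries and fallbacks; later calls overwrite earlier ones. */
  synchronized void onNodeContacted(String host, int port) {
    this.host = host;
    this.port = port;
  }

  /** Attributes to set when the span ends; empty if no node was ever observed. */
  synchronized Map<String, Object> toAttributes() {
    Map<String, Object> attrs = new LinkedHashMap<>();
    if (host != null) {
      attrs.put("server.address", host);
      attrs.put("server.port", port);
    }
    return attrs;
  }
}
```

The span stays on the public API surface; the tracker would only be read once, when the logical operation completes, so intermediate nodes never reach the span.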

Network-level Kafka instrumentation is not covered by messaging semconv, and there is currently no guidance on how, or whether, to instrument it.

Describe alternatives you've considered

No response

Additional context

No response

laurit commented 7 months ago

> I think kafka instrumentation should record cluster-id as a server.address or node host if it can retrieve them. Will bring it up in messaging semconv to decide which one.

The description of server.address in https://opentelemetry.io/docs/specs/semconv/attributes-registry/server/ reads "Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name." As far as I can tell, a cluster ID does not fit that description.

> Kafka instrumentation should do the best effort collecting server.* attributes and if it's not possible should collect some network.* attributes.

Unfortunately, at least to me, it seems that neither of these is easily collectible. For the javaagent it is probably possible to get access to the network peer with some extra instrumentation, but for the library instrumentation this may very well be unattainable.

lmolkova commented 7 months ago

Thanks @laurit !

This is great feedback for the messaging SIG. I agree that it's not trivial. One possible approach would be to let users opt in to retrieving some info via a describe cluster API call or a DNS lookup (if we manage to get at least an IP).
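
A rough sketch of what such an opt-in could look like, assuming the expensive lookup is hidden behind a supplier and performed at most once. `OptInClusterId` and its constructor parameters are hypothetical; with the real client, the supplier would presumably wrap something like `Admin#describeCluster().clusterId().get()`, but that wiring is not shown here.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical opt-in wrapper; none of these names exist in the instrumentation. */
final class OptInClusterId {
  private final boolean enabled;          // user opt-in, off by default in a real setup
  private final Supplier<String> lookup;  // e.g. would wrap a describe cluster call
  private boolean resolved;               // true after the first lookup attempt
  private String clusterId;               // null if disabled, unknown, or lookup failed

  OptInClusterId(boolean enabled, Supplier<String> lookup) {
    this.enabled = enabled;
    this.lookup = lookup;
  }

  /** Returns the cluster id if the user opted in and the one-time lookup succeeded. */
  synchronized Optional<String> get() {
    if (!enabled) {
      return Optional.empty(); // never issue the extra call unless asked to
    }
    if (!resolved) {
      resolved = true;
      try {
        clusterId = lookup.get();
      } catch (RuntimeException e) {
        clusterId = null; // a telemetry lookup failure must not break the client
      }
    }
    return Optional.ofNullable(clusterId);
  }
}
```

Caching the result means the extra round trip happens once per client rather than once per span; whether a failed lookup should be retried later is left open here.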

huange7 commented 6 months ago

Hi @lmolkova, how is this progressing?

trask commented 6 months ago

Hi @huange7! We'd welcome a PR for this issue if it's something you're interested in working on.

huange7 commented 6 months ago

> Hi @huange7! We'd welcome a PR for this issue if it's something you're interested in working on.

Hi @trask! I am interested in this issue and will attempt to work on it. Please assign it to me.

open-telemetry / opentelemetry-java-instrumentation