open-telemetry / opentelemetry-specification

Specifications for OpenTelemetry
https://opentelemetry.io
Apache License 2.0
3.64k stars 870 forks source link

Clarify the mapping between Google Cloud Logging and OpenTelemetry #3774

Open alexvanboxel opened 7 months ago

alexvanboxel commented 7 months ago

What are you trying to achieve?

With this draft collector PR the PubsubReceiver will be able to handle the native translation of Google Cloud LogEntry (this is a feature provided by Google Cloud Logging: see View logs routed to Pub/Sub). We need more details on all the mapping from the LogEntry to the OpenTelemetry fields and attributes.

Additional context.

See this issue on the collector: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/23184

alexvanboxel commented 7 months ago

I've opened a draft PR https://github.com/open-telemetry/opentelemetry-specification/pull/3775 on the proposed clarifications. It would be great to have some involvement for Google contributors.

jack-berg commented 7 months ago

This makes sense, but the document modified in #3775 isn't normative. Perhaps this type of language belongs in a new document in the compatibility directory. Additionally, the new attributes defined should probably be defined and added to the semantic-conventions.

jsuereth commented 7 months ago

Hey @alexvanboxel,

I really like your suggested PR in that it adds the read side of the OLTP<-> Google Cloud Logging Mapping.

We've been having a lot of internal conversations around this mapping and have a (similar) proposed spec. @jack-berg and I discussed where this documentation might belong (see his comment above). I think we have a few options:

  1. We can create a new compatibility section that outlines this mapping.
  2. We could put this document in the collector near the receiver to denote what it does to GCP data.

Which would you prefer?

alexvanboxel commented 7 months ago

I think a new compatibility section is a great idea. Does this mean that all the sections in the datamodel-appendix would move to the compatibility section?

alexvanboxel commented 7 months ago

Additionally, the new attributes defined should probably be defined and added to the semantic-conventions.

I don't think I've added anything new. I've used the semcon that was available (except the gcp.*). I want to go further and define the mapping for the HTTP structure, but although HTTP is described as stable, it is not clearly defined for logs yet.

alexvanboxel commented 6 months ago

@jsuereth I've moved it in the branch to the compatibility section

alexvanboxel commented 6 months ago

@jack-berg and @jsuereth doesn't it make sense to have a compatibility section in the logs directory itself and that all model-appendix move (one by one) to that directory?

kamalmarhubi commented 5 months ago

It was silly of me to comment on the already-closed PR, so pasting my comments here.

on sourceLocation:

looking at docs for LogEntrySourceLocation I this ought to be mapped to the equivalent semconvcode.* attributes as follows:

  • sourceLocation.file => code.filepath
  • sourceLocation.line => code.lineno

Less sure of what to do with sourceLocation.function, which is described as

Human-readable name of the function or method being invoked, with optional context such as the class or package name. This information may be used in contexts such as the logs viewer, where a file and line number are less meaningful. The format can vary by language. For example: qual.if.ied.Class.method (Java), dir/package.func (Go), function (Python).

it ought to go into the code.namespace and code.function attributes. if you happen to have a sample of data across different languages for which gcp has a logging integration, perhaps the right thing to do can be figured out based on that? failing that, a reasonable best effort might be (python style split / indexing):

  • code.function gets sourceLocation.function.split(".")[-1]
  • code.namespace gets ".".join(sourceLocation.function.split(".")[:-1])



on httpRequest:

could be worth attempting to map these to http semconv? not sure why I didn't do it initially.

  • requestMethod => http.request.method
  • requestUrl => url.full
  • requestSize => see [0] below
  • status => http.response.status_code
  • responseSize => see [0] below
  • userAgent => user_agent.original
  • remoteIp => client.address
  • serverIp => server.address
  • referer => http.request.header.referer[0]
  • protocol => net.protocol.name (lower case), net.protocol.version

The remaining ones can be mapped to gcp.*:

  • latency
  • cacheLookup
  • cacheHit
  • cacheValidatedWithOriginServer
  • cacheFillBytes

[0] I had previously opened https://github.com/open-telemetry/semantic-conventions/pull/84 to add "whole-request" size attributes, ie headers + body. I will rebase / update the PR this week to see if it gets any traction.

austinlparker commented 1 month ago

@open-telemetry/collector-contrib-maintainer can this get moved to collector-contrib repo?

alexvanboxel commented 1 month ago

I can add the mapping in the collector README, but I feel the documentation should be either in spec/compatibility or either semconv/compatibiltity.

mx-psi commented 1 month ago

I guess we can do the same we did on https://github.com/open-telemetry/opentelemetry-specification/pull/3964#pullrequestreview-1960788192. I still think collector-contrib is not the right place for this long term, but for now it seems reasonable to me

alexvanboxel commented 1 month ago

I'm actively working on improving the stability of the gcp pubsub log -> otlp log; I will include the documentation in the same commit. But a side note: It only makes sense to add the conventions I use in the receiver; I hope Google (or any other provider) will document their mapping.