open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0
256 stars 165 forks source link

Semantic conventions for database should be explicit about parameters / placeholder values #711

Closed remram44 closed 8 months ago

remram44 commented 3 years ago

What are you trying to achieve?

I would like to be able to see parameter values for SQL queries. I was about to raise an issue against the SQLAlchemy instrumentation, but noticed no specific guidance in the database conventions page.

The specs do mandate that the statement should be captured, but that statement may or may not be complete, depending on whether placeholders are used.

Additional context.

The full path (= with parameter values) is captured for HTTP requests, e.g. the flask instrumentation (for example) captures:

"http.route": "/pet/<int:helloid>"  # placeholders
"http.target": "/pet/123"  # parameterized

However if that HTTP requests triggers a SQL requests, it will show up as :

"db.statement": "SELECT * FROM pets WHERE id = %(param_1)s"

This seems inconsistent.

In addition, parameter values are exposed in systems that don't use SQL syntax, and are shown explicitly in the Redis example in the docs ("HMSET myhash field1 'Hello' field2 'World") which increases confusion.

There might be security considerations (see also https://github.com/open-telemetry/oteps/pull/100), but they are probably not fully mitigated as data not using placeholders will still appear in the trace (see also open-telemetry/opentelemetry-specification#1659).

I think in any case, there should be some guidance in the spec, even if it is the opposite recommendation to the one I would personally want. The current one does not mention placeholders at all, even though they seem commonly considered when making decisions in this bug tracker.

iNikem commented 3 years ago

The database semantic conventions specifically say about db.statement:

The value may be sanitized to exclude sensitive information.

remram44 commented 3 years ago

The information I want to get at is not sensitive, I will leave the questions of sensitive data to https://github.com/open-telemetry/oteps/pull/100. When investigating performance issues, I want to know which project took a long time to load, and I can't get that from the database spans (gotta correlate with logs which is not trivial).

trask commented 8 months ago

Closing, as we have another issue tracking the same now: #716

remram44 commented 8 months ago

I'm not holding my breath. It would be nice if I could get the basic support required so I can use open-telemetry for a simple CRUD app.

trask commented 8 months ago

hi @remram44! we've just kicked off a database semantic convention stability effort: https://github.com/open-telemetry/community/blob/main/projects/database-client-semconv.md

I can't promise whether this specific issue will be included in the initial stability, but we will definitely be considering it.