Open nemoshlag opened 1 year ago
I think that the default behavior should be to sanitize the information, and give an option to display the information if needed.
What do you think? for example: #1512
Should psycopg2
be included in this list? db.statement
already looks sanitized (maybe from the parameterized query?). For example:
{
"value": "INSERT INTO \"db_model\" (\"title\") VALUES (%s) RETURNING \"db_model\".\"id\"",
"key": "db.statement"
}
In my case I would like to see the raw db.statement instead of the sanitized one.
Since psycopg2 uses dbapi I think you can achieve raw db.statement by setting capture_parameters=True
I thought so to, but maybe it's not implemented? capture_parameters=true
wasn't working for me.
Yes, it needs to be added on psycopg2 side, similar to the way enable_commenter does. Here's how it looks from dbapi side.
Should I create a separate issue for this?
This is how I expected it to work in my code:
Psycopg2Instrumentor().instrument(capture_parameters=True)
Pls do, I'll add it to the aggregated list
Hi, I've got 2 questions:
db.statement
only if it's sanitized, does it make sense if we always sanitize and always set db.statement
?For example, let's say a database client uses psycopg2 connection cursor to make (a) a parameterized query and (b) a hard-coded query:
# (a)
statement_a = "SELECT * FROM city WHERE id = %s"
city_id = "1818"
cursor.execute(statement_a, [city_id])
# (b)
statement_b = "SELECT * FROM city WHERE id = 1818"
cursor.execute(statement_b)
For both queries, I would expect db.statement
to always be set and always sanitized so that it would be "SELECT * FROM city WHERE id = %s"
for both (a) and (b). Or maybe a different substitution like "SELECT * FROM city WHERE id = 0"
.
If the user has opted into retaining the original values, what would they be? Would it be an extraction from (b) so that both (a) and (b) would set "original" parameters ["1818"]
? Or would it be an interpolation for (a) so that both (a) and (b) set "original" statement "SELECT * FROM city WHERE id = 1818"
? Or something else?
Looking at some existing code: In Python DBAPI instrumentor it captures db.statement.parameters
as args
(in this example it'd be ["1818"]
and only for (a)). In the Python mongodb instrumentor we're capturing the full statement -- I don't know mongo very well so maybe there's a reason for this.
According to semantic-conventions, a DB instrumentation should have the option to sanitize sensitive data from either db.statement or db.operation attributes. Consider using instrumentation.dbapi to reduce instrumentation-specific logic & tests. This issue seems relevant for the following instrumentations:
13042023 Edit: Updating this issue with database specifications to add db.statement attribution only if it's sanitized. https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/database.md#call-level-attributes