open-telemetry / opentelemetry-python-contrib

OpenTelemetry instrumentation for Python modules
https://opentelemetry.io
Apache License 2.0

Redis Sanitization #2175

Open nitsanh opened 9 months ago

nitsanh commented 9 months ago

Hello,

I'm using opentelemetry-instrumentation-redis in my system to collect traces from Redis. I would like the traces to include the key, because I use Redis in several different ways throughout the system and want to be able to tell them apart when viewing my traces. I saw the work you did on sanitizing db.statement by default here and here.

Is there a way to bypass this sanitization or to provide my own sanitization function (so that I could sanitize the value but not the key)? I didn't find any reference to that in the code or the docs.

Thanks!

mastizada commented 4 months ago

I also observed that sanitization is replacing the statement here:

out = [str(args[0])] + ["?"] * (args_length - 1)

So, something like GET key value becomes GET ? ?.
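To make the effect concrete, here is a minimal sketch built around the line quoted above. The function name `default_sanitize` is hypothetical, and the library's real helper may do more (for example, limiting the statement length); only the masking step is reproduced here.

```python
def default_sanitize(args: tuple) -> str:
    """Keep the command name and mask every other argument with "?".

    Simplified illustration of the quoted line, not the library's code.
    """
    if not args:
        return ""
    args_length = len(args)
    out = [str(args[0])] + ["?"] * (args_length - 1)
    return " ".join(out)


print(default_sanitize(("SET", "user:42", "secret")))  # SET ? ?
print(default_sanitize(("GET", "user:42")))            # GET ?
```

Because the key is masked together with the value, spans for different keys become indistinguishable in db.statement.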

Is there a standard way to add a parameter that would let us keep the first 2 args and sanitize the rest? Maybe we could pass a sanitizer function, like _format_command_args, as an argument?

I am happy to work on it.
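As a rough illustration of the idea, a sanitizer that keeps a configurable number of leading arguments could look like the sketch below. The name `sanitize_keep_prefix` and the `keep` parameter are hypothetical; nothing like this is confirmed to exist in the instrumentation.

```python
def sanitize_keep_prefix(args: tuple, keep: int = 2) -> str:
    """Keep the first `keep` arguments (command name included), mask the rest."""
    if not args:
        return ""
    kept = [str(arg) for arg in args[:keep]]
    # max() guards against commands that have fewer arguments than `keep`.
    masked = ["?"] * max(len(args) - keep, 0)
    return " ".join(kept + masked)


print(sanitize_keep_prefix(("SET", "user:42", "secret")))         # SET user:42 ?
print(sanitize_keep_prefix(("GET", "user:42")))                   # GET user:42
print(sanitize_keep_prefix(("HSET", "h", "field", "v"), keep=3))  # HSET h field ?
```

With `keep=1` this reduces to the current default behavior, so such a parameter would be backward compatible if the default stayed at 1.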

mastizada commented 3 months ago

For now, I solved this for my use case with a request hook:

from opentelemetry.trace import Span
from opentelemetry.semconv.trace import SpanAttributes
from redis.connection import Connection

def sanitize_redis_statement(redis_args: tuple) -> str:
    """
    Based on opentelemetry.instrumentation.redis.utils._format_command_args.
    """
    cmd_max_len = 1000
    value_too_long_mark = "..."

    if not redis_args:
        return ""

    args = list(redis_args)
    three_key_list = [
        "HSET",
        "HSETNX",
        "JSON.MSET",
        "JSON.SET",
        "LSET",
        "PSETEX",
        "SETBIT",
        "SETRANGE",
    ]
    two_key_list = ["GETSET", "MSET", "MSETNX", "LPUSH", "LPUSHX", "RPUSH", "RPUSHX"]
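    # mask values with "?"; the command name is upper-cased so lowercase
    # commands also match (the match statement needs Python 3.10+)
    match str(args[0]).upper():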
        case "SET":
            if len(args) > 2:
                args[2] = "?"
        case value if value in two_key_list:
            if len(args) > 2:
                args[2:] = ["?"] * (len(args) - 2)
        case value if value in three_key_list:
            if len(args) > 3:
                args[3:] = ["?"] * (len(args) - 3)
    # join arguments together to form the query
    query = " ".join(str(element) for element in args)
    # truncate if it is longer than allowed length
    if len(query) > cmd_max_len:
        return query[: cmd_max_len - len(value_too_long_mark)] + value_too_long_mark
    return query

def request_hook(span: Span, instance: Connection, args: tuple, kwargs: dict):
    """
    Custom name for redis and better query sanitizer.

    @param span: Active opentelemetry span
    @param instance: Redis connection instance
    @param args: Arguments for the execute command
    @param kwargs: Keyword arguments for the execute command
    """
    if span and span.is_recording():
        new_name = f"redis.{span.name}"
        span.update_name(new_name)

        span.set_attribute(SpanAttributes.DB_STATEMENT, sanitize_redis_statement(args))

And then when instrumenting redis:

from opentelemetry.instrumentation.redis import RedisInstrumentor

RedisInstrumentor().instrument(request_hook=request_hook)
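The hook can be exercised without a Redis server or the OpenTelemetry SDK by passing it a stand-in span. The sketch below is only an illustration: `make_request_hook`, `StubSpan`, and `mask_all_but_key` are hypothetical names, and the string "db.statement" is used directly in place of SpanAttributes.DB_STATEMENT.

```python
from typing import Callable

DB_STATEMENT = "db.statement"  # string value of SpanAttributes.DB_STATEMENT


def make_request_hook(sanitizer: Callable[[tuple], str]):
    """Build a request hook that records the sanitizer's output on the span."""
    def hook(span, instance, args, kwargs):
        if span and span.is_recording():
            span.set_attribute(DB_STATEMENT, sanitizer(args))
    return hook


class StubSpan:
    """Stand-in exposing only the span methods the hook calls."""
    def __init__(self):
        self.attributes = {}

    def is_recording(self) -> bool:
        return True

    def set_attribute(self, key, value) -> None:
        self.attributes[key] = value


def mask_all_but_key(args: tuple) -> str:
    """Toy sanitizer: keep the command and key, mask everything else."""
    return " ".join([str(a) for a in args[:2]] + ["?"] * max(len(args) - 2, 0))


span = StubSpan()
hook = make_request_hook(mask_all_but_key)
hook(span, None, ("SET", "user:42", "secret"), {})
print(span.attributes)  # {'db.statement': 'SET user:42 ?'}
```

Wrapping the sanitizer in a factory keeps the hook itself generic, so different sanitizers can be swapped in and checked independently of the instrumentation.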