Provide a hook or feature to sanitize some displayed content

warsaw commented 1 year ago

What's the problem this feature will solve?

I'm writing some fixtures and integration tests using live, authenticated connections to a test instance of a service. My test infrastructure can take care not to commit authentication secrets to the repository by putting them in a configuration file outside the repo. However, some of the tests or fixtures take this connection information (say, in a dict with keys like username and password) as an argument. If there is an error in a fixture or a failure of a test, pytest will print the value of these arguments. For local testing, this isn't too bad, since these credential secrets just get printed to my console. But if I run the test suite in my CI, then this output can leak credential information into pipeline logs, posing a security risk.

I'd like a way to be able to sanitize such output so username/password (or other secrets) don't get printed.

Describe the solution you'd like

I'm not sure what a solution would look like. I did try to search both pytest's documentation and pytest plugins on PyPI, plus some general internet searches (and maybe even a ChatGPT query or two) to try to find an existing solution. None of them worked or were simple to set up.

Off the top of my head, I'm thinking of something like a pytest decorator that I could place on fixture functions and/or test methods to signal that their arguments -- or possibly even local variables -- contain sensitive information. This decorator could specify a user-supplied sanitization function that would get called on error or failure. The sanitization function would receive some mutable structure of all the variables that pytest would normally print to the console, and would have the opportunity to change the values before they get printed. (Of course, pytest would have to be careful not to print this structure if the sanitization function itself has a bug!)

So in my situation, it might get access to that credentials dictionary as an argument or local variable. My sanitization function could then look for username and password keys, and replace their values with '**suppressed**' or some such. Then pytest would print the sanitized variable without leaking the credentials.
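A minimal sketch of what such a user-supplied function might look like, in plain Python. It is not tied to any existing pytest hook or API, since the issue states that none is available. The names `sanitize`, `SENSITIVE_KEYS`, and `PLACEHOLDER` are hypothetical. Unlike the in-place mutation described above, this version returns a masked copy, so the real credentials stay usable by the test.

```python
import copy
from typing import Any

# Assumption: the sensitive keys are the ones named in the issue text.
SENSITIVE_KEYS = frozenset({"username", "password"})
PLACEHOLDER = "**suppressed**"


def sanitize(value: Any) -> Any:
    """Return a copy of `value` with sensitive dict entries masked.

    Dicts, lists, and tuples are walked recursively; anything else is
    deep-copied unchanged.
    """
    if isinstance(value, dict):
        return {
            key: PLACEHOLDER if key in SENSITIVE_KEYS else sanitize(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [sanitize(item) for item in value]
    if isinstance(value, tuple):
        return tuple(sanitize(item) for item in value)
    return copy.deepcopy(value)


credentials = {"username": "alice", "password": "hunter2", "host": "example.test"}
print(sanitize(credentials))
```

Matching on key names only catches secrets stored under known keys. A secret passed as a bare string argument would need a different rule, such as matching on the argument name.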

Alternative Solutions

I've tried to write a custom plugin but have not had any luck in doing this. Part of the problem I think is that there is no public, supported API for getting access to the list of variables that will be printed, and no good way of interposing just before these variables are printed.

Additional context

I'm rather surprised this hasn't come up before, but if it has and I have missed the solution, then TIA for any guidance you can provide!

bluetech commented 1 year ago

Hi @warsaw,

Is it possible for you to solve this issue more generally? In one of my projects which handles sensitive data, I wrap all sensitive values in a Secret wrapper, which looks like this (and I'm sure could be better):

from __future__ import annotations
from typing import Generic, TypeVar, cast
from collections.abc import Callable

T = TypeVar('T')
S = TypeVar('S')

class Secret(Generic[T]):
    """A simple wrapper around a secret value, making sure the
    value does not leak into logs or exception messages.

    Wrap sensitive values like PANs in this.

    When you need to access the real value, use `get_secret_value()`.
    Prefer repeated calls to `get_secret_value()` over storing its result
    in a local variable.

    NOTE: Equality is strict - Secret(v) does not equal v.
    """
    __slots__ = ('_secret_value',)

    def __init__(self, value: T) -> None:
        self._secret_value = value

    def get_secret_value(self) -> T:
        return self._secret_value

    def map(self, f: Callable[[T], S]) -> Secret[S]:
        return Secret(f(self._secret_value))

    def redacted(self) -> T:
        """Get a redacted version of the value.

        Currently works for the following Ts: str, bytes
        """
        if isinstance(self._secret_value, str):
            return cast(T, '<redacted>')
        if isinstance(self._secret_value, bytes):
            return cast(T, b'<redacted>')
        T_name = type(self._secret_value).__name__
        raise TypeError(f'Secret[T].redacted is not supported for T={T_name}')

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Secret):
            return NotImplemented
        return self._secret_value == other._secret_value

    def __hash__(self) -> int:
        return hash(self._secret_value)

    def __repr__(self) -> str:
        return f'Secret({self})'

    def __bool__(self) -> bool:
        return bool(self._secret_value)

    def __str__(self) -> str:
        return '<redacted>'

This works very well and hides the value from logs, stack traces, etc. However, it does fall down when interacting with 3rd-party code (like HTTP clients and such) that requires the unwrapped value.
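One possible way to soften the 3rd-party limitation for dict-shaped credentials, sketched under assumptions: a `dict` subclass that code expecting a mapping can use unchanged, while `repr()` and `str()` mask every value. The class name `RedactedDict` is hypothetical and not part of pytest or of the wrapper above. Whether a given test runner's failure output goes through `repr()` for such an object should be verified rather than assumed.

```python
class RedactedDict(dict):
    """A dict that behaves normally but masks its values in repr() and str()."""

    def __repr__(self) -> str:
        # Show the keys so the output is still useful for debugging,
        # but never the values.
        masked = ", ".join(f"{key!r}: '<redacted>'" for key in self)
        return f"{type(self).__name__}({{{masked}}})"


creds = RedactedDict(username="alice", password="hunter2")

# Item access still returns the real value, so callers that only need
# a mapping keep working.
assert creds["password"] == "hunter2"

# Printing, logging, or formatting the object does not expose the values.
print(creds)
```

The protection is shallow: `dict(creds)`, `creds.copy()`, and serializers such as `json.dumps` all produce or emit the unmasked values, so this only guards against the object itself being printed.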

warsaw commented 1 year ago

I thought about that and experimented with it a bit, but it isn't feasible since we don't control all the code paths where the secrets are read from config files or passed to other components.

sfc-gh-yixie commented 2 months ago

@bluetech I have the same problem. I think automatically logging all the function parameters is a security issue in pytest. I understand it's convenient, but it's risky. Oftentimes people don't realize it until they leak something. We may also use 3rd-party libraries that raise an error and get the secrets printed out.
