Closed mastersilv3r closed 6 months ago
I think the argument of developers making mistakes is not worth protecting against, because a phone number (or any known other PII) can appear inside any string block. In other terms, filtering on keys is not the correct way to solve this requirement.
Adding it to fast-redact
would be very hard, but I would welcome such a PR. However, it should not slow down them current benchmarks.
My recommendation for this feature is to implement this as a transport
so that it's outside of the hot path and there are reduced performance considerations (fast-redact predates transports).
I would recommend:
fast-redact
Closing due to lack of response.
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
The Problem
Given stringent statutory compliance, teams need to ensure that sensitive data is never leaked including via logs. Now consider a scenario where a developer may need to redact all values where any of the key or the value may match a regular expression.
Note: this is different from the nested keys at path feature currently supported by pino-redact
I believe this is not supported in Pino or fast-redact, because I could not find any documentation for the same. This could mean the following case for example
Example 1: by regex on key
Sometimes you may want to apply regex over a key to apply redact. This could mean the key is
phone_number
ormy_phone_number
oryour_phone_number
or any word which ends with stringphone_number
. To implement this redact (by key) instruction can continue to remain same as today except one minor difference - now an individual key can be be matched via regexes. For example:Example 2: by regex on regex on key and/or value
Why is this feature potentially important? Because of strict compliances every app developer needs to ensure that they do not leak sensitive data even by mistake. The current Pino redact implementation only works when developer has intentionally redacted certain keys. But developer can do mistakes. Or a developer may be working with a dynamic set of incoming stream of data not in its control. There the developer would not like to take chance. Given their unique business context they may have a fair idea that which kind of value patterns may hold sensitive information: for example phone number patterns, email patterns, some key name patters etc.
Paths for a developer
I understand there is a performance implication but assuming this is a must feature for a modern app with compliance checks, and that developer does not want to take chance even by mistake to let certain regex matching data to pass through to logs in their telemetry backends hosted on cloud or where ever, what choices do we have to support the developer's journey?
Solution 1: Redact using Pino itself (via fast-redact)
Enhance fast-redact itself and do via that?
Pros: Simplest path for Pino users :-) One places to configure everything about log redaction. One single source of truth on what to redact what not to. And that lies at the source of the log signal emission itself.
Cons: I don't see any yet
Solution 2: Redact outside Pino but within Nodejs runtime
There are two choices thought both the choices may reuse fast-redact or another utility which provides regex based redact.
Pro: Reusability: Can use this utility in any Nodejs project, for any purpose not limited to logging. Con: Would mean maintaining another dependency aside from Pino for logging purposes.
Then this lies outside of purview of fast-redact I feel. You may know better.
Solution 3: Solve it completely outside Nodejs runtime
Express this transformation (redaction) logic in a log exporter like Fluentbit or FluentD daemon-set. They have good support for log transformation and regex. This approach doesn't require changes in Pino or the application layer.
Pros:
Cons:
Conclusion
I would like to hear from the community and the maintainers