winstonjs / winston

A logger for just about everything.
http://github.com/winstonjs/winston
MIT License
22.67k stars 1.8k forks source link

[Feature Request]: obfuscation of values that is PII or PCI data #2116

Open 32bit opened 2 years ago

32bit commented 2 years ago

๐Ÿ”Ž Search Terms

obfuscating log data for PII or PCI

The vision

Provide a feature to obfuscate or replace the PII/PCI data so it won't get logged.

Use case

To avoid writing custom code to maintain fields to parse out, would it be possible to add "filter" to remove or obfuscate the fields in a json perhaps.

Additional information

NA

wbt commented 2 years ago

How do you identify what data is PII/PCI? I would suggest just not passing that to your logging functions, as what constitutes PII/PCI is pretty application-specific. The custom formatting extensibility might also be useful to tap into as a good place to put a function which scans for patterns you consider PII/PCI and replaces it with whatever else you want.

escodel commented 11 months ago

Came across this feature request and had a working idea for a solution. Custom formatting makes sense, but almost like a plugin for this type of data with some options, perhaps?

If this issue is still relevant, and contributions are welcome, I could try to integrate the proof-of-concept for a PR

DABH commented 11 months ago

Contributions are always welcome! The question would be whether something belongs in Winston itself or e.g. as a separate transport. We have a lot of transports in the ecosystem but there isnโ€™t really a standard for or plugin repository of formatters. Perhaps something like this could live under examples? Or we should have an examples-like folder of useful formatters people have written? Open to ideas on how weโ€™d best capture that kind of community knowledge somewhere people could find it.

escodel commented 11 months ago

Makes sense, thanks! I could at least provide the example formatting and it could be grouped with other useful formats.

Thinking about it from the API payload/response scenario, it could be the approach of adding a flag similar to private: true but for masking, passing options with it.

I'll take a closer look at the transports, and to see how to integrate it as part of a formatting example

wbt commented 10 months ago

The example PR above is merged. I would still be supportive of a PR that turns on, at first optionally and by default in the next breaking-change version, some reasonable secret obfuscator, ideally drawing on something widely used elsewhere (by GitHub itself?) instead of reinventing something to be maintained separately.

escodel commented 10 months ago

@wbt right on, that would be super useful. I'll look at some examples and take a shot at a feature

escodel commented 10 months ago

The example PR above is merged. I would still be supportive of a PR that turns on, at first optionally and by default in the next breaking-change version, some reasonable secret obfuscator, ideally drawing on something widely used elsewhere (by GitHub itself?) instead of reinventing something to be maintained separately.

So I've been thinking about this feature but want to stay on the right track.

I'm looking at the problem of doing this with near-instant logging, whereas it seems the secret detection services (such as GitHub's) are more git repo/history scanners for hard-coded tokens and doing pre-commit hooks, sending matches for verification against an api, etc.

My thought process is to draw upon common regex patterns internally for performance, and maybe target common field names in dynamic input like in the formatted example from the PR.

If I follow what you're saying, should the source for those patterns be an existing outside library/service? I'm also trying to capture tweaking config options for including/excluding certain patterns.

I might not be seeing the full picture yet so any guidance would be awesome. Thanks!

escodel commented 10 months ago

Bumping for @wbt and any maintainers for input on the above thanks ๐Ÿ™

wbt commented 9 months ago

It seems like a reasonable approach. My main point is that we should try to avoid reinventing the wheel (and having to maintain the reinvention) to whatever extent things have already been done.