How to redact sensitive information from logs?

ArnavBalyan commented 3 months ago

Currently glog supports custom sinks, where users can implement custom logic. However, for a usecase like redacting information, or pre-processing log line, one needs to disable the default logging behaviour and enforce the custom sink. The way I see it, it's hard to do since there is no simple way to disable the default logging. Adding a custom sink would introduce duplicate logs, and will not prevent the unfiltered logs from appearing. Most modern logging libraries provide some way to redact sensitive information from logs, since it's a common usecase. What's the simplest way to do this with glog. Can something like this be supported? Where logs can be intercepted, edited, and then forwarded to the intended destination?

I think similar issue was discussed here https://github.com/google/glog/issues/470

ArnavBalyan commented 3 months ago

cc @sergiud

sergiud commented 3 months ago

Sounds like an unnecessary overhead. Why can't you redact sensitive information before you pass it to glog?

ArnavBalyan commented 3 months ago

Thanks for your reply, the logs are usually scattered throughout hundreds/thousands of files. Every file is doing something potentially like LOG(INFO) << password=123 . It's very hard to enforce this from caller side.

This would also be very powerful for usecases like enriching logs with additional information, defining custom formats for log lines etc. Since user has control over preprocessing the log. Thanks!

sergiud commented 3 months ago

It's very hard to enforce this from caller side.

What does that mean? If it is hard to transform the data on the call site how would the proposal simply the problem if you need to deal with the output in a generic way?

This would also be very powerful for usecases like enriching logs with additional information, defining custom formats for log lines etc. Since user has control over preprocessing the log. Thanks!

The library provides log prefix format customization. Thus the output is already fully customizable.

ArnavBalyan commented 3 months ago

Thanks for your reply For #1 instead of changing the log line before emitting it or introducing new macros, we can add user defined pluggable transformations in glog. Before emitting the logline, it can get transformed from a user defined function. This change inside the glog would be a central change where users can opt to make any change to the log message.

#2 IIUC, log prefix format customization only customizes the prefix, but not the actual log message. It seems we can change the supplemental data (timestamp, filename etc), but not the main log message. If this can be extended to the log message, that would also work.

Or if there is any way to achieve this using existing glog functionality please let me know. Based on my reading we can't customize the actual log message, and additional sinks would not help. Thanks!

sergiud commented 3 months ago

we can add user defined pluggable transformations in glog.

I already got that part. However, what I do not get why would that be easier than transforming data at the call site. You will need to deal with log output that can be formatted in any way. Handling arbitrary strings is evidently not easier that modifying values before you pass them to glog.

As for 2, clearly you can format the log line the way you want. There is not central customization in glog necessary. If you need to provide a consistent log line format, just write your own format function instead and use it consistently.

ArnavBalyan commented 3 months ago

For #1 I think there is some gap here, I mean changing the log lines before sending to glog would require changing the whole codebase and transform every string that gets logged currently. This is not possible in open source codebases that rely on glog having thousands of log lines. It's better if we can have it in glog which can transform internally before streaming to destination.

For #2, I see thanks, just to be clear, we can change the log message itself using custom prefix? Is there any example for it, based on the documentation, it seems we don't get access to the log message, but other metadata like the timestamp/filename etc, and the user defined customization will be simply added as a prefix to the actual message. It does not seem like we can change the main log line? Could you please share any documentation/example for changing the log message using prefix customization, it would be great if this can be done using existing functionality.

Thanks!

sergiud commented 3 months ago

For #1 I think there is some gap here, I mean changing the log lines before sending to glog would require changing the whole codebase and transform every string that gets logged currently.

I would argue refactoring the code is the most reasonable and computationally efficient approach here and therefore a pretty bad argument for the proposed feature.

It's better if we can have it in glog which can transform internally before streaming to destination.

Again, this does not address my question in any way. You are just stating an opinion without providing a meaningful reasoning behind it.

It does not seem like we can change the main log line

Correct. As already mentioned, this is also completely unnecessary because the log line is generated by the user not the library. The user has full control of the log message and can modify using his own logic. Obviously, this is different to a log prefix which is generated by glog and therefore needs a hook for customization.

ArnavBalyan commented 3 months ago

I would argue refactoring the code is the most reasonable and computationally efficient approach here and therefore a pretty bad argument for the proposed feature.

Sure, this is not feasible for open source codebases that have already hundreds/thousands of log lines, I was just proposing the user can choose the pros/cons, but atleast the flexibility could be offered. The default behaviour would remain the same, just that an additional feature could be offered to users (similar to custom sinks). It's upto the library owners to decide whether to support it.

I understand if we prefer users changing thousands of log lines over providing an optional add-on. Thanks for your support!

sergiud commented 3 months ago

Overall there are actually multiple issues here:

I argue that the use case is very uncommon but the implementation will potentially have a negative impact on the performance even if you do not use the filtering.
The proposal encourages bad software engineering practices ("I don't want to improve my code and instead let a third party implement a feature that causes more problems that one would expect at first")
Potentially enormous security risks due to unchecked logging of sensitive (user) information. The masking should be done before you log the corresponding data at the call site.

Note that the overall goal of glog is keep the logging implementation simple. Given the user can write a layer on top of glog to control the output a central filtering mechanism in glog is unnecessary. I also see arguments such as "our code base is too large" as bad excuses for not improving own code.

sergiud commented 3 months ago

I'm closing the issue since this proposal is out of scope.

google / glog

How to redact sensitive information from logs? #1121