Open bobheadxi opened 2 years ago
Alternatively, we now re-export everything via https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/lib/log/fields.go - we could simply, for example, replace `log.String` with an implementation that does some redaction, and introduce `log.PublicString` if we want an equivalent to the Swift logging example raised.
> we could simply, for example, replace `log.String` with an implementation that does some redaction, and introduce `log.PublicString` if we want an equivalent to the Swift logging example raised
From a long-term perspective, something like that would be nice-to-have IMO. From a short-term perspective, I'm a little concerned that such a big change would negatively affect incident response, because too many strings would be redacted.
Here's an alternate idea for a more staged rollout.
1. Introduce a new type (in a new `privacy` package):
   - A `Privacy` enum with 3 cases: `Private = 0`, `Unknown = 1`, and `Public = 2`.
   - `type Text struct { data string; privacy Privacy }`. The fields are kept (heh) private to avoid direct modification from outside; you need to use helper functions. The `data` field will be available through a helper function with "unchecked" or similar in its name, indicating some level of danger.
2. Replace `log.String(key, expr)` calls with `log.Text(key, privacy.NewText(expr, privacy.Unknown))`. Initially, `privacy.Unknown` text values would not be redacted.
3. Tweak the `log.String` API in a way that documents why it is not available, and prevents someone from "accidentally" introducing it in the future. Here's a strawman idea:

   ```go
   type UseFuncTextInstead struct{ unused struct{} }

   // String is a function that should not be implemented. Its presence would increase
   // the risk of accidentally logging user-private data, making it visible to a site-admin.
   //
   // Use Text instead.
   func String(_ string, _ UseFuncTextInstead) {
   	panic("do not implement this!")
   }
   ```
4. Search for uses of `privacy.Unknown` in the codebase, and update those to either plumb the proper `Privacy` value from context, or set the right value (`Private` or `Public`) (aside: it'd be fun to use a Code Insights dashboard to track this!). Once we know for sure that there are no `privacy.Unknown` values (or very few), we can make it so that `privacy.Unknown` text values are also redacted.

Q: Why introduce a new type instead of only working with functions?
A: Because using a type gives better guarantees. Eventually, we want to start using that type in more places, so that the same field of a struct is logged consistently (with or without redaction) in different places.
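To make the staged rollout above more concrete, here is a rough, single-file sketch of what the `privacy` types could look like. Only `Privacy`, `Text`, `NewText`, and the three enum cases come from the proposal; the names `UncheckedData` and `Render`, the `redactUnknown` switch, and the `<redacted>` placeholder are assumptions made up for illustration, not an agreed API.

```go
package main

import "fmt"

// Privacy classifies how sensitive a piece of text is.
type Privacy int

const (
	Private Privacy = 0
	Unknown Privacy = 1
	Public  Privacy = 2
)

// Text pairs a string with its privacy level. Both fields are unexported so
// that callers outside the package have to go through the helpers below.
type Text struct {
	data    string
	privacy Privacy
}

// NewText wraps data with the given privacy level.
func NewText(data string, p Privacy) Text {
	return Text{data: data, privacy: p}
}

// UncheckedData returns the raw string without any redaction.
// The name is hypothetical; "unchecked" is meant to signal danger.
func (t Text) UncheckedData() string {
	return t.data
}

// Render returns the form of the text that is safe to log.
// redactUnknown models the rollout stage: false at first (Unknown values
// pass through), true once Unknown values have been cleaned up.
func (t Text) Render(redactUnknown bool) string {
	switch {
	case t.privacy == Public:
		return t.data
	case t.privacy == Unknown && !redactUnknown:
		return t.data
	default:
		return "<redacted>"
	}
}

func main() {
	query := NewText("repo:secret-project", Unknown)
	fmt.Println(query.Render(false))                       // prints "repo:secret-project"
	fmt.Println(query.Render(true))                        // prints "<redacted>"
	fmt.Println(NewText("hunter2", Private).Render(false)) // prints "<redacted>"
}
```

In a real package the rollout stage would more likely be a package-level setting than a per-call argument; it is a parameter here only to keep the sketch self-contained.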
If you are on-board with this suggestion, I can try to come up with a rough API and maybe even submit a PR as time permits. Happy to discuss/brainstorm ideas over a call too. I'm out for the next 2 days though as I'll be traveling.
@bobheadxi, I don't think the "close" is quite right in:
(Although, maybe you didn't select "close", it just came up automatically?)
The linked PR (as it stands) won't fully resolve this issue because we still have 350+ uses of `log.String` from `opentracing-go/log`. I have some ideas on how we could address that; for example, we could create a shim analogous to how we shim `zap` (we still need to construct values of type `opentracing-go/log.Field`). But I don't want to tackle that in the same PR because that PR is already quite big.
It's just the only way to get a concrete "link" between issue and PR that is visible in the GitHub UI. If it doesn't fully resolve the issue, we can just re-open this issue after the fact to link multiple PRs :)
Update - @vrto is working on collecting more specifics on the business need side of things to address questions raised in https://github.com/sourcegraph/sourcegraph/pull/36300 , e.g. https://github.com/sourcegraph/sourcegraph/pull/36300#issuecomment-1148378630 and https://github.com/sourcegraph/sourcegraph/pull/36300#issuecomment-1145806577
I'll raise something here that I mentioned in the linked PR: I think encoding privacy somehow into the key name will give us the most bang for the buck. It is simple for a program to do redaction on structured logs if done this way. Depending on the logger, it feels like something that should be possible.
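As a rough illustration of the key-name idea, a post-processing step over JSON log lines might look like the sketch below. The `.private` key suffix, the `RedactLine` name, and the `REDACTED` placeholder are all assumptions for the sake of the example; the thread does not settle on a naming convention.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// privateSuffix is a hypothetical convention: any key ending in this suffix
// is treated as carrying user-private data.
const privateSuffix = ".private"

// RedactLine parses one JSON log line and replaces the value of every
// top-level key that ends in privateSuffix. Nested objects are not handled.
func RedactLine(line []byte) ([]byte, error) {
	var fields map[string]interface{}
	if err := json.Unmarshal(line, &fields); err != nil {
		return nil, err
	}
	for key := range fields {
		if strings.HasSuffix(key, privateSuffix) {
			fields[key] = "REDACTED"
		}
	}
	// json.Marshal emits map keys in sorted order.
	return json.Marshal(fields)
}

func main() {
	line := []byte(`{"msg":"search","query.private":"repo:secret","count":3}`)
	out, err := RedactLine(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // prints {"count":3,"msg":"search","query.private":"REDACTED"}
}
```

The redaction logic never needs to understand the value itself, which is where the "bang for the buck" comes from, but it also only sees what was logged under a marked key.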
Note that sensitive information shows up in errors as well, e.g. https://sourcegraph.slack.com/archives/C1JH2BEHZ/p1655314729793979 - a key-name-based approach would not provide protection against this, unless we choose to redact entire errors (which could be problematic for debugging). Protection against this is one nice property of the `privacy.Text` approach in https://github.com/sourcegraph/sourcegraph/pull/36300 - we will need to address this as well.
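To illustrate why a typed approach can cover errors where a key-name convention cannot, here is a small sketch. It is not taken from the linked PR: `cloneError`, `newCloneError`, the `String` method, and the `REDACTED` placeholder are hypothetical, and `Unknown` values are redacted here as they would be in the final rollout stage.

```go
package main

import (
	"errors"
	"fmt"
)

type Privacy int

const (
	Private Privacy = iota
	Unknown
	Public
)

// Text pairs a string with its privacy level.
type Text struct {
	data    string
	privacy Privacy
}

func NewText(data string, p Privacy) Text {
	return Text{data: data, privacy: p}
}

// String implements fmt.Stringer, so formatting a Text with %s or %v
// yields the redacted form unless the value is Public.
func (t Text) String() string {
	if t.privacy == Public {
		return t.data
	}
	return "REDACTED"
}

// cloneError is a hypothetical error that carries a sensitive value as Text
// rather than as a plain string.
type cloneError struct {
	repo Text
	err  error
}

func (e *cloneError) Error() string {
	return fmt.Sprintf("clone %s: %v", e.repo, e.err)
}

func (e *cloneError) Unwrap() error { return e.err }

// newCloneError builds a cloneError from a repo name and a cause message.
func newCloneError(repo Text, cause string) error {
	return &cloneError{repo: repo, err: errors.New(cause)}
}

func main() {
	err := newCloneError(NewText("github.com/acme/secret", Private), "timeout")
	// The message is safe whichever log key the error ends up under.
	fmt.Println(err) // prints "clone REDACTED: timeout"
}
```

Because the redaction travels with the value, the rest of the error message stays available for debugging, which avoids having to redact entire errors.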
Raised by @varungandhi-src :
all assuming we go with Zap, to which there have been no objections so far