mozilla / chronicle

find everything you've ever found
http://mozillachronicle.tumblr.com/
Mozilla Public License 2.0
16 stars 6 forks source link

audit mozlog usage #87

Open jaredhirsch opened 9 years ago

jaredhirsch commented 9 years ago

from PR 86 review:

Some mozlog nits:

  • The first argument should be a String with no spaces. It will become part of the op that Heka filters by. You probably won't be logging the verbose lines, but this warn (and the others) will trip things up. You can set debug to true in mozlog config to add an assertion you're playing by the rules.
  • If you pass an Error as an argument, instead of concatenating, you'll get a stack trace. shrug

Let's use the Heka-style reporter and make sure the output works properly across the whole server.

jaredhirsch commented 9 years ago

more from that same PR: "the 2 strings will be concatenated together. So log.info('checkAuth') would create the op of chronicle.server.bell.profile.checkAuth."

jaredhirsch commented 9 years ago
jaredhirsch commented 9 years ago

ehh, bumping to sprint 5

jaredhirsch commented 9 years ago

Here's a sketch of a more coherent approach to logging. I think these notes are detailed enough that it's pretty clear which level to use when developing, and by using all the available levels, we can record that errors have occurred, but avoid paging ops for non-critical errors. We can just wire up the critical errors to the pagerduty queue and ship the rest of the errors to graphite for offline analysis ("oh look, that's a lot of weird errors from that part of the app since our last release...")

@mostlygeek @nchapman @pdehaan input welcome

log level conventions

Our log library offers 7 levels of log verbosity. Here's how we use them:

nchapman commented 9 years ago

Related: #275

jaredhirsch commented 9 years ago

if we opt for structured logs with object hashes, we can ensure that production levels wipe out password and userId and other PII to prevent user data being displayed . or we can stick with flat log messages and use log levels to protect user data in prod.

mostlygeek commented 9 years ago

Did you just copy/pasta the syslog levels? :)

So we have this: https://mana.mozilla.org/wiki/display/CLOUDSERVICES/Logging+Standard which is our currently sort of agreed upon logging standard. If you log output in this format we already have the heka transports available w/out any custom code.

Additionally, I like the idea of monitoring based on the number of messages generated at a specific log level! If you could stick these into statsd output we can set thresholds. It'd be a new experiment for us and it'll decouple metrics/logging from monitoring/alerting.

jaredhirsch commented 9 years ago

@mostlygeek Those are the log levels exposed by mozlog, which has heka integration built-in (that was my reason for choosing it). In these docs, I'm just trying to take advantage of the pre-defined granularity to partially solve our story around logging and partially come up with a story around error management (eg, bend vs break should imply there's a nearby log.critical vs log.warn).

I don't know if mozlog supports statsd format, I'll look into the simplest (1 hour or less) workaround.

jaredhirsch commented 9 years ago

One important consequence of not ever leaking user data is that we can't log the errors themselves at any level higher than verbose, since the error object might contain user data in it.

This means those log lines need to be quite liberally sprinkled through the code, since we will rarely see a stack trace in production.

jaredhirsch commented 9 years ago

whew, I really don't like the code I wound up with here. Too complicated for our current needs. The idea of never logging user info is cool in theory, but it's way better to anonymize users instead (as part of the logging / metrics pipeline), so that we can draw useful inferences. Example: how many users visit the site daily vs weekly.

I've officially fallen down the rabbit hole on this task, so, going to pause for now, meet with the datapipeline team, rethink what ops / metrics needs vs what debugging needs, and get back to this later in the sprint, with a tight timebox.

mostlygeek commented 9 years ago

If you want to uniquely identify people but also keep it anonymous, you could sha1/md5 whatever key you're using. I would suggest something like email address instead of a monotonically increasing UID. If the UID is a uuid, then sha1(uuid) may be good enough.