m-lab / etl

M-Lab ingestion pipeline
Apache License 2.0

Add support for debug/info/error logging #63

Open stephen-soltesz opened 7 years ago

stephen-soltesz commented 7 years ago

The glog package (https://github.com/golang/glog) automatically creates separate log files for DEBUG, INFO, and ERROR messages. Files are rotated automatically once MaxSize bytes have been written. That size can be changed, but there is no automatic cleanup that removes old files.
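For reference, a minimal sketch of the behavior described above (the MaxSize override is just an illustration, not something we currently set):

```go
package main

import (
	"flag"

	"github.com/golang/glog"
)

func main() {
	// glog registers -log_dir, -alsologtostderr, etc. on the standard
	// flag set; they only take effect after flag.Parse().
	flag.Parse()
	defer glog.Flush()

	// Rotation happens once a file reaches glog.MaxSize bytes, but the
	// rotated files are never removed by glog itself.
	glog.MaxSize = 64 * 1024 * 1024 // assumption: 64 MiB per file

	glog.Info("etl worker starting")
	glog.Error("example error message")
}
```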

The container environment for AppEngine Flex has no persistent storage, but it is still possible to write to the "local" filesystem, up to the node or container size limits. So, to keep old logs from eventually filling the local filesystem, we would have to define some cleanup operation ourselves. This doesn't feel right.
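If we did go that route, the cleanup would look roughly like the sketch below (the file-name prefix, sweep interval, and retention are assumptions for illustration, not an existing part of the pipeline):

```go
package main

import (
	"os"
	"path/filepath"
	"strings"
	"time"

	"github.com/golang/glog"
)

// cleanOldLogs removes log files in dir older than maxAge. The prefix check
// is an assumption about how we would identify our own log files.
func cleanOldLogs(dir string, maxAge time.Duration) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		glog.Errorf("reading %s: %v", dir, err)
		return
	}
	for _, e := range entries {
		if e.IsDir() || !strings.HasPrefix(e.Name(), "etl_worker") {
			continue
		}
		info, err := e.Info()
		if err != nil {
			continue
		}
		if time.Since(info.ModTime()) > maxAge {
			os.Remove(filepath.Join(dir, e.Name()))
		}
	}
}

func main() {
	// Hypothetical schedule: sweep hourly, keep at most one day of logs.
	go func() {
		for range time.Tick(time.Hour) {
			cleanOldLogs("/tmp", 24*time.Hour)
		}
	}()
	select {} // stand-in for the worker's real main loop
}
```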

In the Dockerfile we can add the flags:

```
etl_worker -alsologtostderr=true -log_dir /tmp
```

so that logs are also written to stderr (and collected automatically by Stackdriver Logging).

We could fork the glog repo and modify it to include a "no file" option (the package does not appear to be maintained).

Alternatives include:

pboothe commented 7 years ago

It looks like if you just want log.Info() and log.Error() instead of the full suite of capabilities, then there are a lot more options than glog.

http://stackoverflow.com/questions/16895651/how-to-implement-level-based-logging-in-golang
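For example, a level-style logger needs nothing beyond the standard library (the names below are illustrative):

```go
package main

import (
	"log"
	"os"
)

// Two plain loggers with level prefixes, in the spirit of the answers in the
// question linked above; no files are created, everything goes to stderr.
var (
	Info  = log.New(os.Stderr, "INFO: ", log.LstdFlags|log.Lshortfile)
	Error = log.New(os.Stderr, "ERROR: ", log.LstdFlags|log.Lshortfile)
)

func main() {
	Info.Println("task accepted")
	Error.Println("task failed")
}
```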

stephen-soltesz commented 6 years ago

In principle, https://github.com/GoogleCloudPlatform/fluent-plugin-google-cloud, which runs as a side-car service in AppEngine Flex VMs, GCE, and GKE, should support parsing log lines that look like JSON into structured values in Stackdriver logs.

This would be extremely useful for adding searchable and aggregatable metadata at log time, without extra log parsing logic later. https://github.com/sirupsen/logrus supports logging in a JSON record format, so it could be an excellent fit.
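A rough sketch of what that would look like with logrus (the field names here are hypothetical, not an agreed-upon schema):

```go
package main

import (
	log "github.com/sirupsen/logrus"
)

func main() {
	// Emit every log line as a single JSON record so the fluentd side-car
	// could (in principle) turn it into a structured Stackdriver entry.
	log.SetFormatter(&log.JSONFormatter{})

	// Hypothetical fields, just to show the structured-metadata idea.
	log.WithFields(log.Fields{
		"component": "etl_worker",
		"tests":     42,
		"status":    "ok",
	}).Info("task completed")
}
```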

Unfortunately, there is currently a bug in the plugin that rejects some derived timestamp as invalid, so JSON log lines are always parsed as "textPayload".

https://github.com/GoogleCloudPlatform/fluent-plugin-google-cloud/issues/211

I've confirmed a fix by removing the timestamp label from `out_from_docker.rb` and restarting fluentd with `service google-fluentd restart`. But this will not be practical until it is fixed upstream.