leo-project / leofs

The LeoFS Storage System
https://leo-project.net/leofs/
Apache License 2.0

[log] More readable error/warn log #498

Open yosukehara opened 8 years ago

yosukehara commented 8 years ago

We need to revise the log format to make it more readable, because LeoFS administrators are currently unable to understand the logs in detail.

mocchira commented 7 years ago

My thoughts.

Two logs

New log for admins

When a log should be written

Only when there is something admins have to act on. Do NOT log an error such as a network timeout that happened only once. DO log something like: network timeouts to a specific node have happened many times over a long period, which suggests some network or machine trouble that admins have to dig into with their favorite tools.
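
The rule above could be sketched roughly as follows. This is only an illustration in Python, not LeoFS code: the `TimeoutTracker` class, its `threshold` and `window_seconds` parameters, and the node name are all hypothetical, and a count within a sliding window is just one possible reading of "many times for a long time".

```python
from collections import defaultdict, deque


class TimeoutTracker:
    """Decide when repeated timeouts to one node deserve an admin-facing log.

    Hypothetical helper: the threshold and window values are assumptions.
    """

    def __init__(self, threshold=5, window_seconds=60.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        # node name -> timestamps (in seconds) of recent timeouts
        self._events = defaultdict(deque)

    def record_timeout(self, node, now):
        """Record one timeout; return True if admins should be notified."""
        events = self._events[node]
        events.append(now)
        # Drop timeouts that have fallen out of the sliding window.
        while events and now - events[0] > self.window_seconds:
            events.popleft()
        return len(events) >= self.threshold


tracker = TimeoutTracker(threshold=3, window_seconds=10.0)
for t in (0.0, 1.0, 2.0):
    if tracker.record_timeout("storage_0@192.168.0.1", now=t):
        print("admin log: repeated timeouts to storage_0@192.168.0.1")
```

A single timeout stays out of the admin log; only the third timeout inside the window triggers the message.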

Format

There is no reason for it to be formatted as JSON; that is just for pretty printing on GitHub.

windkit commented 7 years ago

I think it is useful to provide the "where" information to admins.

We can first define a set of errors that are easily recognized, and with it a well-defined set of error logs.

Logs like

From: leo_storage_0@192.168.0.1 not found
From: leo_storage_1@192.168.0.2 unavailable

could be useful, as users can easily tell where the problem comes from and get a brief picture of it.

With a defined set of errors, we can write a documentation page for them about possible root causes, actions to take, etc.

Undefined errors could just be categorized as "internal trouble", with the details written to the dev logs.

windkit commented 7 years ago

We also need a standard format for error log messages, for example with tab-separated fields, so administrators can easily parse them and pass them to their monitoring systems.
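
As a rough illustration of why tab-separated fields are easy to consume, here is a minimal Python sketch. The field names in `FIELDS` and the sample values are assumptions made for the example, not an agreed LeoFS format.

```python
# Hypothetical field set; the real set of fields is not decided in this thread.
FIELDS = ("level", "node", "time", "error", "detail")


def format_record(record):
    """Render one record as a single tab-separated line.

    Tabs inside values are replaced with spaces so the field count stays fixed.
    """
    return "\t".join(str(record[name]).replace("\t", " ") for name in FIELDS)


def parse_record(line):
    """Split a tab-separated line back into a dict keyed by field name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, values))


line = format_record({
    "level": "E",
    "node": "leo_storage_1@192.168.0.2",
    "time": "2017-06-01 12:00:00",
    "error": "unavailable",
    "detail": "no response",
})
print(parse_record(line)["node"])
```

A monitoring system only needs a split on the tab character to recover every field, which is the point of fixing the format up front.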

mocchira commented 6 years ago

Now we've reached the consensus that we will rely on third-party log analysis tools such as Kibana and Logstash to analyze log files and provide user-friendly error messages, so we will document how to integrate LeoFS with those tools in our official documentation.

vstax commented 6 years ago

Just wanted to share a working example for fluentd (td-agent.conf) that ships logs to elastic (relies on global paths to log files):

<source>
  @type tail
  path /var/log/leofs/*/app/info,/var/log/leofs/*/app/error
  pos_file /var/log/leofs/leofs.log.pos
  tag leofs.app
  format /^\[(?<level>[^\t]*)\]\t(?<node>[^\t]*)\t(?<time>[^\t]*)\t(?<timestamp>[^\t]*)\t(?<method>[^\t]*)\t((?<line>[^\t]*)\t)?(?<message>[^\t]*)/
  time_format %Y-%m-%d %H:%M:%S.%L %z
</source>
<match **>
  @type forward
  require_ack_response false
  heartbeat_type tcp
  phi_failure_detector false
  expire_dns_cache 0
  <server>
    name fluentd-file
    host fluentd.lan
    port 5180
  </server>
</match>
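
To check what the `format` expression extracts, the same pattern can be tried outside fluentd. The following Python sketch translates it to Python's named-group syntax and treats every field as tab-free. The sample log line is made up for illustration and is not taken from a real LeoFS log.

```python
import re

# Python translation of the fluentd pattern: (?<name>...) becomes (?P<name>...).
LINE_RE = re.compile(
    r"^\[(?P<level>[^\t]*)\]\t(?P<node>[^\t]*)\t(?P<time>[^\t]*)\t"
    r"(?P<timestamp>[^\t]*)\t(?P<method>[^\t]*)\t"
    r"((?P<line>[^\t]*)\t)?(?P<message>[^\t]*)"
)

# Hypothetical sample line with tab-separated fields.
sample = "\t".join([
    "[W]",
    "storage_0@192.168.0.1",
    "2017-06-01 12:00:00.123456 +0900",
    "1496286000",
    "leo_storage_handler:get/1",
    "100",
    "timeout",
])

match = LINE_RE.match(sample)
if match:
    for name, value in match.groupdict().items():
        print(name, "=", value)
```

When the optional line-number field is absent, `line` comes back as `None` and the remaining text is captured as `message`.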
mocchira commented 6 years ago

@vstax Thanks! That's really helpful to us. We are going to cite the above as a fluentd example.