gravitational / teleport

The easiest, and most secure way to access and protect all of your infrastructure.
https://goteleport.com
GNU Affero General Public License v3.0

make teleport log naming convention easier for log aggregators #2388

ghost closed 5 years ago

ghost commented 5 years ago

I followed https://gravitational.com/blog/shipping-ssh-logs-to-sumologic/ to configure sending ssh logs to sumologic. Instead of the sumologic/collector docker image, though, I used https://github.com/SumoLogic/fluentd-kubernetes-sumologic, because we already use it for kubernetes and container logs.

The tutorial recommends using /var/lib/teleport/log/*.log as the pathExpression when configuring sumo sources. However, when I configured the sumologic collector that way, it started following every file in /var/lib/teleport/log/ (more than 100 files, because we keep the logs for a long time).

I tried to configure fluentd to track only the latest log file, using the following configuration:

      <source>
        @type tail
        tag teleport
        path /mnt/teleport/%Y-%m-%d.*.log
        pos_file /mnt/pos/teleport.log.pos
        <parse>
          @type json
          time_key time
          time_type string
          time_format %Y-%m-%dT%H:%M:%SZ
        </parse>
      </source>
      <filter teleport.**>
        @type kubernetes_sumologic
        source_category teleport
        source_name teleport
      </filter>

where /mnt/teleport/ is a mount of /var/lib/gravity/site/teleport/log/. The corresponding fluentd documentation can be found at https://docs.fluentd.org/v1.0/articles/in_tail

Here I ran into two problems.

  1. Despite being named 2018-11-01.00:00:00.log, the file is created 12 hours before that time and rotated 12 hours after it.
    [root@ip-10-104-2-176 log]# head -1 2018-11-01.00:00:00.log
    {"event":"user.login","method":"local","time":"2018-10-31T12:00:11Z","user":"opscenter@gravitational.io"}
    [root@ip-10-104-2-176 log]# tail -1 2018-11-01.00:00:00.log
    {"event":"user.login","method":"local","time":"2018-11-01T11:59:11Z","user":"opscenter@gravitational.io"}
  2. Log file names contain dates and times in UTC and don't respect the local timezone set on the server.

Together, these two issues mean that fluentd only starts reading the latest log file 12 + local_timezone_difference_with_UTC hours after it is created, which is a very long delay.
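To make the arithmetic concrete, here is a rough Python sketch of that delay. It assumes the %Y-%m-%d pattern is expanded in the server's local time, so the file name first matches at local midnight of the date in its name; pickup_delay and its parameters are hypothetical names used only for illustration. For servers behind UTC the result matches the 12 + difference figure above; for servers ahead of UTC the same calculation gives less than 12 hours.

```python
from datetime import datetime, timedelta, timezone


def pickup_delay(file_date: str, utc_offset_hours: float) -> timedelta:
    """Time between file creation and the first moment a %Y-%m-%d
    pattern, expanded in local time, matches the file name."""
    midnight = datetime.strptime(file_date, "%Y-%m-%d")

    # Observed in the head/tail output: the file is created 12 hours
    # before the UTC timestamp in its name.
    created = midnight.replace(tzinfo=timezone.utc) - timedelta(hours=12)

    # Assumption: the tailer expands the pattern in local time, so the
    # name first matches at local midnight of that date.
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    first_match = midnight.replace(tzinfo=local_tz)

    return first_match - created


if __name__ == "__main__":
    # A server at UTC-7 waits 12 + 7 hours before the file is picked up.
    print(pickup_delay("2018-11-01", -7))
    # A server running in UTC still waits 12 hours.
    print(pickup_delay("2018-11-01", 0))
```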

I wonder if it is possible to make this naming convention more predictable and easier to use for log aggregator software?

For instance, teleport could write current records to a fixed file name such as current.log or last.log, and rename it to YYYY-mm-DD.00:00:00.log when it is time to rotate.
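A minimal Python sketch of the rotation scheme I have in mind follows. This is not how teleport works today; rotate_current and the current.log name are hypothetical and only illustrate the idea. The log aggregator would then tail the fixed current.log path and never need to guess a dated file name.

```python
import os
import tempfile
from datetime import datetime, timezone


def rotate_current(log_dir: str, period_start: datetime) -> str:
    """Rename current.log to a dated name and start a fresh current.log.

    Returns the path of the dated (rotated) file.
    """
    current = os.path.join(log_dir, "current.log")
    dated_name = period_start.strftime("%Y-%m-%d.%H:%M:%S") + ".log"
    dated = os.path.join(log_dir, dated_name)

    # Rename in one step so the records are never in two files at once.
    if os.path.exists(current):
        os.replace(current, dated)

    # Recreate an empty current.log for new records.
    open(current, "a").close()
    return dated


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as log_dir:
        with open(os.path.join(log_dir, "current.log"), "w") as f:
            f.write('{"event":"user.login"}\n')
        start = datetime(2018, 11, 1, tzinfo=timezone.utc)
        print(os.path.basename(rotate_current(log_dir, start)))
```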

ghost commented 5 years ago

@klizhentas thanks!

kontsevoy commented 5 years ago

re-opening to update the docs

kontsevoy commented 5 years ago

nothing to document. no changes to commands/etc.