gravitational / teleport

The easiest, and most secure way to access and protect all of your infrastructure.
https://goteleport.com
GNU Affero General Public License v3.0

unable to have a global event log when teleport is deployed on-premise in HA #12169

Closed by eric-belhomme 1 year ago

eric-belhomme commented 2 years ago

Description

I set up a high-availability Teleport cluster on on-premise, private infrastructure composed of:

All nodes are running AlmaLinux 8.5 and Teleport v9.0.4 git:v9.0.4-0-gf577413 go1.17.7

This setup first appeared to work as expected, until I noticed inconsistencies in the "audit logs" and "session recordings". After troubleshooting, I understood that I don't get the same view depending on which proxy server haproxy balances me to:

  1. in the etcd DB, I can see all the sessions in a unified way:
    # etcdctl --prefix=true --keys-only=true get /teleport//session_tracker
    /teleport//session_tracker/02beec26-c60d-4f3c-9684-1c0400930bb8
    /teleport//session_tracker/191eac2b-c311-4c7e-95ed-767c444fe78d
    /teleport//session_tracker/1f3787aa-5d32-446a-a4be-182ce442a74d
  2. in the MinIO bucket, I see all recordings as well:
    # mc ls local/teleport
    [2022-04-21 17:08:55 CEST] 2.1KiB STANDARD 02beec26-c60d-4f3c-9684-1c0400930bb8.tar
    [2022-04-21 16:44:58 CEST] 1.1KiB STANDARD 191eac2b-c311-4c7e-95ed-767c444fe78d.tar
    [2022-04-21 17:42:32 CEST] 1.7KiB STANDARD fa0bc0ba-fdb1-4c23-8364-ef46441f0e5f.tar
  3. the local SQLite DB in /var/lib/teleport on the proxy servers has an empty events table:
    # sqlite3 /var/lib/teleport/proc/sqlite.db 
    SQLite version 3.26.0 2018-12-01 12:34:55
    Enter ".help" for usage hints.
    sqlite> select count(*) from events;
    0

    So I suspect the culprit is /var/lib/teleport/log/events.log: looking deeper showed me that each line of this file is a JSON blob and, of course, each proxy has its own local log file, which explains the discrepancy.
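To illustrate why each proxy shows a different view, here is a minimal Python sketch that merges several per-proxy copies of events.log into one time-ordered list. It is not part of Teleport. It assumes each event carries a "time" field in a uniform UTC timestamp format (so that string order matches chronological order) and a "uid" field usable for de-duplication; the function names `parse_events` and `merge_event_logs` are hypothetical.

```python
import heapq
import json
from typing import Iterable, Iterator, List


def parse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per JSON line, skipping blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a partially written last line
        if isinstance(event, dict):
            yield event


def merge_event_logs(*logs: Iterable[str]) -> List[dict]:
    """Merge per-proxy logs into one list ordered by the "time" field.

    Assumption: "time" values share one UTC format, so comparing them as
    strings gives chronological order. Events with an already seen "uid"
    are dropped.
    """
    def by_time(event: dict) -> str:
        return event.get("time", "")

    # Sort each log on its own first, then merge the sorted streams.
    streams = [sorted(parse_events(log), key=by_time) for log in logs]
    merged: List[dict] = []
    seen = set()
    for event in heapq.merge(*streams, key=by_time):
        uid = event.get("uid")
        if uid is not None and uid in seen:
            continue
        seen.add(uid)
        merged.append(event)
    return merged
```

Each argument can be any iterable of lines, for example an open file object for a copy of one proxy's /var/lib/teleport/log/events.log. This only produces an offline merged view; it does not change what the Teleport UI displays.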

That said, you might say it's not a bug but a misconfiguration, and you'd be right. But at this time, the only option available in Teleport for distributed event logs is DynamoDB, which is only available when Teleport runs in the AWS cloud. I might try a 1:1 replacement with ScyllaDB Alternator, for example, but this kind of hack seems unnecessarily complicated to me, as we already have a key/value server available with etcd.

What happened:

The event log and session recordings are split depending on which proxy is responding

What you expected to happen:

A unified event log and session recordings, whichever proxy is responding

Reproduction Steps


  1. Deploy Teleport 9.0.4 proxy nodes on-premise behind haproxy

Server Details

Client Details

Debug Logs

Not relevant

tamcore commented 2 years ago

ScyllaDB Alternator won't work. We tried going that route and failed, as it doesn't support DescribeTimeToLive, which is required by Teleport. You might want to give the SQL backend, which was introduced in 9.1.0, a shot.

webvictim commented 2 years ago

Unfortunately, the SQL backend doesn't solve this problem either, as it only keeps the backend storage (users, roles, auth connectors, etc.) there. There is still no HA on-premise storage option available with Teleport.

The best solution currently is to have Filebeat or another collector read the JSON-formatted Teleport audit logs from each machine's disk and send them to a SIEM or other aggregator.
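As a rough illustration of what such a collector does, the Python sketch below forwards only the complete JSON lines appended to a log file since the last run and returns the new byte offset to persist. It is not how Filebeat is implemented; `forward_new_events` and the `send` callback (a stand-in for whatever client your SIEM or aggregator provides) are hypothetical names.

```python
import json
from typing import BinaryIO, Callable


def forward_new_events(log: BinaryIO, offset: int, send: Callable[[dict], None]) -> int:
    """Forward complete JSON lines found after `offset`; return the new offset.

    `log` must be opened in binary mode so that offsets are byte positions.
    A trailing line without a newline is treated as still being written and
    is left for the next call.
    """
    log.seek(offset)
    while True:
        line = log.readline()
        if not line.endswith(b"\n"):
            break  # end of file, or an incomplete last line
        offset += len(line)
        try:
            event = json.loads(line)
        except ValueError:
            continue  # skip malformed lines, but still advance past them
        if isinstance(event, dict):
            send(event)
    return offset
```

Running this periodically on each proxy against /var/lib/teleport/log/events.log, and storing the returned offset between runs, would give the aggregator one combined stream. A real collector also has to handle log rotation and delivery retries, which this sketch leaves out.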

klizhentas commented 2 years ago

You are right, the only way to have a consistent log in the UI is to use a managed backend for audit events, such as DynamoDB or Firebase.

We are working on a new version of the backend that may use S3 for everything; we will keep you posted.

For now, we recommend that you send all the audit events via our event forwarder to the SIEM of your choice to get a centralized view there.

thatguyatgithub commented 1 year ago

Do you have any news on the subject? The S3 approach really seems like the way to go: using S3 as the only backend, much like Grafana Loki and similar tools do. It also seems to be the only way to get HA mode for on-premise installations. Deploying a SIEM for this is like using a cannon to kill a fly for small deployments, where the SIEM software itself would be several orders of magnitude larger and more complex than the original Teleport setup. On top of that, a SIEM does not provide session recording playback.

webvictim commented 1 year ago

We're working on a Postgres-based HA backend for Teleport which will also cover HA audit logs and allow a comprehensive view of all session recordings/audit events. It should be released with Teleport 13.3.0 at the end of July/beginning of August.

thatguyatgithub commented 1 year ago

Very much appreciated and thanks for the heads-up! Looking forward!

zmb3 commented 1 year ago

The Postgres backend was released in 13.3.