gravitational / teleport

The easiest, and most secure way to access and protect all of your infrastructure.
https://goteleport.com
GNU Affero General Public License v3.0
17.69k stars 1.77k forks source link

Duplicated SSH session recordings #42263

Open zabszk opened 6 months ago

zabszk commented 6 months ago

Expected behavior: Each SSH session should have exactly one recording and it should have the correct length.

Current behavior: Each SSH session has two recordings with the same Session ID. Second recording is slightly longer and that length is incorrect - the recording ends before it reaches the displayed session duration (screenshot included below). The length of the shorter recording is correct (it matches actual session duration and actual recording length).

We tested tsh ssh on windows and Teleport WebUI, bug occurs in both situations.

Sessions list

The longer recording ends in the middle (that's when the session actually ended). Longer recording

Bug details:

stevenGravy commented 6 months ago

@zabszk please provide what the storage backend is for this. The session listing is based on the audit log so expect this is due to those records.

zabszk commented 6 months ago

@zabszk please provide what the storage backend is for this. The session listing is based on the audit log so expect this is due to those records.

Local directory (default option) is used. I checked and there is one tar file with the matching name and inside the tar there is exactly one file.

I just made a test session and all logs are duplicated.

Audit logs

stevenGravy commented 6 months ago

Did you specify a location for your audit log storage? Is there any kind of file processing that might be happening? If it’s dir then I expect something is causing multiple entries to appear in the json log files.

On Sat, Jun 1, 2024 at 8:02 AM Łukasz Jurczyk @.***> wrote:

@zabszk https://github.com/zabszk please provide what the storage backend is for this. The session listing is based on the audit log so expect this is due to those records.

Local directory (default option) is used. I checked and there is one tar file with the matching name and inside the tar there is exactly one file.

I just made a test session and all logs are duplicated.

Audit.logs.png (view on web) https://github.com/gravitational/teleport/assets/7418075/faaf7401-f57c-4d69-b471-44f4ff52ed61

— Reply to this email directly, view it on GitHub https://github.com/gravitational/teleport/issues/42263#issuecomment-2143423758, or unsubscribe https://github.com/notifications/unsubscribe-auth/AOPERQNBJ24ZCWCOSNRRE3DZFGZ6TAVCNFSM6AAAAABIUBCPV2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNBTGQZDGNZVHA . You are receiving this because you commented.Message ID: @.***>

zabszk commented 6 months ago

No, I only specify the location for teleport logs (teleport -> log -> output). This bug occurs on two completely separated teleport installations. Both use Debian 12.

The bug only affects SSH Service in proxy recording mode. The issue is NOT present in node recording mode. Other services (DB, app) and "system logs" (certs issuance, etc...) are logged only once as well.

On the screenshot in my previous reply you can see that the duplicated "Session Ended" log entry has a different time logged (extra 5 seconds). I analyzed other logs and it's always exactly 5 seconds, even for 9 minutes long sessions:

  "session_start": "2024-06-01T11:49:38.97636617Z",
  "session_stop": "2024-06-01T11:58:26.328776262Z",
  "session_start": "2024-06-01T11:49:38.966423395Z",
  "session_stop": "2024-06-01T11:58:21.336166346Z",
stevenGravy commented 6 months ago

Thanks for clarifying. Now that OpenSSH connections supports recording with node recording mode would recommend using that. I expect this is also due to the storage in some way.

zabszk commented 6 months ago

We use recording proxy, because when using node recording mode the user can reboot the server (or just restart teleport service) and then there is no recording available for the entire session and even it's not easily visible that such session happened (you need to analyze the logs, you can't simply check the Session Recordings tab to see a session without recording available).

Regarding the bug - I just found out that if I restart teleport service on the node (in recording proxy mode) during the session, that specific session is displayed properly - only one recording is visible. So I guess that the bug is happening because the node doesn't handle proxy recording mode properly and submits some information (when session ends) that shouldn't be submitted in the proxy recording mode.

zmb3 commented 6 months ago

Are these Teleport nodes or OpenSSH servers?

zabszk commented 6 months ago

Are these Teleport nodes or OpenSSH servers?

Teleport nodes. Sorry for not indicating it explicitly.

zabszk commented 5 months ago

After updating to Teleport 16.0.0 it's even worse. Recording ends after less than a second and nothing is visible on the screen. Screenshot attached.

The bug occurs in proxy and node-sync recording modes. Node recording mode works. I haven't tested other modes.

EDIT: This bug is present in node recording mode as well if you click "Play" within the first seconds after the recording appears. If you click after these first seconds the recording plays fine (this "fix" doesn't work for proxy and node-sync modes).

Recording

zabszk commented 5 months ago

I just retested proxy recording mode after updating to Teleport 16.0.3.

The first recording (the shorter one) plays (but the "Play" button starts working after about 10 seconds after the session finishes, otherwise it shows an empty recording - but that's the issue introduced in 16.0.0 for "node" recording mode as well).

The second recording (the longer one, the duplicate) now doesn't play at all (see screenshot from my previous reply), likely because it has a different Session ID. This wasn't an issue in Teleport 15 (see screenshots from the issue content and my first reply - IDs were the same).

Screenshots from proxy recording mode using Teleport 16.0.3 are below. I started only one session using the WebUI. Logs shows two session and the recording of the longer one (699b9e75-...) doesn't play.

audit log

recordings

zabszk commented 3 months ago

After updating from 16.1.6 to 16.1.8 there is a difference. Now both recordings have the same length and the second recording is the working one (previously the first one was working and the second one was broken, now it's opposite).

Screenshot

Screenshot 2

zmb3 commented 3 months ago

@fspmarshall does #45877 resolve this?

fspmarshall commented 3 months ago

@zmb3 I don't believe so. The case addressed in https://github.com/gravitational/teleport/pull/45877 specifically results in an extra empty session recording, and only appears probabilistically when there is throttling/errors due to very high load that happens to occur at the start of the upload process. This issue seems to be something else.

zabszk commented 3 months ago

After updating to 16.2.0 the first recording is the working one and the second (the later one) is broken - so it's like before 16.1.8. However both recordings have the same duration (just like after updating to 16.1.8).

zabszk commented 3 months ago

@zmb3 I don't believe so. The case addressed in #45877 specifically results in an extra empty session recording, and only appears probabilistically when there is throttling/errors due to very high load that happens to occur at the start of the upload process. This issue seems to be something else.

The extra recording is empty - recording is visible on the list of recordings, but it has no content when you press "Play" and the second recording (the broken one) is not created if I restart the teleport agent during the session (which ofc results in session being terminated).

So it has something in common, but probably it's unrelated - just like you said. To me it looks like the agent tries to upload a recording (or something else) when it shouldn't, because this issue is present only in proxy recording mode. Switching to node recording mode fixes it (or at least was fixing it last time I checked).

zabszk commented 2 months ago

I updated teleport to version 16.2.1 and the issue is still present, so PR #45877 (and #46265) is not related.