gravitational / teleport

The easiest, and most secure way to access and protect all of your infrastructure.
https://goteleport.com
GNU Affero General Public License v3.0
17.59k stars 1.76k forks source link

`TestIntegrations/AuditOn` flakiness #17224

Open tigrato opened 2 years ago

tigrato commented 2 years ago

Failure

Link(s) to logs

Relevant snippet

{"caller":"regular/sshserver.go:1864","component":null,"level":"debug","message":"Received \"keepalive@openssh.com\": WantReply: true","timestamp":"2022-10-10T15:54:30Z"}
{"caller":"srv/sess.go:916","component":"session:node","level":"debug","message":"Copying from PTY to writer completed with error read /dev/ptmx: input/output error.","timestamp":"2022-10-10T15:54:30Z"}
{"caller":"regular/sshserver.go:1875","component":null,"level":"debug","message":"Replied to \"keepalive@openssh.com\"","timestamp":"2022-10-10T15:54:30Z"}
{"caller":"events/discard.go:99","component":null,"event_id":"","event_index":0,"event_time":"0001-01-01T00:00:00Z","event_type":"session.end","level":"debug","message":"Discarding stream event","timestamp":"2022-10-10T15:54:31Z"}
{"caller":"srv/sess.go:622","component":"session:node","level":"info","message":"Stopping session 1b59d2dc-40d4-4e83-92ab-681f8cb90b59.","timestamp":"2022-10-10T15:54:31Z"}
{"caller":"srv/sess.go:651","component":"session:node","level":"info","message":"Closing session 1b59d2dc-40d4-4e83-92ab-681f8cb90b59.","timestamp":"2022-10-10T15:54:31Z"}
{"caller":"srv/sess.go:1586","component":"session:node","level":"info","message":"Closing party 624bdc74-987f-4285-a2de-082ecba3f7f5","timestamp":"2022-10-10T15:54:31Z"}
{"caller":"srv/sess.go:1246","component":"session:node","level":"info","message":"Removing party ServerContext(127.0.0.1:35714-\u003e127.0.0.1:36325, user=ci, id=24) party(id=624bdc74-987f-4285-a2de-082ecba3f7f5) from session 1b59d2dc-40d4-4e83-92ab-681f8cb90b59","timestamp":"2022-10-10T15:54:31Z"}
{"caller":"srv/sess.go:1254","component":"session:node","level":"debug","message":"No longer tracking participant: 624bdc74-987f-4285-a2de-082ecba3f7f5","timestamp":"2022-10-10T15:54:31Z"}
{"caller":"events/discard.go:99","component":null,"event_id":"","event_index":0,"event_time":"0001-01-01T00:00:00Z","event_type":"session.leave","level":"debug","message":"Discarding stream event","timestamp":"2022-10-10T15:54:31Z"}
{"caller":"regular/sshserver.go:1581","component":"node","level":"debug","message":"Exec request (\"/proc/self/exe\") complete: 0","timestamp":"2022-10-10T15:54:31Z","trace.fields":{"id":24,"local":{"IP":"127.0.0.1","Port":36325,"Zone":""},"login":"ci","remote":{"IP":"127.0.0.1","Port":35714,"Zone":""},"teleportUser":"ci"}}
{"caller":"srv/sess.go:924","component":"session:node","level":"debug","message":"Copying from reader to PTY completed with error \u003cnil\u003e.","timestamp":"2022-10-10T15:54:31Z"}
{"caller":"regular/sshserver.go:1587","component":"node","level":"info","message":"Failed to send exit status for /proc/self/exe: EOF","timestamp":"2022-10-10T15:54:31Z","trace.fields":{"id":24,"local":{"IP":"127.0.0.1","Port":36325,"Zone":""},"login":"ci","remote":{"IP":"127.0.0.1","Port":35714,"Zone":""},"teleportUser":"ci"}}
{"caller":"srv/monitor.go:253","component":"node","level":"debug","message":"Releasing associated resources - context has been closed.","timestamp":"2022-10-10T15:54:31Z","trace.fields":{"id":24,"local":{"IP":"127.0.0.1","Port":36325,"Zone":""},"login":"ci","remote":{"IP":"127.0.0.1","Port":35714,"Zone":""},"teleportUser":"ci"}}
{"caller":"srv/sess.go:995","component":"node:forward","level":"error","message":"Received error waiting for the interactive session d5837350-35a5-401f-b411-5f9b08b97122 to finish: wait: remote command exited without exit status or exit signal.","timestamp":"2022-10-10T15:54:31Z","trace.fields":{"id":23,"local":{"addr":"127.0.0.1:36325","network":"tcp"},"login":"ci","remote":{"IP":"127.0.0.1","Port":51574,"Zone":""},"teleportUser":"ci"}}
{"caller":"srv/sess.go:916","component":"session:node:forward","level":"debug","message":"Copying from PTY to writer completed with error \u003cnil\u003e.","timestamp":"2022-10-10T15:54:31Z"}
{"caller":"srv/sess.go:1450","component":"session:node","level":"debug","message":"Copying from Party 624bdc74-987f-4285-a2de-082ecba3f7f5 to session writer completed with error \u003cnil\u003e.","timestamp":"2022-10-10T15:54:31Z"}
{"caller":"srv/sess.go:622","component":"session:node:forward","level":"info","message":"Stopping session d5837350-35a5-401f-b411-5f9b08b97122.","timestamp":"2022-10-10T15:54:31Z"}
{"caller":"srv/sess.go:630","component":"session:node:forward","error":"EOF","level":"debug","message":"Failed to close the shell","timestamp":"2022-10-10T15:54:31Z"}
{"caller":"srv/sess.go:633","component":"session:node:forward","error":"EOF","level":"debug","message":"Failed to kill the shell","timestamp":"2022-10-10T15:54:31Z"}
{"caller":"srv/term.go:299","component":"term:local","level":"debug","message":"Closed PTY","timestamp":"2022-10-10T15:54:31Z"}
{"caller":"srv/sess.go:924","component":"session:node:forward","level":"debug","message":"Copying from reader to PTY completed with error \u003cnil\u003e.","timestamp":"2022-10-10T15:54:31Z"}
{"caller":"forward/sshserver.go:979","component":"node:forward","level":"debug","message":"Exec request (\"\") complete: 255","timestamp":"2022-10-10T15:54:31Z","trace.fields":{"id":23,"local":{"addr":"127.0.0.1:36325","network":"tcp"},"login":"ci","remote":{"IP":"127.0.0.1","Port":51574,"Zone":""},"teleportUser":"ci"}}
{"caller":"forward/sshserver.go:988","component":"node:forward","level":"debug","message":"Closing session request to 127.0.0.1:51574 in context 23.","timestamp":"2022-10-10T15:54:31Z","trace.fields":{"dst-addr":"127.0.0.1:36325","src-addr":"127.0.0.1:51574"}}
    integration_test.go:444: 
            Error Trace:    /workspace/integration/integration_test.go:444
            Error:          Received unexpected error:
                            Process exited with status 255
            Test:           TestIntegrations/AuditOn/recording_proxy,_sync_recording
{"caller":"service/supervisor.go:383","component":"proc:16","event":"TeleportExit","l
zmb3 commented 2 years ago

@espadolini could this just be racy shutdown behavior? The error that the ssh.node service is returning is context.Canceled, which happens as a result of the call to *TeleInstance.StopAll() when the test is shutting down.

ibeckermayer commented 2 years ago

https://console.cloud.google.com/cloud-build/builds/d5fffc6e-fc0f-48b3-bc15-f2efb9e7f8d4?project=ci-account

            Test:           TestIntegrations/AuditOn/recording_proxy
                            Process exited with status 255
            Error:          Received unexpected error:
            Error Trace:    /workspace/integration/integration_test.go:444
    integration_test.go:444: 
zmb3 commented 2 years ago

Looks like this started with https://github.com/gravitational/teleport/pull/16953/files

An error that previously went unchecked is now being checked.

espadolini commented 2 years ago

Seems to be some race in how we clean up sessions?

tigrato commented 2 years ago

https://console.cloud.google.com/cloud-build/builds/7af62c55-65c0-4049-bb53-a50ebb609473?project=ci-account

Today is being particularly difficult for this test to pass. It's about the tenth time it fails

ibeckermayer commented 2 years ago

https://console.cloud.google.com/cloud-build/builds/0fb3304e-97d9-4f7a-8f70-cc28afdd66a3?project=ci-account

ibeckermayer commented 2 years ago

https://console.cloud.google.com/cloud-build/builds/970c87a2-cff7-4930-8d6e-549128ee7fa4?project=ci-account

ibeckermayer commented 2 years ago

https://console.cloud.google.com/cloud-build/builds/48d4caa9-5acc-47a1-97f0-a9792fdf47ff?project=ci-account

ibeckermayer commented 2 years ago

@Joerger You vanquished this one in https://github.com/gravitational/teleport/pull/12101 a while back, any ideas as to why it might be acting up again?

zmb3 commented 1 year ago

@jakule good to close this one?

jakule commented 1 year ago

Closing. Looks like https://github.com/gravitational/teleport/pull/17687 fixed the issue.

jakule commented 1 year ago

It survived almost a year.... :cry: Failing build https://github.com/gravitational/teleport/actions/runs/6419752754/job/17430271963

{"caller":"auth/auth.go:1190","component":"auth","level":"debug","message":"Checking for new teleport releases via github api.","timestamp":"2023-10-05T13:34:49Z"}
    integration_test.go:498: 
            Error Trace:    /__w/teleport/teleport/integration/integration_test.go:498
            Error:          Received unexpected error:
                            context deadline exceeded
            Test:           TestIntegrations/AuditOn/recording_proxy_with_upload_to_file_server
{"caller":"sshutils/server.go:425","component":"ssh:node","level":"debug","message":"Server 127.0.0.1:45281 has closed.","timestamp":"2023-10-05T13:34:54Z"}
zmb3 commented 1 year ago

https://github.com/gravitational/teleport/actions/runs/6564546618/job/17831123007

Same failure here. @jakule or @Joerger any ideas?

nklaassen commented 11 months ago

another in v14 merge queue https://github.com/gravitational/teleport/actions/runs/7050209642/job/19190382930

nklaassen commented 11 months ago

another hit off master https://github.com/gravitational/teleport/actions/runs/7066054434/job/19263411490

=== RUN   TestIntegrations/AuditOn/normal_teleport_with_upload_to_file_server
    integration_test.go:500: 
            Error Trace:    /__w/teleport/teleport/integration/integration_test.go:500
            Error:          Received unexpected error:
                            rpc error: code = DeadlineExceeded desc = context deadline exceeded
            Test:           TestIntegrations/AuditOn/normal_teleport_with_upload_to_file_server
        --- FAIL: TestIntegrations/AuditOn/normal_teleport_with_upload_to_file_server (12.98s)
ptgott commented 11 months ago

Another: https://github.com/gravitational/teleport/actions/runs/7187322895/job/19574643836

ptgott commented 10 months ago

Another: https://github.com/gravitational/teleport/actions/runs/7339759493/job/19984550852#step:6:1679

zmb3 commented 10 months ago

v14: https://github.com/gravitational/teleport/actions/runs/7477999541/job/20351955564?pr=36516

    integration_test.go:500: 
            Error Trace:    /__w/teleport/teleport/integration/integration_test.go:500
            Error:          Received unexpected error:
                            rpc error: code = DeadlineExceeded desc = context deadline exceeded
            Test:           TestIntegrations/AuditOn/recording_proxy_with_upload_to_file_server
ravicious commented 6 months ago

v14 https://github.com/gravitational/teleport/actions/runs/8834247032/job/24255465293#step:6:1026

     integration_test.go:504: 
            Error Trace:    /__w/teleport/teleport/integration/integration_test.go:504
            Error:          Received unexpected error:
                            rpc error: code = DeadlineExceeded desc = context deadline exceeded
            Test:           TestIntegrations/AuditOn/normal_teleport
ravicious commented 5 months ago

v14 https://github.com/gravitational/teleport/actions/runs/9353591579/job/25744470753#step:6:996

    integration_test.go:504: 
            Error Trace:    /__w/teleport/teleport/integration/integration_test.go:504
            Error:          Received unexpected error:
                            rpc error: code = DeadlineExceeded desc = context deadline exceeded
            Test:           TestIntegrations/AuditOn/recording_proxy_with_upload_to_file_server
nklaassen commented 2 months ago

I think this is still flaky

    integration_test.go:503: 
            Error Trace:    /__w/teleport/teleport/integration/integration_test.go:503
            Error:          Received unexpected error:
                            rpc error: code = DeadlineExceeded desc = context deadline exceeded
            Test:           TestIntegrations/AuditOn/recording_proxy,_sync_recording

https://github.com/gravitational/teleport/actions/runs/10374191024/job/28721090769