gravitational / teleport

The easiest, and most secure way to access and protect all of your infrastructure.
https://goteleport.com
GNU Affero General Public License v3.0

Teleport UI login error #20963

Erick-Reyes closed 1 year ago

Erick-Reyes commented 1 year ago

Expected behavior: Log in to your Teleport cluster through the Web UI without any issues (Okta SAML).

Current behavior: When you log in to the Teleport cluster via the Web UI (Okta SAML), it sometimes greets you with the error below:

"Internal error - rpc error: code = Canceled desc = grpc: the client connection is closing"

The error goes away when you refresh the page.

Bug details:

Debug logs (Auth Server):

Jan 24 18:40:18 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:18Z DEBU [DYNAMODB]  Got 5 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:18 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:18Z INFO [AUTH]      Node "ip-172-18-15-151.ec2.internal" [420238826295-i-0e0b1d5bd0b06d268] is trying to join with role: Instance. auth/join.go:104
Jan 24 18:40:18 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:18Z DEBU [AUTH]      Received Simplified Node Joining request for host "420238826295-i-0e0b1d5bd0b06d268" auth/join_ec2.go:331
Jan 24 18:40:19 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:19Z DEBU [DYNAMODB]  Got 2 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:19 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:19Z INFO [AUTH]      Node "ip-172-18-1-28.ec2.internal" [420238826295-i-08c4f9c6856647ffe] is trying to join with role: Instance. auth/join.go:104
Jan 24 18:40:19 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:19Z DEBU [AUTH]      Received Simplified Node Joining request for host "420238826295-i-08c4f9c6856647ffe" auth/join_ec2.go:331
Jan 24 18:40:20 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:20Z DEBU [DYNAMODB]  Got 6 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:21 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:21Z DEBU [DYNAMODB]  Got 3 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:21 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:21Z INFO [AUTH]      Node "ip-172-18-14-209.ec2.internal" [420238826295-i-08e6b9acd1a01b6fd] is trying to join with role: Instance. auth/join.go:104
Jan 24 18:40:21 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:21Z DEBU [AUTH]      Received Simplified Node Joining request for host "420238826295-i-08e6b9acd1a01b6fd" auth/join_ec2.go:331
Jan 24 18:40:21 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:21Z INFO [AUTH]      Node "ip-172-18-0-181.ec2.internal" [631293667815-i-0503e71ceea301c2b] is trying to join with role: Node. auth/join.go:104
Jan 24 18:40:21 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:21Z DEBU [AUTH]      Received Simplified Node Joining request for host "631293667815-i-0503e71ceea301c2b" auth/join_ec2.go:331
Jan 24 18:40:22 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:22Z DEBU [DYNAMODB]  Got 3 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:23 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:23Z DEBU [DYNAMODB]  Got 3 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:24 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:24Z DEBU [DYNAMODB]  Got 1 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:24 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:24Z DEBU             [SAML] SSO: https://example.okta.com/app/example_teleport_2/REDACTED/sso/saml services/saml.go:121
Jan 24 18:40:24 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:24Z DEBU             [SAML] Issuer: http://www.okta.com/REDACTED services/saml.go:122
Jan 24 18:40:24 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:24Z DEBU             [SAML] ACS: https://teleport.corp-prod1.exampleinternal.com/v1/webapi/saml/acs/okta services/saml.go:123
Jan 24 18:40:24 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:24Z INFO             No assertion_key_pair was detected. Falling back to signing key for all SAML operations. services/saml.go:213
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU [DYNAMODB]  Got 4 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU             [SAML] SSO: https://example.okta.com/app/example_teleport_2/REDACTED/sso/saml services/saml.go:121
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU             [SAML] Issuer: http://www.okta.com/REDACTED services/saml.go:122
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU             [SAML] ACS: https://teleport.corp-prod1.exampleinternal.com/v1/webapi/saml/acs/okta services/saml.go:123
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU [AUTH]      Obtained SAML assertions for "user@example.com". auth/saml.go:535
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU [AUTH]      SAML assertion warnings: &{OneTimeUse:false ProxyRestriction:<nil> NotInAudience:false InvalidTime:false}. auth/saml.go:536
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU [AUTH]      SAML assertion: "username": ["user@example.com"]. auth/saml.go:545
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU [AUTH]      SAML assertion: "groups": [REDACTED-GROUPS]. auth/saml.go:545
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU [AUTH]      Applying 5 SAML attribute to roles mappings. auth/saml.go:557
.........
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU             Skipping login no-login-de9c65c5-7b6e-4c03-857e-1d4d63441e94, not a valid Unix login. services/role.go:265
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU [AUTH]      Generating dynamic SAML identity okta/user@example.com with roles: [engineering editor access auditor]. Dry run: false. auth/saml.go:248
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU [AUTH]      Overwriting existing user "user@example.com" created with saml connector okta. auth/saml.go:303
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z INFO [AUDIT]     user.update cluster_name:teleport-cluster1 code:T1003I connector:okta ei:0 event:user.update expires:2023-01-25T04:40:25.404222905Z name:user@example.com roles:[engineering editor access auditor] time:2023-01-24T18:40:25.452Z uid:709d84e8-e4e7-4e53-b24b-f11b3943fa21 user:system events/emitter.go:263
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU             Skipping login no-login-de9c65c5-7b6e-4c03-857e-1d4d63441e94, not a valid Unix login. services/role.go:265
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU [KEYGEN]    generated user key for [user ec2-user ubuntu root -teleport-internal-join] with expiry on (1674621625) 2023-01-25 04:40:25.556222082 +0000 UTC native/native.go:249
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU [AUTH]      Failed setting default kubernetes cluster for user login (user did not provide a cluster); leaving KubernetesCluster extension in the TLS certificate empty auth/auth.go:1433
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z INFO [CA]        Generating TLS certificate {0xa0c1f90 0xc00535edf0 1.3.9999.1.15=#13046e6f6e65,1.3.9999.1.7=#131174656c65706f72742d636c757374657231,CN=user@example.com,O=engineering+O=editor+O=access+O=auditor,POSTALCODE={\"groups\":[REDACTED-GROUPS\,\"username\":[\"user@example.com\"]},STREET=teleport-cluster1,L=user+L=ec2-user+L=ubuntu+L=root+L=-teleport-internal-join 2023-01-25 04:40:25.562485929 +0000 UTC [] [] 5 []}. common_name:user@example.com dns_names:[] locality:[user ec2-user ubuntu root -teleport-internal-join] not_after:2023-01-25 04:40:25.562485929 +0000 UTC org:[engineering editor access auditor] org_unit:[] tlsca/ca.go:935
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z INFO [AUDIT]     cert.create cert_type:user cluster_name:teleport-cluster1 code:TC000I ei:0 event:cert.create expires:2023-01-25T04:40:25.562485929Z logins:[user ec2-user ubuntu root -teleport-internal-join] prev_identity_expires:0001-01-01T00:00:00Z roles:[engineering editor access auditor] route_to_cluster:teleport-cluster1 teleport_cluster:teleport-cluster1 groups:[REDACTED-GROUPS] username:[user@example.com] user:user@example.com time:2023-01-24T18:40:25.57Z uid:17a04041-2c1b-49da-a332-6b5134953972 events/emitter.go:263
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z INFO [AUDIT]     user.login groups:[REDACTED-GROUPS] username:[user@example.com] cluster_name:teleport-cluster1 code:T1001I ei:0 event:user.login method:saml success:true time:2023-01-24T18:40:25.598Z uid:a231ebf1-a70c-4f13-b1f6-8647c90e9f99 user:user@example.com events/emitter.go:263
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU [AUTH]      ClientCertPool -> cert(teleport-cluster1 issued by teleport-cluster1:117937359651789549920656640410007004161) auth/middleware.go:674
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU [AUTH]      ClientCertPool -> cert(teleport-cluster1 issued by teleport-cluster1:84513399485945296195766192414998272519) auth/middleware.go:674
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU [AUTH:1]    Server certificate cert(9a957fc2-2e02-4cdc-a8e7-47a8e10a1ea0.teleport-cluster1 issued by teleport-cluster1:117937359651789549920656640410007004161). auth/middleware.go:311
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU             [SAML] SSO: https://example.okta.com/app/example_teleport_2/REDACTED/sso/saml services/saml.go:121
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU             [SAML] Issuer: http://www.okta.com/REDACTED services/saml.go:122
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU             [SAML] ACS: https://teleport.corp-prod1.exampleinternal.com/v1/webapi/saml/acs/okta services/saml.go:123
Jan 24 18:40:26 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:26Z DEBU             Skipping login no-login-de9c65c5-7b6e-4c03-857e-1d4d63441e94, not a valid Unix login. services/role.go:265
Jan 24 18:40:26 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:26Z DEBU             Skipping login no-login-de9c65c5-7b6e-4c03-857e-1d4d63441e94, not a valid Unix login. services/role.go:265
Jan 24 18:40:26 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:26Z DEBU [AUTH]      ClientCertPool -> cert(teleport-cluster1 issued by teleport-cluster1:117937359651789549920656640410007004161) auth/middleware.go:674
Jan 24 18:40:26 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:26Z DEBU [AUTH]      ClientCertPool -> cert(teleport-cluster1 issued by teleport-cluster1:84513399485945296195766192414998272519) auth/middleware.go:674
Jan 24 18:40:26 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:26Z DEBU [AUTH:1]    Server certificate cert(9a957fc2-2e02-4cdc-a8e7-47a8e10a1ea0.teleport-cluster1 issued by teleport-cluster1:117937359651789549920656640410007004161). auth/middleware.go:311
Jan 24 18:40:26 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:26Z DEBU [DYNAMODB]  Got 4 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:26 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:26Z DEBU             Skipping login no-login-de9c65c5-7b6e-4c03-857e-1d4d63441e94, not a valid Unix login. services/role.go:265
Jan 24 18:40:27 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:27Z DEBU [DYNAMODB]  Got 3 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:28 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:28Z DEBU [DYNAMODB]  Got 7 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:28 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:28Z INFO [AUTH]      Node "ip-172-18-1-60.ec2.internal" [420238826295-i-06b62170969515475] is trying to join with role: Instance. auth/join.go:104
Jan 24 18:40:28 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:28Z DEBU [AUTH]      Received Simplified Node Joining request for host "420238826295-i-06b62170969515475" auth/join_ec2.go:331
Jan 24 18:40:29 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:29Z DEBU [DYNAMODB]  Got 4 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:30 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:30Z DEBU [DYNAMODB]  Got 3 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:31 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:31Z DEBU [DYNAMODB]  Got 4 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:32 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:32Z DEBU             [SAML] SSO: https://example.okta.com/app/example_teleport_2/REDACTED/sso/saml services/saml.go:121
Jan 24 18:40:32 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:32Z DEBU             [SAML] Issuer: http://www.okta.com/REDACTED services/saml.go:122
Jan 24 18:40:32 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:32Z DEBU             [SAML] ACS: https://teleport.corp-prod1.exampleinternal.com/v1/webapi/saml/acs/okta services/saml.go:123
Jan 24 18:40:32 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:32Z DEBU [DYNAMODB]  Got 1 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:32 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:32Z INFO [AUTH]      Node "ip-172-18-1-75.ec2.internal" [420238826295-i-08490c81d86f95b2c] is trying to join with role: Instance. auth/join.go:104
Jan 24 18:40:32 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:32Z DEBU [AUTH]      Received Simplified Node Joining request for host "420238826295-i-08490c81d86f95b2c" auth/join_ec2.go:331
Jan 24 18:40:32 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:32Z DEBU             Skipping login no-login-de9c65c5-7b6e-4c03-857e-1d4d63441e94, not a valid Unix login. services/role.go:265
Jan 24 18:40:32 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:32Z DEBU             Skipping login no-login-de9c65c5-7b6e-4c03-857e-1d4d63441e94, not a valid Unix login. services/role.go:265
Jan 24 18:40:32 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:32Z DEBU             Skipping login no-login-de9c65c5-7b6e-4c03-857e-1d4d63441e94, not a valid Unix login. services/role.go:265
Jan 24 18:40:32 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:32Z DEBU             Skipping login no-login-de9c65c5-7b6e-4c03-857e-1d4d63441e94, not a valid Unix login. services/role.go:265
Jan 24 18:40:32 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:32Z DEBU             Skipping login no-login-de9c65c5-7b6e-4c03-857e-1d4d63441e94, not a valid Unix login. services/role.go:265
Jan 24 18:40:32 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:32Z DEBU [AUTH]      GetServers(341->341) in 13.384451ms. elapsed_fetch:11.946782ms elapsed_filter:1.437669ms user:user@example.com auth/auth_with_roles.go:1250
Jan 24 18:40:33 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:33Z DEBU [DYNAMODB]  Got 3 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:34 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:34Z DEBU [DYNAMODB]  Got 2 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:35 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:35Z DEBU [DYNAMODB]  Got 3 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:36 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:36Z DEBU [DYNAMODB]  Got 3 new stream shard records. dynamo/shards.go:230
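
For anyone reading the log above, most lines are DynamoDB stream polling and node-join traffic unrelated to the login. A minimal sketch for cutting that noise follows. The function name `login_relevant` and the choice of what counts as noise are assumptions made for illustration; the regular expression only relies on the `LEVEL [COMPONENT]` layout visible in the pasted lines.

```python
import re

# Captures the optional component tag printed after the log level,
# e.g. "DEBU [DYNAMODB]" or "DEBU             [SAML]".
LINE_RE = re.compile(r"\b(?:DEBU|INFO|WARN|ERRO)\s+(?:\[(?P<component>[A-Z0-9:]+)\])?")

# Assumption: these are irrelevant to a Web UI login investigation.
NOISY_COMPONENTS = {"DYNAMODB"}
NOISY_PHRASES = ("is trying to join", "Simplified Node Joining")


def login_relevant(lines):
    """Return only the lines that are not backend polling or node-join noise."""
    kept = []
    for line in lines:
        match = LINE_RE.search(line)
        component = match.group("component") if match else None
        if component in NOISY_COMPONENTS:
            continue
        if any(phrase in line for phrase in NOISY_PHRASES):
            continue
        kept.append(line)
    return kept
```

Running it over the saved journal output (for example `login_relevant(open("auth.log"))`, where `auth.log` is a hypothetical file name) leaves the SAML, certificate, and audit lines for the 18:40:24 to 18:40:32 window.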

Okta-SAML config:

kind: saml
metadata:
  description: Okta connector
  name: okta
spec:
  acs: https://teleport.prod.example.com/v1/webapi/saml/acs/okta
  attributes_to_roles:
  - name: groups
    roles:
    - engineering
    value: org:engineering
  - name: groups
    roles:
    - integration_support
    value: org:integration_support
  - name: groups
    roles:
    - platform:core
    value: org:platform:core
  - name: groups
    roles:
    - platform:cloudeng
    value: org:platform:cloudeng
  - name: groups
    roles:
    - editor
    - access
    - auditor
    value: ac:teleport:admins
  audience: https://teleport.prod.example.com/v1/webapi/saml/acs/okta
  cert: ""
  display: ""
  entity_descriptor: |
    <?xml version="1.0" encoding="UTF-8"?>
    <md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata" entityID="http://www.okta.com/REDACTED">
      <md:IDPSSODescriptor WantAuthnRequestsSigned="false" protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
          <md:KeyDescriptor use="signing">
            <ds:KeyInfo xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
                <ds:X509Data>
                  <ds:X509Certificate>REDACTED</ds:X509Certificate>
                </ds:X509Data>
            </ds:KeyInfo>
          </md:KeyDescriptor>
          <md:NameIDFormat>urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress</md:NameIDFormat>
          <md:NameIDFormat>urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified</md:NameIDFormat>
          <md:SingleSignOnService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST" Location="https://example.okta.com/app/example_teleport_2/REDACTED/sso/saml" />
          <md:SingleSignOnService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect" Location="https://example.okta.com/app/example_teleport_2/REDACTED/sso/saml" />
      </md:IDPSSODescriptor>
    </md:EntityDescriptor>
  entity_descriptor_url: ""
  issuer: http://www.okta.com/REDACTED
  service_provider_issuer: https://teleport.prod.example.com/v1/webapi/saml/acs/okta
  signing_key_pair:
    cert: |
      -----BEGIN CERTIFICATE-----
      REDACTED
      -----END CERTIFICATE-----
    private_key: ""
  sso: https://example.okta.com/app/example_teleport_2/REDACTED/sso/saml
version: v2
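
As a sanity check on the connector, the five `attributes_to_roles` entries above can be replayed against a set of assertions. This is only a sketch: it uses exact string matching, which is a simplification and not a claim about how Teleport evaluates mappings, and the group membership in the usage example is an assumption because the real groups are redacted in the log.

```python
# Mappings copied from the attributes_to_roles section of the connector above.
ATTRIBUTES_TO_ROLES = [
    {"name": "groups", "value": "org:engineering", "roles": ["engineering"]},
    {"name": "groups", "value": "org:integration_support", "roles": ["integration_support"]},
    {"name": "groups", "value": "org:platform:core", "roles": ["platform:core"]},
    {"name": "groups", "value": "org:platform:cloudeng", "roles": ["platform:cloudeng"]},
    {"name": "groups", "value": "ac:teleport:admins", "roles": ["editor", "access", "auditor"]},
]


def roles_for(assertions):
    """Collect roles for every mapping whose value appears in the named attribute.

    Exact-match only (an assumption of this sketch); duplicates are dropped
    while keeping first-seen order.
    """
    roles = []
    for mapping in ATTRIBUTES_TO_ROLES:
        values = assertions.get(mapping["name"], [])
        if mapping["value"] in values:
            for role in mapping["roles"]:
                if role not in roles:
                    roles.append(role)
    return roles


# Hypothetical assertions: the group names are assumed, not taken from the log.
example = {
    "username": ["user@example.com"],
    "groups": ["org:engineering", "ac:teleport:admins"],
}
print(roles_for(example))  # ['engineering', 'editor', 'access', 'auditor']
```

With that assumed membership the result lines up with the roles in the "Generating dynamic SAML identity" log line, which suggests the mapping itself is not what fails during the intermittent error.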

Screenshot of the developer tools error: Internal Error

A HAR file is also available upon request.


travelton commented 1 year ago

If using systemd, the signal sent to the Teleport process on restart is -HUP.

Per the signals reference, HUP forks a new Teleport daemon to serve new connections and initiates the graceful shutdown of the existing process when there are no more clients connected to it.

Ref: https://goteleport.com/docs/reference/signals/

If Teleport is upgraded while the Proxy is still in rotation and has not been fully drained of connections, the old process may hang until all clients disconnect.

If the load balancer keeps sending connections to this Proxy, users will see the following error message in the Web UI upon authentication.

"Internal error - rpc error: code = Canceled desc = grpc: the client connection is closing"

The proper way to upgrade is to remove the Proxy from the load balancer, drain all connections, and then upgrade the Proxy instance.
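
The ordering described above can be sketched as a small orchestration function. Everything here is hypothetical scaffolding: `deregister`, `active_connections`, `upgrade`, and `register` are callables you would supply for your own load balancer and hosts, and the final re-registration step is an assumption, since the comment above only lists removal, drain, and upgrade.

```python
import time


def upgrade_proxy(deregister, active_connections, upgrade, register,
                  poll_seconds=5.0, timeout_seconds=600.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Take a Proxy out of rotation, wait for it to drain, then upgrade it.

    All four callables are caller-supplied (hypothetical) hooks:
      deregister()          removes the Proxy from the load balancer
      active_connections()  returns the number of clients still connected
      upgrade()             performs the actual upgrade
      register()            puts the Proxy back in rotation (assumed step)
    """
    deregister()
    deadline = clock() + timeout_seconds
    # Do not touch the process while clients are still attached to it.
    while active_connections() > 0:
        if clock() >= deadline:
            raise TimeoutError("proxy still has active connections; not upgrading")
        sleep(poll_seconds)
    upgrade()
    register()
```

The point of the design is that `upgrade` is never reached while the Proxy can still receive or hold client connections, which is the condition that produces the "client connection is closing" error described above.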

russjones commented 1 year ago

Talked with @zmb3 about this. We're thinking about handling this in two ways.

@travelton Because this is a nice-to-have, we're not scheduling this right now. We'll keep it in mind as a good starter issue for the future.

Erick-Reyes commented 1 year ago

@russjones @zmb3 The customer verified the systemd services and reports that all processes are fine; they are not attempting to restart.

kelcya commented 1 year ago

A little more information. We originally upgraded from 9.x -> 10.x -> 11.x. The issue started happening in 10.x and 11.x. The proxy and auth servers are all new instances. The upgrade process is as follows:

  1. Reduce auth server to 1 instance
  2. Upgrade auth server (instance refresh with new AMI)
  3. Upgrade proxy servers (instance refresh with new AMI)

The autoscaling group removes and terminates the old instance first before launching the new instance.

r0mant commented 1 year ago

Fixed in https://github.com/gravitational/teleport/pull/23691.

kelcya commented 1 year ago

Is it possible to know which versions will include the fix from #23691?

zmb3 commented 1 year ago

@kelcya the next v10, v11, and v12 releases will contain the fix. Those will be v10.3.15, v11.3.10, and v12.1.3.
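
To check a running cluster against the versions named above, a comparison along these lines may help. It is a sketch under two assumptions that the thread does not confirm: majors newer than v12 are treated as containing the fix, and majors older than v10 are reported as not fixed because no patched release is named for them. The helper names `parse_version` and `has_fix` are made up for this example.

```python
# First release per major line that contains the fix, as stated in the thread.
FIXED_IN = {10: (10, 3, 15), 11: (11, 3, 10), 12: (12, 1, 3)}


def parse_version(text):
    """Turn 'v11.3.10' or '11.3.10-rc.1' into a tuple of ints, ignoring any suffix."""
    core = text.lstrip("v").split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def has_fix(version):
    """Return True if the given Teleport version should include the fix."""
    parsed = parse_version(version)
    major = parsed[0]
    if major < 10:
        return False  # assumption: no fixed release is named for these lines
    if major > 12:
        return True   # assumption: later majors carry the fix forward
    return parsed >= FIXED_IN[major]


print(has_fix("v11.3.9"))   # False
print(has_fix("v11.3.10"))  # True
```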