sassoftware / viya4-deployment

This project contains Ansible code that creates a baseline in an existing Kubernetes environment for use with the SAS Viya Platform, generates the manifest for an order, and can then deploy that order into the specified Kubernetes environment.
Apache License 2.0

The playbook is stuck forever at the step "TASK [jump-server : jump-server - group nogroup]" #560

Status: Open. Opened by raphaelpoumarede 4 months ago.

raphaelpoumarede commented 4 months ago

Viya4 Deployment Version Details

6.20.1 (latest)

Ansible Variable File Details

## Cluster
PROVIDER: custom
# CLUSTER_NAME normally comes from TF output...here we make it consistent with the existing cluster name.
CLUSTER_NAME: GEL-k8s-oss
NAMESPACE: dac

## MISC
DEPLOY: true # Set to false to stop at generating the manifest

## Storage - we let the tool create the SC for us
V4_CFG_MANAGE_STORAGE: true
# keep JUMP_SVR_RWX_FILESTORE_PATH default value that corresponds to the mount point created by the IaC tool
V4_CFG_RWX_FILESTORE_ENDPOINT: pdcesx02215.race.sas.com # we need to set this because this information cannot be pulled from a TF state.

## JUMP VM ACCESS TO PREPARE NFS DIRECTORIES
JUMP_SVR_PRIVATE_KEY: '~/.ssh/id_rsa'
JUMP_SVR_USER: cloud-user # mandatory for V4_CFG_MANAGE_STORAGE to trigger
JUMP_SVR_HOST: sasnode01 # mandatory for V4_CFG_MANAGE_STORAGE to trigger

## SAS Order API Access
V4_CFG_SAS_API_KEY: 'XXXX'
V4_CFG_SAS_API_SECRET: 'XXXX'
V4_CFG_ORDER_NUMBER: 9CYNLY

## CR Access
# V4_CFG_CR_USER: <container_registry_user>
# V4_CFG_CR_PASSWORD: <container_registry_password>

## Ingress
V4_CFG_INGRESS_TYPE: ingress
V4_CFG_INGRESS_FQDN: "dac.osk-ing-stud3.gelenable.sas.com"
V4_CFG_TLS_MODE: "full-stack" # [full-stack|front-door|disabled]

## Postgres
#V4_CFG_POSTGRES_SERVERS:
#    default:
#        internal: true

## Postgres
V4_CFG_POSTGRES_SERVERS:
  default:
    internal: false
    admin: postgres
    password: "XXXXX"
    fqdn: rext03-0072.race.sas.com
    ssl_enforcement_enabled: true
    database: SharedServices    

# CA certificate for Postgres (to be added to the truststore)
V4_CFG_TLS_TRUSTED_CA_CERTS: /etc/ssl/certs/ssl-cert-sas-rext03-0072.pem

## LDAP
V4_CFG_EMBEDDED_LDAP_ENABLE: true

## Consul UI
#V4_CFG_CONSUL_ENABLE_LOADBALANCER: false

## SAS/CONNECT
V4_CFG_CONNECT_ENABLE_LOADBALANCER: false

## Cadence and version
V4_CFG_CADENCE_NAME: 'stable'
V4_CFG_CADENCE_VERSION: '2024.03'

## CAS Configuration
V4_CFG_CAS_WORKER_COUNT: '1'
V4_CFG_CAS_ENABLE_BACKUP_CONTROLLER: false
V4_CFG_CAS_ENABLE_LOADBALANCER: false

# Monitoring and logging tools
# for upstream open-source K8s it must be set
V4M_STORAGECLASS: sas
V4M_BASE_DOMAIN: "osk-ing-stud3.gelenable.sas.com"
V4M_GRAFANA_PASSWORD: "Lnxsas!2021"
V4M_KIBANA_PASSWORD: "Lnxsas!2021"

# allow ELASTIC SEARCH to be properly configured
V4_CFG_ELASTICSEARCH_ENABLE: true

# required when we have used viya4-iac-k8s

## 3rd Party

# Ingress Controller
INGRESS_NGINX_CONFIG:
  controller:
    service:
      externalTrafficPolicy: Cluster
      # loadBalancerIP: # Optional : Assigns a static IP to the SAS Viya ingress controller
      loadBalancerSourceRanges: [] # Not supported on open source kubernetes
      annotations:

# Metrics server is already pre-installed with IaC for Upstream Open Source
METRICS_SERVER_ENABLED: false

# NFS Subdir External Provisioner - SAS default storage class
# Updates to support open source Kubernetes
NFS_CLIENT_NAME: nfs-subdir-external-provisioner-sas
NFS_CLIENT_CHART_VERSION: 4.0.16
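
The comments on JUMP_SVR_USER and JUMP_SVR_HOST state that both are mandatory for V4_CFG_MANAGE_STORAGE to trigger. A minimal Python sketch of that rule, usable as a pre-flight check before rerunning the playbook, is below. The function name missing_jump_server_vars is hypothetical and is not part of viya4-deployment; the sketch assumes the variable file has already been loaded into a plain dict (for example with a YAML parser) and checks only the two variables the comments mark as mandatory.

```python
from typing import Any, Dict, List

# Hypothetical helper, not part of viya4-deployment. The two names below are
# the ones the variable-file comments mark as mandatory for managed storage.
REQUIRED_FOR_MANAGED_STORAGE = ("JUMP_SVR_USER", "JUMP_SVR_HOST")


def missing_jump_server_vars(config: Dict[str, Any]) -> List[str]:
    """Return the jump-server variables that are unset or empty while
    V4_CFG_MANAGE_STORAGE is true; an empty list means nothing is missing."""
    if not config.get("V4_CFG_MANAGE_STORAGE", False):
        # Storage is not managed by the tool, so the jump server is not needed.
        return []
    return [name for name in REQUIRED_FOR_MANAGED_STORAGE if not config.get(name)]


if __name__ == "__main__":
    # Values copied from the variable file in this report.
    reported = {
        "V4_CFG_MANAGE_STORAGE": True,
        "JUMP_SVR_USER": "cloud-user",
        "JUMP_SVR_HOST": "sasnode01",
    }
    print(missing_jump_server_vars(reported))  # prints []
```

With the values from this report the check passes, which is consistent with the log showing an SSH connection to sasnode01 as cloud-user.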

Steps to Reproduce

Deploy Kubernetes with the viya4-iac-k8s tool, then run the viya4-deployment project's playbook:

cd ~/viya4-deployment
ansible-playbook \
  -e BASE_DIR=~/project/deploy/dac-working \
  -e KUBECONFIG=~/.kube/config \
  -e CONFIG=~/project/deploy/dac-working/ansible-vars.yaml \
  -e JUMP_SVR_PRIVATE_KEY=$HOME/.ssh/id_rsa \
  -e PROVIDER=custom \
  playbooks/playbook.yaml --tags "viya, install"

Expected Behavior

The playbook should run to completion.

Actual Behavior

The playbook remained blocked after this task: TASK [jump-server : jump-server - group nogroup]
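
Because the task never returns, a remote check can be bounded by a timeout instead of waiting on the playbook. The sketch below assumes, without confirmation from this report, that the stall is a blocking filesystem call on the jump host; the debug log in Additional Context only shows the file module starting as root on sasnode01. It wraps a local command in subprocess.run with a timeout; the helper name completes_within is hypothetical.

```python
import subprocess
import sys
from typing import Sequence


def completes_within(cmd: Sequence[str], timeout_s: float) -> bool:
    """Return True if cmd exits (with any status) within timeout_s seconds.

    On timeout, subprocess.run kills the child process and re-raises
    TimeoutExpired, which is reported here as False.
    """
    try:
        subprocess.run(
            list(cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
            check=False,  # a non-zero exit still counts as "completed"
        )
    except subprocess.TimeoutExpired:
        return False
    return True


if __name__ == "__main__":
    # Local self-check that needs no network access.
    print(completes_within([sys.executable, "-c", "pass"], 10))  # True
    print(completes_within([sys.executable, "-c", "import time; time.sleep(5)"], 0.5))  # False
```

For the jump host, a probe such as completes_within(["ssh", "-o", "BatchMode=yes", "cloud-user@sasnode01", "sudo", "-n", "stat", "/path/to/nfs/mount"], 30) would return False if the remote stat has not finished after 30 seconds. The path is a placeholder, not the project's default for JUMP_SVR_RWX_FILESTORE_PATH. Probing through a local ssh process is deliberate: the ssh client can be killed on timeout, whereas a local process blocked on an unresponsive mount may not be.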

Additional Context


There are no errors in the previous tasks.

Here is the Ansible debug log (-vvvv):

TASK [jump-server : jump-server - group nogroup] ****************************************************************************************************************************************************************************************
task path: /home/cloud-user/viya4-deployment/roles/jump-server/tasks/main.yml:33
ok: [localhost] => {
    "ansible_facts": {
        "folder_group": "nogroup"
    },
    "changed": false
}
Tuesday 02 July 2024  10:09:56 +0000 (0:00:00.025)       0:00:14.018 **********
<localhost> Attempting python interpreter discovery
<sasnode01> ESTABLISH SSH CONNECTION FOR USER: cloud-user
<sasnode01> SSH: EXEC ssh -vvv -o ControlMaster=auto -o ControlPersist=30m -o ConnectionAttempts=100 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o 'IdentityFile="/home/cloud-user/.ssh/id_rsa"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="cloud-user"' -o ConnectTimeout=10 -o UserKnownHostsFile=/dev/null -o 'ControlPath="/home/cloud-user/.ansible/cp/e2c1b1f411"' sasnode01 '/bin/sh -c '"'"'echo PLATFORM; uname; echo FOUND; command -v '"'"'"'"'"'"'"'"'python3.12'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python3.11'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python3.10'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python3.9'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python3.8'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python3.7'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python3.6'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'/usr/bin/python3'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'/usr/libexec/platform-python'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python2.7'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'/usr/bin/python'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python'"'"'"'"'"'"'"'"'; echo ENDFOUND && sleep 0'"'"''
<sasnode01> (0, b'PLATFORM\nLinux\nFOUND\n/usr/bin/python3.10\n/usr/bin/python3\n/usr/bin/python2.7\n/usr/bin/python\n/usr/bin/python\nENDFOUND\n', b'OpenSSH_8.9p1 Ubuntu-3ubuntu0.10, OpenSSL 3.0.2 15 Mar 2022\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files\r\ndebug1: /etc/ssh/ssh_config line 21: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 99805\r\ndebug3: mux_client_request_session: session request sent\r\ndebug1: mux_client_request_session: master session id: 4\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 0\r\n')
<sasnode01> ESTABLISH SSH CONNECTION FOR USER: cloud-user
<sasnode01> SSH: EXEC ssh -vvv -o ControlMaster=auto -o ControlPersist=30m -o ConnectionAttempts=100 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o 'IdentityFile="/home/cloud-user/.ssh/id_rsa"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="cloud-user"' -o ConnectTimeout=10 -o UserKnownHostsFile=/dev/null -o 'ControlPath="/home/cloud-user/.ansible/cp/e2c1b1f411"' sasnode01 '/bin/sh -c '"'"'/usr/bin/python3.10 && sleep 0'"'"''
<sasnode01> (0, b'{"platform_dist_result": [], "osrelease_content": "PRETTY_NAME=\\"Ubuntu 22.04.4 LTS\\"\\nNAME=\\"Ubuntu\\"\\nVERSION_ID=\\"22.04\\"\\nVERSION=\\"22.04.4 LTS (Jammy Jellyfish)\\"\\nVERSION_CODENAME=jammy\\nID=ubuntu\\nID_LIKE=debian\\nHOME_URL=\\"https://www.ubuntu.com/\\"\\nSUPPORT_URL=\\"https://help.ubuntu.com/\\"\\nBUG_REPORT_URL=\\"https://bugs.launchpad.net/ubuntu/\\"\\nPRIVACY_POLICY_URL=\\"https://www.ubuntu.com/legal/terms-and-policies/privacy-policy\\"\\nUBUNTU_CODENAME=jammy\\n"}\n', b'OpenSSH_8.9p1 Ubuntu-3ubuntu0.10, OpenSSL 3.0.2 15 Mar 2022\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files\r\ndebug1: /etc/ssh/ssh_config line 21: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 99805\r\ndebug3: mux_client_request_session: session request sent\r\ndebug1: mux_client_request_session: master session id: 4\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 0\r\n')
Using module file /usr/local/lib/python3.10/dist-packages/ansible/modules/file.py
Pipelining is enabled.
<sasnode01> ESTABLISH SSH CONNECTION FOR USER: cloud-user
<sasnode01> SSH: EXEC ssh -vvv -o ControlMaster=auto -o ControlPersist=30m -o ConnectionAttempts=100 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o 'IdentityFile="/home/cloud-user/.ssh/id_rsa"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="cloud-user"' -o ConnectTimeout=10 -o UserKnownHostsFile=/dev/null -o 'ControlPath="/home/cloud-user/.ansible/cp/e2c1b1f411"' sasnode01 '/bin/sh -c '"'"'sudo -H -S -n  -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-avhlndtvhwwellzhddyslaynfxhgeaks ; /usr/bin/python3'"'"'"'"'"'"'"'"' && sleep 0'"'"''
Escalation succeeded
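
In the log above, SSH, interpreter discovery, and privilege escalation all succeed before the hang. Each remote call's result is logged as a (return code, stdout, stderr) tuple, and the first discovery stdout uses PLATFORM / FOUND / ENDFOUND markers. A short sketch that decodes that stdout is below; it relies only on the marker layout visible in this log and is not Ansible's own discovery code. The function name parse_discovery_output is hypothetical.

```python
from typing import List, Tuple


def parse_discovery_output(stdout: str) -> Tuple[str, List[str]]:
    """Split the PLATFORM/FOUND/ENDFOUND stdout seen in the log into
    (platform, unique interpreter paths in the order they were reported)."""
    lines = stdout.splitlines()
    platform = lines[lines.index("PLATFORM") + 1]
    found = lines[lines.index("FOUND") + 1 : lines.index("ENDFOUND")]
    # Several candidates can resolve to the same binary (here "python" and
    # "/usr/bin/python"), so drop repeats while keeping the original order.
    return platform, list(dict.fromkeys(found))


if __name__ == "__main__":
    # stdout of the first discovery call for sasnode01, copied from the log.
    stdout = ("PLATFORM\nLinux\nFOUND\n/usr/bin/python3.10\n/usr/bin/python3\n"
              "/usr/bin/python2.7\n/usr/bin/python\n/usr/bin/python\nENDFOUND\n")
    print(parse_discovery_output(stdout))
    # ('Linux', ['/usr/bin/python3.10', '/usr/bin/python3', '/usr/bin/python2.7', '/usr/bin/python'])
```

The file module is then launched with /usr/bin/python3 under sudo, and "Escalation succeeded" is the last line logged, so the stall appears to happen while the file module is executing on sasnode01 rather than during connection setup.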

References

No response


dhoucgitter commented 4 months ago

Hi @raphaelpoumarede, is this one OK to close now based on your status yesterday?