ministryofjustice / analytics-platform

Parent repository for the MOJ Analytics Platform

Deployed app stuck in 'ContainerCreating' #61

Closed. RobinL closed this issue 6 years ago.

RobinL commented 6 years ago

The deployment for coroner-stat-tool-ext completed successfully, but the webapp pod is stuck in 'ContainerCreating'.
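
For reference, the reason a pod is stuck in ContainerCreating is usually visible in its events; something like the following shows them (the pod name is a placeholder, namespace as shown further down this thread):

kubectl get pods -n apps-prod | grep coroner-stat-tool-ext
kubectl describe pod <webapp-pod-name> -n apps-prod | tail -n 20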

r4vi commented 6 years ago

Same issue for a user's RStudio pod:

k describe po hbutchermoj-rstudio-rstu-79fbcc995f-rfgzx -n user-hbutchermoj 
Name:           hbutchermoj-rstudio-rstu-79fbcc995f-rfgzx
Namespace:      user-hbutchermoj
Node:           ip-192-168-10-66.eu-west-1.compute.internal/192.168.10.66
Start Time:     Wed, 12 Sep 2018 10:27:13 +0100
Labels:         app=rstudio
                pod-template-hash=3596775519
Annotations:    iam.amazonaws.com/role=alpha_user_hbutchermoj
Status:         Pending
IP:             
Controlled By:  ReplicaSet/hbutchermoj-rstudio-rstu-79fbcc995f
Containers:
  rstudio-auth-proxy:
    Container ID:   
    Image:          quay.io/mojanalytics/rstudio-auth-proxy:v1.4.3
    Image ID:       
    Port:           3000/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  128Mi
    Requests:
      cpu:      25m
      memory:   64Mi
    Readiness:  http-get http://:http/healthz delay=5s timeout=1s period=5s #success=1 #failure=3
    Environment:
      USER:                 hbutchermoj
      APP_PROTOCOL:         https
      APP_HOST:             <set to the key 'app_host' in secret 'hbutchermoj-rstudio-rstu'>           Optional: false
      AUTH0_CLIENT_SECRET:  <set to the key 'client_secret' in secret 'hbutchermoj-rstudio-rstu'>      Optional: false
      AUTH0_CLIENT_ID:      <set to the key 'client_id' in secret 'hbutchermoj-rstudio-rstu'>          Optional: false
      AUTH0_DOMAIN:         <set to the key 'domain' in secret 'hbutchermoj-rstudio-rstu'>             Optional: false
      AUTH0_CALLBACK_URL:   <set to the key 'callback_url' in secret 'hbutchermoj-rstudio-rstu'>       Optional: false
      COOKIE_SECRET:        <set to the key 'cookie_secret' in secret 'hbutchermoj-rstudio-rstu'>      Optional: false
      SECURE_COOKIE_KEY:    <set to the key 'secure_cookie_key' in secret 'hbutchermoj-rstudio-rstu'>  Optional: false
      COOKIE_MAXAGE:        28800
      PROXY_TARGET_HOST:    localhost
      PROXY_TARGET_PORT:    8787
      EXPRESS_HOST:         0.0.0.0
      EXPRESS_PORT:         3000
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from hbutchermoj-rstudio-token-525pp (ro)
  r-studio-server:
    Container ID:   
    Image:          quay.io/mojanalytics/rstudio:3.4.2-5
    Image ID:       
    Port:           8787/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     1500m
      memory:  20Gi
    Requests:
      cpu:      200m
      memory:   5Gi
    Readiness:  http-get http://:http/ delay=5s timeout=1s period=5s #success=1 #failure=3
    Environment:
      USER:                hbutchermoj
      AWS_DEFAULT_REGION:  <set to the key 'aws_default_region' in secret 'hbutchermoj-rstudio-rstu'>  Optional: false
      SECURE_COOKIE_KEY:   <set to the key 'secure_cookie_key' in secret 'hbutchermoj-rstudio-rstu'>   Optional: false
      TOOLS_DOMAIN:        tools.alpha.mojanalytics.xyz
    Mounts:
      /home/hbutchermoj from home (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from hbutchermoj-rstudio-token-525pp (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  home:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  nfs-home
    ReadOnly:   false
  hbutchermoj-rstudio-token-525pp:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hbutchermoj-rstudio-token-525pp
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age               From                                                  Message
  ----     ------                  ----              ----                                                  -------
  Normal   Scheduled               4m                default-scheduler                                     Successfully assigned hbutchermoj-rstudio-rstu-79fbcc995f-rfgzx to ip-192-168-10-66.eu-west-1.compute.internal
  Normal   SuccessfulMountVolume   4m                kubelet, ip-192-168-10-66.eu-west-1.compute.internal  MountVolume.SetUp succeeded for volume "nfs-home-hbutchermoj"
  Normal   SuccessfulMountVolume   4m                kubelet, ip-192-168-10-66.eu-west-1.compute.internal  MountVolume.SetUp succeeded for volume "hbutchermoj-rstudio-token-525pp"
  Normal   SandboxChanged          4m (x11 over 4m)  kubelet, ip-192-168-10-66.eu-west-1.compute.internal  Pod sandbox changed, it will be killed and re-created.
  Warning  FailedCreatePodSandBox  4m (x12 over 4m)  kubelet, ip-192-168-10-66.eu-west-1.compute.internal  Failed create pod sandbox.

r4vi commented 6 years ago

We have seen that FailedCreatePodSandBox can be caused by a missing unit in a Pod's memory resource definition (see https://github.com/Azure/AKS/issues/496), but this isn't the case on our cluster: all pods have valid units.
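
As a rough check that no container is missing a unit on its memory limit (e.g. 64Mi, 5Gi), something like this lists every container's memory limit across the cluster (swap limits for requests to check those too):

kubectl get pods --all-namespaces \
  -o jsonpath='{.items[*].spec.containers[*].resources.limits.memory}'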

davidread commented 6 years ago

This has been resolved. I think the cause was a problem unidling/restarting RStudio - see https://github.com/ministryofjustice/analytics-platform/issues/60#issuecomment-420610740
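
For anyone finding this later: one way to force that restart (assuming the pod is managed by the ReplicaSet shown above) is simply to delete the stuck pod and let the ReplicaSet recreate it:

kubectl delete pod hbutchermoj-rstudio-rstu-79fbcc995f-rfgzx -n user-hbutchermoj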

It still seemed unhappy, even after restarting the pods:

$ kubectl get pods --all-namespaces |grep coroner-stat-tool-ext
apps-prod                  coroner-stat-tool-ext-webapp-7d5965bd9-6fcw4                           3/3       Running             0          2h
$ kubectl describe pods coroner-stat-tool-ext-webapp-7d5965bd9-6fcw4 -n apps-prod
...
  Warning  Unhealthy               3m (x137 over 26m)   kubelet, ip-192-168-14-178.eu-west-1.compute.internal  Liveness probe failed: HTTP probe failed with statuscode: 500
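
The Shiny container's logs can be pulled from the pod to see why the probe returns 500 (the container name is a placeholder here, since the pod runs three containers):

kubectl logs coroner-stat-tool-ext-webapp-7d5965bd9-6fcw4 -c <shiny-container> -n apps-prod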

The error in the shiny logs is this:

Warning: Error in library: there is no package called ‘leaflet’
Stack trace (innermost first):
    41: library
     1: runApp
Error : An error has occurred. Check your logs or contact the app author for clarification.

So I think the platform is fine now. The error is with the user's shiny app.
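
For completeness, the missing package can be confirmed from inside the running container with something like the following (the container name is a placeholder, and this assumes R is on the container's PATH):

kubectl exec coroner-stat-tool-ext-webapp-7d5965bd9-6fcw4 -n apps-prod -c <shiny-container> -- R -e 'library(leaflet)'

This reproduces the "there is no package called 'leaflet'" error until the package is added to the app's image.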