raphaeldussin / example.pangeo.io-deploy

Deployment automation for example.pangeo.io

What is the biggest pod we can fit on a single node? #8

Open rabernat opened 5 years ago

rabernat commented 5 years ago

We now (#6) have the ability to select notebook pod resources.

[screenshot: the JupyterHub server-options page showing the profile choices]

This is enabled by the following profile list

      profile_list: |
        c.KubeSpawner.profile_list = [
          {
              'display_name': 'small (n1-highmem-2 | 2 cores, 12GB)',
              'kubespawner_override': {
                  'cpu_limit': 2,
                  'cpu_guarantee': 2,
                  'mem_limit': '12G',
                  'mem_guarantee': '12G',
              }
          },
          {
              'display_name': 'standard (n1-highmem-4 | 4 cores, 24GB)',
              'kubespawner_override': {
                  'cpu_limit': 4,
                  'cpu_guarantee': 4,
                  'mem_limit': '24G',
                  'mem_guarantee': '24G',
              }
          },
          {
              'display_name': 'large (n1-highmem-8 | 8 cores, 50GB)',
              'kubespawner_override': {
                  'cpu_limit': 8,
                  'cpu_guarantee': 8,
                  'mem_limit': '50G',
                  'mem_guarantee': '50G',
              }
          },
          {
              'display_name': 'x-large (n1-highmem-16 | 16 cores, 96GB RAM)',
              'kubespawner_override': {
                  'cpu_limit': 16,
                  'cpu_guarantee': 14,
                  'mem_limit': '100G',
                  'mem_guarantee': '96G',
              }
          }
        ]
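As a sanity check that a chosen profile's overrides actually reach the pod spec, the requested resources can be read back with kubectl. The pod name and namespace below are taken from the `kubectl describe pod` call elsewhere in this issue; substitute your own.

```shell
# Read back the resource requests/limits that KubeSpawner set on the
# notebook pod. Requests correspond to the guarantees, limits to the limits.
kubectl get pod jupyter-rabernat -n staging \
  -o jsonpath='{.spec.containers[0].resources}'
```

If the numbers here match the profile, the problem is on the scheduling side rather than in the spawner config.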

However, the x-large profile won't spawn. It always fails with the following scheduling events:

$ kubectl describe pod jupyter-rabernat -n staging
...
Events:
  Type     Reason             Age                From                Message
  ----     ------             ----               ----                -------
  Warning  FailedScheduling   21s (x7 over 52s)  default-scheduler   0/5 nodes are available: 4 Insufficient cpu, 5 Insufficient memory.
  Normal   NotTriggerScaleUp  13s (x3 over 43s)  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added)

I am using an n1-highmem-16 node pool, which should have 16 cores and 104 GB of memory available. But Kubernetes won't schedule these pods there. Even after I took the CPU guarantee down to 14 and the memory guarantee down to 96G, it still won't launch.
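One plausible explanation (my assumption, based on GKE's documented node reservations, which may have changed since this issue was filed): the kubelet reserves a slice of each node's memory for system daemons, plus an eviction threshold, so the scheduler never offers the full machine. Using GKE's published memory formula (25% of the first 4 GiB, 20% of the next 4 GiB, 10% of the next 8 GiB, 6% of the rest up to 128 GiB, plus a 100 MiB eviction threshold) and assuming the marketed 104 GB means decimal gigabytes, a back-of-envelope estimate is:

```shell
# Rough estimate of kubelet-reserved memory on an n1-highmem-16,
# per the GKE reservation formula described above (an assumption here).
awk 'BEGIN {
  total = 104e9 / 2^30                                   # 104 GB -> GiB
  reserved = 0.25*4 + 0.20*4 + 0.10*8 + 0.06*(total - 16) + 0.1
  printf "reserved ~%.1f GiB, allocatable ~%.1f GiB\n", reserved, total - reserved
}'
# -> reserved ~7.6 GiB, allocatable ~89.3 GiB
```

If that formula applies, allocatable memory is roughly 89.3 GiB (~95.9 GB), so even a 96G guarantee can never fit on an empty node, and daemonset requests eat further into the CPU budget — which would match the "Insufficient memory" events above.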

How can we find out exactly how big Kubernetes "thinks" the node is?
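The node advertises this in its status: `Capacity` is the raw machine size, and `Allocatable` is what the scheduler will actually hand out after reservations. One way to read it (`<node-name>` is a placeholder; list real names first):

```shell
# List the nodes in the pool, then inspect one.
kubectl get nodes

# Show Capacity vs. Allocatable; Allocatable is what the scheduler uses.
kubectl describe node <node-name> | grep -A 7 -E '^(Capacity|Allocatable):'

# Or pull just the allocatable figures as JSON:
kubectl get node <node-name> -o jsonpath='{.status.allocatable}'
```

Comparing `Allocatable` against the x-large guarantees should show exactly how much headroom is missing.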