lsst-uk / somerville-operations

User issue reporting and tracking for the Somerville Cloud

Investigate Cluster API #34

Open astrodb opened 2 years ago

astrodb commented 2 years ago

Investigate Cluster API as a means of launching Kubernetes clusters, replacing Magnum and Rancher, and add some documentation to the user docs if it turns out to be a suitable replacement.

GregBlow commented 11 months ago

I've tried running through the quickstart guide here:

https://cluster-api.sigs.k8s.io/user/quick-start?search=

It has examples for OpenStack, but so far I only get as far as cluster provisioning. I think some variables might not be set correctly:

gblow@EPCC-WIN-P12:~$ kubectl describe cluster capi-quickstart
Name:         capi-quickstart
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  cluster.x-k8s.io/v1beta1
Kind:         Cluster
Metadata:
  Creation Timestamp:  2023-10-09T10:56:48Z
  Finalizers:
    cluster.cluster.x-k8s.io
  Generation:        1
  Resource Version:  9091
  UID:               df41f7aa-c9ea-4fa7-a7f2-b98654234071
Spec:
  Cluster Network:
    Pods:
      Cidr Blocks:
        192.168.0.0/16
    Service Domain:  cluster.local
  Control Plane Endpoint:
    Host:
    Port:  0
  Control Plane Ref:
    API Version:  controlplane.cluster.x-k8s.io/v1beta1
    Kind:         KubeadmControlPlane
    Name:         capi-quickstart-control-plane
    Namespace:    default
  Infrastructure Ref:
    API Version:  infrastructure.cluster.x-k8s.io/v1alpha7
    Kind:         OpenStackCluster
    Name:         capi-quickstart
    Namespace:    default
Status:
  Conditions:
    Last Transition Time:  2023-10-09T10:56:49Z
    Message:               Scaling up control plane to 3 replicas (actual 0)
    Reason:                ScalingUp
    Severity:              Warning
    Status:                False
    Type:                  Ready
    Last Transition Time:  2023-10-09T10:56:49Z
    Message:               Waiting for control plane provider to indicate the control plane has been initialized
    Reason:                WaitingForControlPlaneProviderInitialized
    Severity:              Info
    Status:                False
    Type:                  ControlPlaneInitialized
    Last Transition Time:  2023-10-09T10:56:49Z
    Message:               Scaling up control plane to 3 replicas (actual 0)
    Reason:                ScalingUp
    Severity:              Warning
    Status:                False
    Type:                  ControlPlaneReady
    Last Transition Time:  2023-10-09T10:56:48Z
    Reason:                WaitingForInfrastructure
    Severity:              Info
    Status:                False
    Type:                  InfrastructureReady
  Observed Generation:     1
  Phase:                   Provisioning
Events:
  Type    Reason        Age                    From                Message
  ----    ------        ----                   ----                -------
  Normal  Provisioning  3m44s (x2 over 3m45s)  cluster-controller  Cluster capi-quickstart is Provisioning
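
The status block above can also be checked programmatically. The following is a minimal sketch, assuming the Cluster object has been exported as JSON (for example with `kubectl get cluster capi-quickstart -o json`) and that its conditions carry the usual `type`, `status`, `severity`, `reason` and `message` fields. The helper names `unmet_conditions` and `summarise` are hypothetical, and the sample data is transcribed from the output above.

```python
from typing import Any


def unmet_conditions(cluster: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the status conditions of a Cluster object that are not "True"."""
    conditions = cluster.get("status", {}).get("conditions", [])
    return [c for c in conditions if c.get("status") != "True"]


def summarise(cluster: dict[str, Any]) -> list[str]:
    """Format each unmet condition as one human-readable line."""
    lines = []
    for c in unmet_conditions(cluster):
        line = (
            f'{c.get("type")}: {c.get("reason", "")} '
            f'({c.get("severity", "")}) {c.get("message", "")}'
        )
        lines.append(line.rstrip())
    return lines


# Sample transcribed from the `kubectl describe` output in this comment.
SAMPLE = {
    "status": {
        "phase": "Provisioning",
        "conditions": [
            {"type": "Ready", "status": "False", "severity": "Warning",
             "reason": "ScalingUp",
             "message": "Scaling up control plane to 3 replicas (actual 0)"},
            {"type": "ControlPlaneInitialized", "status": "False",
             "severity": "Info",
             "reason": "WaitingForControlPlaneProviderInitialized",
             "message": "Waiting for control plane provider to indicate "
                        "the control plane has been initialized"},
            {"type": "ControlPlaneReady", "status": "False",
             "severity": "Warning", "reason": "ScalingUp",
             "message": "Scaling up control plane to 3 replicas (actual 0)"},
            {"type": "InfrastructureReady", "status": "False",
             "severity": "Info", "reason": "WaitingForInfrastructure"},
        ],
    }
}

if __name__ == "__main__":
    for line in summarise(SAMPLE):
        print(line)
```

In this sample every condition is unmet, and `InfrastructureReady` is the only one without a message, which suggests looking at the OpenStackCluster object and the infrastructure provider's logs next.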

GregBlow commented 11 months ago

Troubleshooting instructions might be found here:

https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/main/docs/book/src/clusteropenstack/configuration.md
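
To test the suspicion that variables are not set correctly, a small pre-flight check can report which environment variables are unset or empty before the cluster manifest is generated. This is only a sketch: the variable names below are an assumption based on the OpenStack section of the quick start guide and should be checked against the current guide (or against `clusterctl generate cluster capi-quickstart --list-variables`). The helper name `missing_variables` is hypothetical.

```python
import os
from typing import Iterable, Mapping, Optional

# Assumption: names taken from the OpenStack tab of the Cluster API quick start.
# Verify against the current guide before relying on this list.
ASSUMED_OPENSTACK_VARIABLES = (
    "OPENSTACK_CLOUD",
    "OPENSTACK_DNS_NAMESERVERS",
    "OPENSTACK_FAILURE_DOMAIN",
    "OPENSTACK_CONTROL_PLANE_MACHINE_FLAVOR",
    "OPENSTACK_NODE_MACHINE_FLAVOR",
    "OPENSTACK_IMAGE_NAME",
    "OPENSTACK_SSH_KEY_NAME",
    "OPENSTACK_EXTERNAL_NETWORK_ID",
)


def missing_variables(
    names: Iterable[str],
    environ: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """Return the names that are unset or blank in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_variables(ASSUMED_OPENSTACK_VARIABLES)
    if missing:
        print("Unset or empty:", ", ".join(missing))
    else:
        print("All listed variables are set.")
```

A variable that is set but points at the wrong flavor, image or network would still pass this check, so it only rules out the simplest failure.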

GregBlow commented 11 months ago

and here:

https://cluster-api.sigs.k8s.io/user/troubleshooting.html

GregBlow commented 11 months ago

gblow@EPCC-WIN-P12:~$ clusterctl describe --show-conditions all cluster capi-quickstart
NAME                                                                READY  SEVERITY  REASON                                     SINCE  MESSAGE
Cluster/capi-quickstart                                             False  Warning   ScalingUp                                  12m    Scaling up control plane to 3 replicas (actual 0)
│           ├─ControlPlaneInitialized                               False  Info      WaitingForControlPlaneProviderInitialized  12m    Waiting for control plane provider to indicate the control plane has been initialized
│           ├─ControlPlaneReady                                     False  Warning   ScalingUp                                  12m    Scaling up control plane to 3 replicas (actual 0)
│           └─InfrastructureReady                                   False  Info      WaitingForInfrastructure                   12m
├─ClusterInfrastructure - OpenStackCluster/capi-quickstart
├─ControlPlane - KubeadmControlPlane/capi-quickstart-control-plane  False  Warning   ScalingUp                                  12m    Scaling up control plane to 3 replicas (actual 0)
│             └─Resized                                             False  Warning   ScalingUp                                  12m    Scaling up control plane to 3 replicas (actual 0)
└─Workers
  └─MachineDeployment/capi-quickstart-md-0                          False  Warning   WaitingForAvailableMachines                12m    Minimum availability requires 3 replicas, current 0 available
    │           └─Available                                         False  Warning   WaitingForAvailableMachines                12m    Minimum availability requires 3 replicas, current 0 available
    └─3 Machines...                                                 False  Info      WaitingForInfrastructure                   12m    See capi-quickstart-md-0-fdw5f-cwqpz, capi-quickstart-md-0-fdw5f-hj898, ...

GregBlow commented 11 months ago

https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/1482

GregBlow commented 11 months ago

capi-kubeadm-control-plane-system.manager.log

GregBlow commented 11 months ago

The same problem occurs with a Docker (non-cloud) CAPI deployment. Cause unknown.

Zarquan commented 11 months ago

We have a set of scripts that use the StackHPC Helm charts on top of ClusterAPI to install a Kubernetes cluster.

Tested and working on both Cambridge Arcus and Somerville.

GregBlow commented 11 months ago

Thank you. I think my problem might be more fundamental, in how the provisioning infrastructure (kind, on my PC) is working.

Zarquan commented 11 months ago

We use a Docker container for the client, which uses Ansible to create a bootstrap VM in OpenStack with KinD installed, and then uses that VM to create the main K8s cluster. When finished, we leave the bootstrap VM with KinD running to provide control over the main K8s cluster. Our client will run on anything that can support a Docker container.

astrodb commented 9 months ago

RAL has proposed helping us with this. Will contact them this week, with plans to pick it up in the new year.

DP-B21 commented 8 months ago

Hi, I will be looking at the quick start guide to help with the investigation.

astrodb commented 7 months ago

Opened a ticket with RAL last week about mirroring their setup; no response yet.

GregBlow commented 6 months ago

Adverted StackHPC to the CAPI project for their own bootstrap server provisioning.

GregBlow commented 2 months ago

CAPI Magnum implemented and working. Testing further in relation to https://github.com/lsst-uk/somerville-operations/issues/174.