ibm-cloud-architecture / terraform-openshift4-aws

OpenShift 4 installation automation asset
Apache License 2.0

Bootstrap Node fails to start bootkube service (bootstrap.ign) #35

Closed ckupe closed 3 years ago

ckupe commented 4 years ago

Description:

In a disconnected (air-gapped) architecture, running the Terraform plan does not produce a working bootstrap node: the bootkube service never becomes healthy.

How to reproduce:

These steps follow a disconnected/air-gapped strategy.

  1. Build the edge registry in ECR as per the documentation using 4.4.0-x86_64 as the target version for release images to be mirrored.
  2. Configure the tfvars as follows:
cluster_id = "ocp4-dev"
clustername = "ocp4-dev"
base_domain = "rht-set.com"
openshift_pull_secret = "/home/ckupe/Projects/pull-secret.json"
openshift_installer_url = "https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest"

aws_access_key_id = "<redacted>"
aws_secret_access_key = "<redacted>"
aws_ami = "ami-0409b2cebfc3ac3d0"
aws_extra_tags = {
  "kubernetes.io/cluster/ocp4-dev" = "owned",
  "owner" = "admin"
  }
aws_azs = [
  "us-west-2a",
  "us-west-2b",
  "us-west-2c"
  ]
aws_region = "us-west-2"
aws_publish_strategy = "Internal"
airgapped = {
  enabled = true
  repository = "<redacted>.dkr.ecr.us-west-2.amazonaws.com/ocp440" 
}
  3. Run terraform apply for the entire plan.
  4. Build a Client VPN Gateway with the mutual authentication method, as described here: https://docs.aws.amazon.com/vpn/latest/clientvpn-admin/cvpn-getting-started.html
  5. Associate the new AWS Client VPN Gateway with all private subnets constructed by the Terraform plan in order to gain layer 3 access.
  6. Authorize 10.0.0.0/8 within the Client VPN Gateway.
  7. Connect the workstation to the Client VPN Gateway.
  8. Pull the private and public SSH keys produced by the Terraform plan from the terraform.tfstate file (grep for the installkey resource).
  9. ssh -i ./path/to/private.key core@
  10. Run the following commands to inspect the state of the bootkube service:
    sudo systemctl status bootkube
    journalctl -b -f -u bootkube.service
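
For the step that pulls the SSH keys out of terraform.tfstate, a small script can stand in for grep. The following is only a sketch: it assumes a version 4 state file (a top-level "resources" list whose entries carry "instances" with "attributes"), and it assumes the installkey resource is a tls_private_key exposing private_key_pem and public_key_openssh. Neither assumption is confirmed in this issue, and the function names are illustrative.

```python
import json
from typing import Any, Dict, List


def find_install_keys(state: Dict[str, Any],
                      resource_name: str = "installkey") -> List[Dict[str, Any]]:
    """Return the attribute dicts of every resource called `resource_name`.

    Assumes the Terraform state v4 layout: resources -> instances -> attributes.
    """
    found = []
    for resource in state.get("resources", []):
        if resource.get("name") != resource_name:
            continue
        for instance in resource.get("instances", []):
            found.append(instance.get("attributes", {}))
    return found


def load_install_keys(path: str = "terraform.tfstate") -> List[Dict[str, Any]]:
    """Read a state file from disk and return the matching key attributes."""
    with open(path) as fh:
        return find_install_keys(json.load(fh))


# Example use (attribute names are assumptions about tls_private_key):
#   keys = load_install_keys("terraform.tfstate")
#   print(keys[0].get("private_key_pem", ""))
```

The private key printed this way would still need to be written to a file with 0600 permissions before being passed to ssh -i.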

Result:

[core@ip-10-0-129-41 ~]$  journalctl -b -f -u bootkube.service
-- Logs begin at Tue 2020-06-16 18:27:14 UTC. --

No logs are returned from bootkube, nor is the bootkube service online.

Expected Result: Ignition should have ingested bootstrap.ign at boot and configured the bootkube service. A dead service with no logs suggests that bootstrap.ign was not successfully pulled from the S3 bucket, so Ignition had nothing to apply.

ckupe commented 4 years ago

Finding: 'additionalTrustBundle' is not templated into install-config.yaml anywhere in the codebase. It is required for the bootstrap node to trust the local edge registry in ECR so that it can pull the requisite images.

https://www.openshift.com/blog/openshift-4-2-disconnected-install
https://repo1.dsop.io/dsop/redhat/platformone/ocp4x/ansible/deploy/-/blob/v2-govcloud-automation/templates/openshift-install/aws-fences-install-config.yaml.j2#L3
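
To illustrate what templating additionalTrustBundle could look like, here is a minimal sketch that appends the field to install-config text as a YAML block scalar. It is not how this repository generates the file; the function name with_trust_bundle and the choice to treat the config as plain text rather than parse it are assumptions made to keep the example dependency-free.

```python
def with_trust_bundle(install_config_yaml: str, ca_pem: str) -> str:
    """Append a top-level additionalTrustBundle block to install-config text.

    The PEM is indented by two spaces under a literal block scalar ("|"),
    which is how a multi-line certificate is embedded in YAML.
    """
    if "additionalTrustBundle:" in install_config_yaml:
        raise ValueError("install-config already defines additionalTrustBundle")
    indented = "\n".join("  " + line for line in ca_pem.strip().splitlines())
    return (
        install_config_yaml.rstrip("\n")
        + "\nadditionalTrustBundle: |\n"
        + indented
        + "\n"
    )
```

Given a config that ends with baseDomain and a PEM certificate, the result gains an additionalTrustBundle: | key followed by the indented certificate lines.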

ckupe commented 4 years ago

Finding: ECR does not allow unauthenticated image pulls. Additional IAM policies will need to be defined and attached to the nodes so that they can pull from ECR.
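
As a rough idea of what such a policy might contain, the sketch below builds a read-only ECR pull policy document. The four actions are standard ECR IAM actions; the function name, the Sid values, and the example ARN are placeholders, and whether this policy alone is enough for the nodes in this plan has not been verified here.

```python
import json
from typing import Any, Dict


def ecr_pull_policy(repository_arn: str) -> Dict[str, Any]:
    """Build an IAM policy document that allows pulling from one ECR repository."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # GetAuthorizationToken is not scoped to a repository.
                "Sid": "EcrAuth",
                "Effect": "Allow",
                "Action": ["ecr:GetAuthorizationToken"],
                "Resource": "*",
            },
            {
                "Sid": "EcrPull",
                "Effect": "Allow",
                "Action": [
                    "ecr:BatchCheckLayerAvailability",
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:BatchGetImage",
                ],
                "Resource": repository_arn,
            },
        ],
    }


def ecr_pull_policy_json(repository_arn: str) -> str:
    """Serialize the policy for use as an IAM policy document string."""
    return json.dumps(ecr_pull_policy(repository_arn), indent=2)


# Example use (placeholder account ID and repository name):
#   print(ecr_pull_policy_json(
#       "arn:aws:ecr:us-west-2:123456789012:repository/ocp440"))
```

The resulting JSON could then be attached to the instance roles used by the bootstrap, master, and worker nodes.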

vbudi000 commented 3 years ago

Agreed, it is not fully implemented. Will add a note to the README.