Open titou10titou10 opened 3 months ago
Hi @titou10titou10,
I tried the workaround, but I think RHEL is missing zincati: quay.io/okd/scos-content@sha256:cb68498aceefa81f105c4ce6c74787c3e1281d141725b0e20df555aa549dc5aa
This container exits with
Error msg: error running preset on unit: Failed to preset unit: Unit file zincati.service does not exist.\n)\nI0825 06:38:53.624260 6508 file_writers.go:293] Writing systemd unit \"install-to-disk.service\"\n"
and the installation is stuck at "Installing: bootstrap". Even creating a dummy zincati.service does not help; it still fails.
I spoke too soon.
It took some hours for the progress to be reflected in the console. It turns out that zincati is not required.
The bootkube commands also take a while, and while they are running they don't produce any logs in systemctl or change status.
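A minimal sketch for following progress from the machine where the ISO was built, instead of watching the node itself. It assumes an Agent-based (ABI) install, where `openshift-install agent wait-for` blocks until each phase finishes. The function name `wait_for_install`, its optional second argument (the installer binary), and the `install` directory in the usage comment are illustrative assumptions, not something from this thread.

```shell
# Wait for the bootstrap phase, then for the full install, to complete.
# $1: assets directory used to build the agent ISO
# $2: installer binary (optional, defaults to openshift-install)
wait_for_install() {
  assets_dir="$1"
  installer="${2:-openshift-install}"

  "$installer" agent wait-for bootstrap-complete --dir "$assets_dir" --log-level=info &&
    "$installer" agent wait-for install-complete --dir "$assets_dir" --log-level=info
}

# Usage (not run here; "install" is an assumed directory name):
#   wait_for_install install
```

Both commands can legitimately run for a long time, so a quiet node does not necessarily mean a stuck install.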
There was one issue, though: I had to run this code to fix the network. I am setting up a single-node installation.
cat << EOF | tee /etc/kubernetes/cni/net.d/10-containerd-net.conflist
{
  "cniVersion": "1.0.0",
  "name": "containerd-net",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "promiscMode": true,
      "ipam": {
        "type": "host-local",
        "ranges": [
          [{
            "subnet": "10.128.0.0/14"
          }]
        ],
        "routes": [
          { "dst": "0.0.0.0/0" },
          { "dst": "::/0" }
        ]
      }
    },
    {
      "type": "portmap",
      "capabilities": {"portMappings": true},
      "externalSetMarkChain": "KUBE-MARK-MASQ"
    }
  ]
}
EOF
I'm not sure what exactly your code is doing, but maybe you are not aware that "extra" manifests can be added before the ISO image is created. Inside the directory where you put the install-config and agent-config files, create an "openshift" directory and put the additional manifests in it.
Refs:
This page seems related to what you are doing; maybe you can create a manifest from it and put it under the install/openshift directory?
In my install, I have this extra "network-03-config.yaml" manifest file in install/openshift:
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  defaultNetwork:
    ovnKubernetesConfig:
      genevePort: 6082
      # not necessary as OKD detects the underlying MTU and set the value to 9000-100 by itself
      mtu: 8900
      ipsecConfig:
        mode: Disabled
      ipv4:
        internalJoinSubnet: 100.65.0.0/16
        internalTransitSwitchSubnet: 100.89.0.0/16
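As a rough sketch of the step described above, the extra manifest can be staged into the "openshift" sub-directory before the ISO is built. The helper name `stage_manifest` is hypothetical, and the `install` directory and manifest file name in the usage comment simply mirror the names used in this comment.

```shell
# Copy an extra manifest into <install_dir>/openshift so the installer
# picks it up when the agent ISO is created.
# $1: directory holding install-config.yaml and agent-config.yaml
# $2: path to the extra manifest file
stage_manifest() {
  install_dir="$1"
  manifest="$2"

  mkdir -p "$install_dir/openshift" || return 1
  cp "$manifest" "$install_dir/openshift/" || return 1
  echo "staged $(basename "$manifest") in $install_dir/openshift"
}

# Usage (not run here):
#   stage_manifest install network-03-config.yaml
#   openshift-install agent create image --dir install
```

The manifest has to be in place before the image is created; adding it afterwards has no effect on an ISO that is already built.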
When I booted the OKD control plane node for the first time, the network plugin was not configured. In journalctl I had a log saying "No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started", so I created that file manually. I had a dual-stack configuration; maybe that caused it. I am installing it again, let's see if I get the same issue. I think this was caused by some bug.
After some time I restarted the server, actually a couple of times, and after that OVN was not working at all. So I am trying to reinstall. I had some issues in my network, which I have resolved; let's see if it works or not.
I was being too impatient: it took some time, and then the "No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started" message went away.
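Since the message went away on its own after some time, waiting for the network provider to write its configuration may be safer than creating a file by hand. A minimal sketch of such a wait loop follows; the function names, the 30-minute default timeout, and the 30-second polling interval are assumptions chosen for illustration, not values from OKD.

```shell
# Return 0 if the directory contains at least one CNI configuration file
# (.conf, .conflist, or .json).
cni_config_present() {
  cni_dir="$1"
  for f in "$cni_dir"/*.conf "$cni_dir"/*.conflist "$cni_dir"/*.json; do
    [ -f "$f" ] && return 0
  done
  return 1
}

# Poll until a CNI configuration appears or the timeout expires.
# $1: directory to watch (default: /etc/kubernetes/cni/net.d)
# $2: timeout in seconds (assumed default: 1800)
# $3: polling interval in seconds (assumed default: 30)
wait_for_cni_config() {
  watch_dir="${1:-/etc/kubernetes/cni/net.d}"
  timeout="${2:-1800}"
  interval="${3:-30}"
  waited=0

  while ! cni_config_present "$watch_dir"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "no CNI configuration in $watch_dir after ${waited}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "CNI configuration found in $watch_dir"
}

# Usage (not run here):
#   wait_for_cni_config /etc/kubernetes/cni/net.d 3600 60
```

If the loop times out, that is the point at which looking at the network operator and OVN pods is worthwhile.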
But @titou10titou10, thanks a lot for the investigation. It was a really big help and saved a ton of time.
Context
Trying to install a cluster (3 masters + 2 workers):
It is important to note that the install works perfectly well with the exact same agent and install config files for
Summary
It fails with the following error from the "release-image-pivot" service:
The cause of the problem is the OS image used as bootstrap: fedora-coreos-39.20231101.3.0-live.x86_64.iso
Details
All the details, with debug info and configuration files, are described in this discussion. The logs and other output there are for v4.16.0-0.okd-scos-2024-08-01-132038, but they are the same for v4.16.0-0.okd-scos-2024-08-21-155613.
Workarounds
Overriding the bootstrap OS image with an RHCOS image makes the installation succeed.
I did not choose a random bootstrap OS image: this is the one for v4.16 used for an OCP installation via the ABI, as specified here: https://github.com/openshift/assisted-service/blob/d3324b06a7c7772f4619c3ab13dd8c0706e55fd9/deploy/podman/configmap.yml#L25
It's probably possible to use another RHCOS image, as during the install process the nodes upgrade to v418.9.202408211033-0.
Workaround for a successful Agent Installer (ABI) install:
Before building the ISO image, override the bootstrap OS image like this:
Workaround for a successful Assisted Installer install:
The procedure is described here: https://github.com/openshift/assisted-service/tree/master/deploy/podman
In the okd-configmap.yml file, replace (at least) the following variables: