redhat-partner-solutions / crucible

Apache License 2.0

Installation of RHOCP cluster by using Redhat certified CNI like Calico/Cilium #248

Closed shashank-6777 closed 1 year ago

shashank-6777 commented 1 year ago

Feature description

Hi Team,

First of all, thank you for providing this robust automation for installing OpenShift. Some third-party CNIs such as Calico and Cilium are now available as Red Hat certified operators. Is it possible to deploy my cluster with one of these CNIs instead of OVN/SDN using the current inventory?


nocturnalastro commented 1 year ago

It should be possible; it would be a case of adding custom manifests. Define extra_manifests in your inventory; it is a list of dicts. The commit is here. For instance, I use it to seed the manifests required for LSO:

    extra_manifests:
      - template: "{{ inventory_dir }}/manifests/{{ cluster_name }}/50-master-create-lvs-for-lso.yml.j2"
      - file: "{{ inventory_dir }}/manifests/{{ cluster_name }}/60-lv-for-lso.yml"
      - file: "{{ inventory_dir }}/manifests/{{ cluster_name }}/70-sc-for-lso.yml"
shashank-6777 commented 1 year ago

Hi @nocturnalastro, thanks for your response. There is one parameter in inventory.yaml, network_type, and if I set it to cilium it throws an error. Can you help me here, if possible?

nocturnalastro commented 1 year ago

Yes, that will result in an error because it is not one of the options allowed by the Assisted Installer, which is used under the hood. However, you can supply the extra_manifests required to configure the network. I'll ask internally to get some more details.

nocturnalastro commented 1 year ago

After a quick Google search I found this: https://cloudcult.dev/cilium-installation-openshift-assisted-installer/. Assuming this is still correct, it looks like you need to patch the install_config. Crucible patches it here.

If you wish to insert your own value, "cilium", you would first have to set network_type to a value that is valid for the Assisted Installer API, e.g. OVNKubernetes. Then modify the template, which can be found here, and insert your value instead.

Then add your configuration to the manifests using extra_manifests.

If this works let me know :) Perhaps we could figure out a way to make this easier for you.

shashank-6777 commented 1 year ago

Hi @nocturnalastro, thanks a lot for your extended support. Let me try this and I will get back to you with the outcome.

shashank-6777 commented 1 year ago

Hi @nocturnalastro, thanks for your support as always. I updated my inventory files as you suggested, but maybe I am missing something; currently I am getting the error below.

TASK [patch_cluster : Apply manifest] *************************************************************************************************************************************************************************
fatal: [bastion]: FAILED! => changed=false
  connection: close
  content: '{"code":605,"message":"file_name in body should match ''^[^/]*\\.(yaml|yml|json)$''"}'
  content_length: '83'
  content_type: application/json
  date: Thu, 22 Jun 2023 15:36:57 GMT
  elapsed: 0
  json:
    code: 605
    message: file_name in body should match '^[^/]*\.(yaml|yml|json)$'
  msg: 'Status code was 422 and not [201]: HTTP Error 422: Unprocessable Entity'
  redirected: false
  status: 422
  url: http://172.90.12.241:8090/api/assisted-install/v2/clusters/52a53c09-9069-483c-a74c-190329e6ce34/manifests
  vary: Accept-Encoding

I have placed my "patch-network-type.j2" under the crucible directory and defined it in my inventory.yml; I am attaching it for your reference. The same directory contains the manifests for Cilium. inventory-cilium.zip

However, I am unable to apply the point below. If possible, can you please explain how, what, and where exactly to update? _It looks like you need to patch the install_config. Crucible patches it [here](https://github.com/redhat-partner-solutions/crucible/blob/main/roles/patch_cluster/tasks/main.yml#L29)._

nocturnalastro commented 1 year ago

@shashank-6777 patch-network-type.j2 would have to be placed at crucible/roles/patch_cluster/templates/patch-network-type.j2, not in extra_manifests, as you want the role to pick up your content instead of its default template. So this has to be an invasive change; if it works, we can look at how to make it non-invasive.
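For context on the 605 response above: the error message quotes the pattern that the API applies to a manifest's file_name. A minimal Python sketch, using only that quoted pattern, shows why a name ending in .j2 or containing a directory separator would be rejected. Whether that was the exact trigger in this run is an assumption, and the example file names other than patch-network-type.j2 are hypothetical.

```python
import re

# Pattern copied from the 605 error message returned by the manifests endpoint.
FILE_NAME_PATTERN = re.compile(r"^[^/]*\.(yaml|yml|json)$")


def is_valid_manifest_name(file_name: str) -> bool:
    """Return True if file_name satisfies the pattern quoted in the error."""
    return FILE_NAME_PATTERN.match(file_name) is not None


# Hypothetical names, for illustration only.
candidates = [
    "patch-network-type.j2",         # extension is not yaml, yml, or json
    "manifests/cilium-config.yaml",  # contains a slash
    "cilium-config.yaml",            # bare name with an accepted extension
]

for name in candidates:
    print(f"{name}: {is_valid_manifest_name(name)}")
```

Only the last candidate passes, which is consistent with keeping the patch template out of extra_manifests and placing it in the role's templates directory instead.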

shashank-6777 commented 1 year ago

@nocturnalastro thanks for the response. After applying the changes I am still getting an error, like before.

TASK [create_cluster : Create cluster] ************************************************************************************************************************************************************************
fatal: [bastion]: FAILED! => changed=false
  connection: close
  content: '{"code":606,"message":"network_type in body should be one of [OpenShiftSDN OVNKubernetes]"}'
  content_length: '91'
  content_type: application/json
  date: Mon, 26 Jun 2023 10:12:37 GMT
  elapsed: 0
  json:
    code: 606
    message: network_type in body should be one of [OpenShiftSDN OVNKubernetes]
  msg: 'Status code was 422 and not [201]: HTTP Error 422: Unprocessable Entity'
  redirected: false
  status: 422
  url: http://172.90.12.241:8090/api/assisted-install/v2/clusters
  vary: Accept-Encoding

PLAY RECAP ****************************************************************************************************************************************************************************************************
bastion                    : ok=52   changed=5    unreachable=0    failed=1    skipped=149  rescued=0    ignored=0
localhost                  : ok=2    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0
super1                     : ok=0    changed=0    unreachable=0    failed=0    skipped=17   rescued=0    ignored=0
super2                     : ok=0    changed=0    unreachable=0    failed=0    skipped=17   rescued=0    ignored=0
super3                     : ok=0    changed=0    unreachable=0    failed=0    skipped=17   rescued=0    ignored=0
worker1                    : ok=0    changed=0    unreachable=0    failed=0    skipped=17   rescued=0    ignored=0
worker2                    : ok=0    changed=0    unreachable=0    failed=0    skipped=17   rescued=0    ignored=0

[root@eBPF crucible]# cat roles/patch_cluster/templates/patch-network-type.j2
networking:
  networkType: Cilium
[root@eBPF crucible]#
nocturnalastro commented 1 year ago

patch-network-type.j2 looks good :) But it looks like you didn't set network_type in your inventory to OVNKubernetes. Try that :)
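Why both settings are needed can be sketched in Python. The allowed values are taken from the 606 error message above. The functions and the trimmed install-config dictionary are hypothetical illustrations of the idea, not Crucible's or the Assisted Installer's actual code.

```python
import copy

# Values listed in the 606 error message from the cluster-creation call.
ALLOWED_NETWORK_TYPES = {"OpenShiftSDN", "OVNKubernetes"}


def check_network_type(network_type: str) -> None:
    """Mimic the API-side check that rejects unlisted network types."""
    if network_type not in ALLOWED_NETWORK_TYPES:
        raise ValueError(
            f"network_type should be one of {sorted(ALLOWED_NETWORK_TYPES)}"
        )


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override's keys merged in recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Step 1: the inventory value must pass the cluster-creation check.
check_network_type("OVNKubernetes")

# Step 2: the content of patch-network-type.j2 then overrides the install config.
# base_install_config is a hypothetical, heavily trimmed stand-in.
base_install_config = {"networking": {"networkType": "OVNKubernetes"}}
patch = {"networking": {"networkType": "Cilium"}}

print(deep_merge(base_install_config, patch))
```

Passing "cilium" to check_network_type raises ValueError, mirroring the 422 response, while the merged result carries networkType Cilium even though the inventory value stays OVNKubernetes.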

shashank-6777 commented 1 year ago

Hi @nocturnalastro, thanks a lot for your support these days. I think I am now very close to deploying this, because I can see my nodes trying to pull Cilium images from the Red Hat registry, but only my bastion node has internet access. So right now I am trying to populate this operator in my local registry, but I am not able to populate any operator because of the error below. Do we need to modify the command?

The log below shows the nodes trying to pull the Cilium image directly from the Red Hat registry.

 reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?" pod="openshift-network-diagnostics/network-check-target-b54kp" podUID=a2854018-99d6-412d-b009-893db155e933
Jun 26 13:36:52 super1 crio[1470]: time="2023-06-26 13:36:52.947041092Z" level=info msg="Checking image status: registry.connect.redhat.com/isovalent/cilium-olm@sha256:aed05a332413c8244b615d6b2f013e4fbc5ce7f65ed7f83213bc3605ae4dedce" id=abeeb3c2-7667-4f5a-a0f8-ca9c71c332b6 name=/runtime.v1.ImageService/ImageStatus
Jun 26 13:36:52 super1 crio[1470]: time="2023-06-26 13:36:52.947249849Z" level=info msg="Image registry.connect.redhat.com/isovalent/cilium-olm@sha256:aed05a332413c8244b615d6b2f013e4fbc5ce7f65ed7f83213bc3605ae4dedce not found" id=abeeb3c2-7667-4f5a-a0f8-ca9c71c332b6 name=/runtime.v1.ImageService/ImageStatus
Jun 26 13:36:52 super1 kubenswrapper[1539]: E0626 13:36:52.947432    1539 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"operator\" with ImagePullBackOff: \"Back-off pulling image \\\"registry.connect.redhat.com/isovalent/cilium-olm@sha256:aed05a332413c8244b615d6b2f013e4fbc5ce7f65ed7f83213bc3605ae4dedce\\\"\"" pod="cilium/cilium-olm-57d447bb5-ggxh2" podUID=e723463d-531e-4c59-b8bf-0f114066e7ac
Jun 26 13:36:53 super1 kubenswrapper[1539]: E0626 13:36:53.947044    1539 pod_workers.go:965] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?" pod="openshift-multus/network-metrics-daemon-b78qs" podUID=508f1de1-c798-43be-bd21-0f50309e175f

Error when populating any operator:

fatal: [registry_host]: FAILED! => changed=true
  cmd:
  - /tmp/binaries.abt_r7n8/opm
  - index
  - prune
  - --from-index
  - registry.redhat.io/redhat/redhat-operator-index:v4.12
  - --packages
  - sriov-network-operator
  - --tag
  - eBPF.cluster.test.nfvi.localdomain:5000/olm-index/redhat-operator-index:v4.12
  delta: '0:00:05.601343'
  end: '2023-06-27 12:32:15.636555'
  invocation:
    module_args:
      _raw_params: |-
        "/tmp/binaries.abt_r7n8"/opm index prune
          --from-index "registry.redhat.io/redhat/redhat-operator-index:v4.12"
          --packages "sriov-network-operator"
          --tag "eBPF.cluster.test.nfvi.localdomain:5000/olm-index/redhat-operator-index:v4.12"
      _uses_shell: false
      argv: null
      chdir: null
      creates: null
      executable: null
      removes: null
      stdin: null
      stdin_add_newline: true
      strip_empty_ends: true
      warn: true
  msg: non-zero return code
  rc: 1
  start: '2023-06-27 12:32:10.035212'
  stderr: |-
    time="2023-06-27T12:32:10+09:00" level=warning msg="\x1b[1;33mDEPRECATION NOTICE:\nSqlite-based catalogs and their related subcommands are deprecated. Support for\nthem will be removed in a future release. Please migrate your catalog workflows\nto the new file-based catalog format.\x1b[0m"
    time="2023-06-27T12:32:10+09:00" level=info msg="pruning the index" packages="[sriov-network-operator]"
    time="2023-06-27T12:32:10+09:00" level=info msg="Pulling previous image registry.redhat.io/redhat/redhat-operator-index:v4.12 to get metadata" packages="[sriov-network-operator]"
    time="2023-06-27T12:32:10+09:00" level=info msg="running /bin/podman pull registry.redhat.io/redhat/redhat-operator-index:v4.12" packages="[sriov-network-operator]"
    time="2023-06-27T12:32:13+09:00" level=info msg="running /bin/podman pull registry.redhat.io/redhat/redhat-operator-index:v4.12" packages="[sriov-network-operator]"
    time="2023-06-27T12:32:15+09:00" level=info msg="Getting label data from previous image" packages="[sriov-network-operator]"
    time="2023-06-27T12:32:15+09:00" level=info msg="running podman inspect" packages="[sriov-network-operator]"
    Error: `opm index prune` only supports sqlite-based catalogs. See https://github.com/redhat-openshift-ecosystem/community-operators-prod/issues/793 for instructions on pruning a plaintext files backed catalog.

Please help me understand what changes are needed so that I can populate this operator in my registry.

nocturnalastro commented 1 year ago

I haven't seen that one before. Try following the procedure here manually. If it works let me know and I'll look at updating the procedure :)

shashank-6777 commented 1 year ago

@nocturnalastro thanks a lot for your support. I am able to install the cluster now.

[root@eBPF ~]# oc get pods -n cilium
NAME                               READY   STATUS    RESTARTS        AGE
cilium-2h8hq                       1/1     Running   0               5h39m
cilium-jcsfj                       1/1     Running   0               5h49m
cilium-olm-56546bf5cb-7pjhb        1/1     Running   4 (5h30m ago)   5h48m
cilium-operator-69bd5fbbb5-xr4lt   1/1     Running   2 (5h37m ago)   5h49m
cilium-operator-69bd5fbbb5-z4jzr   1/1     Running   5 (3h53m ago)   5h49m
cilium-q4q8f                       1/1     Running   0               5h49m
cilium-vmpmc                       1/1     Running   0               5h39m
cilium-zcmm9                       1/1     Running   0               5h39m
[root@eBPF ~]#

I think the crucible playbook that populates the operators needs to be updated. Anyway, many thanks for your help.