redhat-partner-solutions / crucible

Apache License 2.0
/dev/vda: "Unable to detect Device type " during running of "Install-cluster" Playbook. #172

Closed poojagupta1418 closed 1 year ago

poojagupta1418 commented 2 years ago

Bug description

We are deploying the SNO feature in a KVM-hosted environment. During execution of the playbook "site.yaml", the run fails with the error message: messages\\":[{\\"string\\":\\"/dev/vda: Unable to detect device type\\",\\"severity\\":\\"error\\"}]

We have tried changing the SELinux policy, which is configured as "Enforcing", and the SELinux security context is as below:

    [root@basemachine ~]# ls -lZ /home/4libvirt/images/
    total 1173036
    -rw-------. 1 qemu qemu system_u:object_r:virt_image_t:s0   1047527424 Aug 11 12:00 discovery-image-iso-8d4c4242-a408-4df8-b0bf-0a0d11b5535b.img
    -rwxr-xr-x. 1 qemu qemu system_u:object_r:virt_image_t:s0 429562527744 Aug 11 11:28 nec_super1_main.qcow2

Environment details: OCP 4.10.13, RHEL 8.4, crucible playbooks

Kindly provide support on this, and let us know if any other information is required.

OpenShift version

other (provide in the description)

Assisted Installer version

v1.0.24.2

Relevant log output

No response

Inventory file

No response

Required statements

poojagupta1418 commented 2 years ago

attached journalctl logs for reference [journalctl logs.zip](https://github.com/redhat-partner-solutions/crucible/files/9308691/journalctl.logs.zip)

poojagupta1418 commented 2 years ago

Kindly help with this. We would appreciate any response that moves it toward a solution.

nocturnalastro commented 2 years ago

Sorry about the delay, I hadn't been notified there was a new issue. Hmm, have you tried using the default libvirt location?

Also, try using a newer version of assisted installer. Crucible now supports the v2 API, so perhaps this is something that has been fixed in the newer version.

In the meantime I'll see if I can recreate it on my side when I get some free time.

poojagupta1418 commented 2 years ago

Thanks @nocturnalastro for your update. I tried with the latest image of assisted-installer (podman pull quay.io/ocpmetal/assisted-installer) and with the latest code of the crucible playbooks (https://github.com/redhat-partner-solutions/crucible), but I am getting the error message below:

    TASK [get_image_hash : Get controller image hash] **
    Friday 26 August 2022 16:47:15 +0000 (0:00:00.059) 0:00:04.568 *****
    fatal: [localhost]: FAILED! =>
      msg: |-
        The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'url'

The error appears to be in '/home/ansible/crucible/roles/get_image_hash/tasks/get_image_hash.yml': line 4, column 5, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

  block:
  - name: "Get {{ item.key }} image hash"
    ^ here
We could be wrong, but this one looks like it might be an issue with
missing quotes. Always quote template expression brackets when they
start a value. For instance:

    with_items:
      - {{ foo }}

Should be written as:

    with_items:
      - "{{ foo }}"
Kindly help with this.

nocturnalastro commented 2 years ago

Are you overriding the release_images? It looks like one might be malformed. Please check the transition doc.

poojagupta1418 commented 2 years ago

Hi @nocturnalastro, I cleaned up the assisted installer pod and changed the inventory in two sections as per the transition doc, but I am still getting the same issue.

Could you please help me check the release image overriding steps? How can I be sure they are correct? Note: if I change the playbook and hardcode the url for each item (image) as below:

It passes, but moving forward I face another code syntax issue, possibly due to some dependency left over from those manual changes.

Kindly help with this.

nocturnalastro commented 2 years ago

Yes, you can't hard code that step, as there are multiple values that need to be run through that task. If you are pulling all your images from a local registry, you will need to update release_images: as shown in the transition doc, and assisted_service_image_repo_url to point at your registry. You may also have to override the tags, depending on the tags in your registry. https://github.com/redhat-partner-solutions/crucible/blob/main/roles/get_image_hash/defaults/main.yml#L81-L95

poojagupta1418 commented 2 years ago

Thanks @nocturnalastro for your guidance. I have already updated the following role defaults to match the images and tags in my local registry:

assisted_service_image_repo_url: quay.io/edge-infrastructure

assisted_service_image_repo_url: registry-quay.sno.localdomain:5002/ocpmetal

    assisted_installer_images:
      controller:
        url: "{{ assisted_service_image_repo_url }}/assisted-installer-controller:{{ controller_tag }}"
      installer_agent:
        url: "{{ assisted_service_image_repo_url }}/assisted-installer-agent:{{ installer_agent_tag }}"
      installer:
        url: "{{ assisted_service_image_repo_url }}/assisted-installer:{{ installer_tag }}"
      service:
        url: "{{ assisted_service_image_repo_url }}/assisted-service:{{ assisted_service_tag }}"
      gui:
        url: "{{ assisted_service_image_repo_url }}/assisted-installer-ui:{{ assisted_service_gui_tag }}"
      image_service:
        url: "{{ assisted_service_image_repo_url }}/assisted-image-service:{{ assisted_service_image_service_tag }}"

But I still have not been able to move forward. Please refer to the attached inventory as well, and guide me on how to get past this issue. inventory.zip

nocturnalastro commented 2 years ago

In your inventory you have

      controller:
        image: registry-quay.sno.localdomain:5002/ocpmetal/assisted-installer-controller
        tag: "latest"
      installer_agent:
        image: registry-quay.sno.localdomain:5002/ocpmetal/assisted-installer-agent
        tag: "latest"
      installer:
        image: registry-quay.sno.localdomain:5002/ocpmetal/assisted-installer
        tag: "latest"

These entries don't contain url. You should combine image and tag into url.
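As a rough sketch of what that could look like for the three entries quoted above, assuming they sit under assisted_installer_images (the parent key is not visible in the excerpt, so that name is an assumption) and that each url is simply the image followed by a colon and the tag:

```yaml
# Assumed parent key; the quoted inventory excerpt does not show it.
assisted_installer_images:
  controller:
    # image + ":" + tag combined into a single url value
    url: "registry-quay.sno.localdomain:5002/ocpmetal/assisted-installer-controller:latest"
  installer_agent:
    url: "registry-quay.sno.localdomain:5002/ocpmetal/assisted-installer-agent:latest"
  installer:
    url: "registry-quay.sno.localdomain:5002/ocpmetal/assisted-installer:latest"
```

Any other image entries in the inventory that still use separate image and tag keys would presumably need the same change.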

poojagupta1418 commented 2 years ago

Thanks @nocturnalastro for your support. After updating the inventory to use url, the issue above was resolved. JFYI, another change to the template file (configmap.yaml.j2) was also required because of the error below: image

Now the assisted service is not running successfully, i.e. it is in a degraded state because some of the pods are in the EXIT state:

podman logs assisted-installer-service

Error msg:

    {"file":"/go/src/github.com/openshift/origin/pkg/servers/servers.go:77","func":"github.com/openshift/assisted-image-service/pkg/servers.(ServerInfo).httpListen","level":"info","msg":"Starting http handler on :8888...","time":"2022-08-29T16:28:40Z"}
    {"file":"/go/src/github.com/openshift/origin/pkg/servers/servers.go:79","func":"github.com/openshift/assisted-image-service/pkg/servers.(ServerInfo).httpListen","level":"fatal","msg":"HTTP listener closed: listen tcp :8888: bind: address already in use","time":"2022-08-29T16:28:40Z"}

podman logs assisted-installer-ui

    nginx 16:29:43.09 INFO ==> Starting NGINX setup
    nginx 16:29:43.13 INFO ==> Validating settings in NGINX_* env vars
    chmod: changing permissions of '/proc/self/fd/1': Permission denied
    chmod: changing permissions of '/proc/self/fd/2': Permission denied

podman logs assisted-installer-db

    initdb: error: cannot be run as root
    Please log in (using, e.g., "su") as the (unprivileged) user that will own the server process.

image

Kindly provide your support and guidance on the same.

nocturnalastro commented 2 years ago

It looks like you have get_release_images set to false. https://github.com/redhat-partner-solutions/crucible/blob/main/roles/get_image_hash/tasks/main.yml#L68

This means that you should rename the variables in your inventory https://github.com/redhat-partner-solutions/crucible/blob/main/roles/get_image_hash/tasks/main.yml#L101-L102

poojagupta1418 commented 2 years ago

Hi @nocturnalastro, thanks for the update. Are you suggesting that the container failure above is caused by that parameter (get_release_images), or by something else? We have already configured the inventory according to the playbook and moved ahead, but we are currently facing the container issue described in my previous comment.

nocturnalastro commented 2 years ago

The issue with the pod is likely a malformed configmap, or the lack of one completely, given that the template failed. The issue with the template is that, because you have get_release_images: False, the renaming of the variables that happens in the second link isn't happening. You will need to do that manually in the inventory, so you will want something like assisted_installer_os_images: "{{ os_images }}" and assisted_installer_release_images: "{{ release_images }}" in your inventory.
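For illustration only, a minimal sketch of how those two entries might sit in a YAML inventory. The all/vars placement is an assumption about the layout of your file, not something taken from it, and os_images and release_images are expected to be defined elsewhere in the same inventory:

```yaml
all:
  vars:
    # With this set to false, the role does not rename the variables itself.
    get_release_images: false
    # Manual renaming, done in the inventory instead.
    assisted_installer_os_images: "{{ os_images }}"
    assisted_installer_release_images: "{{ release_images }}"
```

If your inventory defines these variables at a different level (for example under a group), the two assisted_installer_* lines would go alongside the existing os_images and release_images definitions there.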

nocturnalastro commented 1 year ago

@poojagupta1418 I assume since my last message was 27 days ago that this issue is closed, feel free to re-open or open a new issue if it is not.