redhat-cop / ocp-disconnected-docs

AWS - openshift mirror image occasionally fails to pull all images, leading to bootstrap failure #26

Open sfxworks opened 3 years ago

sfxworks commented 3 years ago

This affects OpenShift 4.7 and 4.8. It occurs when following the AWS instructions on a RHEL 8 VM. An incomplete mirror result looks like this:

[ec2-user@ip-10-0-75-65 ~]$ ls -lah 4.8.0-rc.3/release/v2/openshift/release/
total 32K
drwxr-xr-x. 3 ec2-user ec2-user  19 Jul  8 20:24 .
drwxr-xr-x. 3 ec2-user ec2-user  21 Jul  8 20:24 ..
drwxr-xr-x. 2 ec2-user ec2-user 28K Jul  8 20:25 blobs

Compare that to a complete result, which also contains a manifests directory:

[ec2-user@ip-10-0-75-65 ~]$ ls -lah 4.7.18/release/v2/openshift/release/
total 64K
drwxr-xr-x. 4 ec2-user ec2-user  36 Jul  6 21:44 .
drwxr-xr-x. 3 ec2-user ec2-user  21 Jul  6 17:13 ..
drwxr-xr-x. 2 ec2-user ec2-user 36K Jul  6 21:23 blobs
drwxr-xr-x. 2 ec2-user ec2-user 20K Jul  6 21:23 manifests

The incomplete mirror eventually causes the OpenShift IPI installer to fail when it uses the registry mirror.

Rerunning the OpenShift 4 image mirror does not fetch the missing images until the releases directory is completely wiped.
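A minimal sketch of that workaround as a pre-install check, in Python. It assumes that a complete mirror always has both `blobs` and `manifests` under `<version>/release/v2/openshift/release`, which is only inferred from the two listings above, and that the directory to wipe is `<version>/release`. The function names are hypothetical and not part of the repo's scripts.

```python
import shutil
from pathlib import Path

# Assumption: path layout and expected subdirectories are taken from the
# two `ls` listings in this issue, not from any documented guarantee.
MIRROR_SUBPATH = Path("release/v2/openshift/release")
EXPECTED_SUBDIRS = ("blobs", "manifests")


def missing_subdirs(version_dir):
    """Return the expected mirror subdirectories that are absent."""
    root = Path(version_dir) / MIRROR_SUBPATH
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


def wipe_if_incomplete(version_dir):
    """Delete <version_dir>/release if the mirror looks incomplete.

    Returns True if the directory was deleted, so the caller knows the
    mirror step has to be rerun from scratch.
    """
    release_dir = Path(version_dir) / "release"
    if release_dir.is_dir() and missing_subdirs(version_dir):
        shutil.rmtree(release_dir)
        return True
    return False
```

For example, calling `wipe_if_incomplete("4.8.0-rc.3")` before rerunning the mirror step would clear the partial download seen in the first listing. It does not address the underlying cause, and a mirror that has both directories could still be missing individual images.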

Results: https://gist.github.com/sfxworks/3f758d1e80f0210bd7659fc83b8e2ae6

If I had to guess, this is a race condition, given the "no such file or directory" error that occurs from time to time, though I haven't looked at the code for this image prepull: https://gist.github.com/sfxworks/10a30c92fc10c1ab65db25d281c7520c#file-gistfile1-txt-L600-L601

Edit: After the VMDK is manually pulled and the OS images are re-pulled, everything runs OK. Complete install log: https://gist.github.com/sfxworks/9d522342e277b7dde28545f5965f3cc3

arvin-a commented 2 years ago

This is a bug that should be opened. If this workaround is still required, please add a doc to this repo.