vmware / vic

vSphere Integrated Containers Engine is a container runtime for vSphere.
http://vmware.github.io/vic

bootstrap.iso for rhel 7.7 having some issues #8616

Open aviratna opened 4 years ago

aviratna commented 4 years ago

Reference: #8569 #8575

We tried with RHEL 7.7 and VIC 1.5.4 and are getting the same issue.

Summary: the bootstrap.iso file builds successfully and containers can start up; however, tether doesn't start (refer to the debug log attached) and the container fails with "docker: Error response from daemon: Server error from portlayer: unable to wait for process launch status: container VM has unexpectedly powered off."

vSphere and vCenter Server version: vSphere 6.7u2

VIC version: 1.5.4

@YanzhaoLi @DanielXiao @malikkal @hickeng

aviratna commented 4 years ago

Please find the logs below:

++ drivers+=("vmw_pvscsi" "vmxnet3" "nfnetlink" "iptable_filter" "xt_conntrack" "nf_nat_ipv4" "iptable_nat" "nf_conntrack" "nf_conntrack_ipv4" "nf_defrag_ipv4" "ipt_REJECT" "xt_state")
++ for i in '${drivers[@]}'
++ modprobe vmw_pvscsi
[ 4.987530] piix4_smbus 0000:00:07.3: SMBus Host Controller not enabled!

tether tmpfs size before copying libraries:
++ df -k /mnt/containerfs/.tether
Filesystem 1K-blocks Used Available Use% Mounted on
tmpfs 230400 191272 39128 84% /mnt/containerfs/.tether

++ cp /sbin/rngd /mnt/containerfs/.tether/bin/rngd
++ cp -Ln /lib64/libcom_err.so.2 /lib64/libcom_err.so.2.1 /lib64/libcrypt-2.17.so /lib64/libcrypt.so.1 /lib64/libcrypto.so.1.0.2k /lib64/libcrypto.so.10 /lib64/libcryptsetup.so.12 /lib64/libcryptsetup.so.12.3.0 /lib64/libcryptsetup.so.4 /lib64/libcryptsetup.so.4.7.0 /lib64/libgcrypt.so.11 /lib64/libgcrypt.so.11.8.2 /lib64/libk5crypto.so.3 /lib64/libk5crypto.so.3.1 /lib64/libc.so.6 /lib64/libcurl.so.4 /lib64/libcurl.so.4.3.0 /lib64/libdl.so.2 /lib64/libfreebl3.chk /lib64/libfreebl3.so /lib64/libfreeblpriv3.chk /lib64/libfreeblpriv3.so /lib64/libgpg-error.so.0 /lib64/libgpg-error.so.0.10.0 /lib64/libgssapi_krb5.so.2 /lib64/libgssapi_krb5.so.2.2 /lib64/libidn.so.11 /lib64/libidn.so.11.6.11 /lib64/libkeyutils.so.1 /lib64/libkeyutils.so.1.5 /lib64/libkrb5.so.3 /lib64/libkrb5.so.3.3 /lib64/libkrb5support.so.0 /lib64/libkrb5support.so.0.1 /lib64/liblber-2.4.so.2 /lib64/liblber-2.4.so.2.10.7 /lib64/libldap-2.4.so.2 /lib64/libldap-2.4.so.2.10.7 /lib64/libldap_r-2.4.so.2 /lib64/libldap_r-2.4.so.2.10.7 /lib64/liblzma.so.5 /lib64/liblzma.so.5.2.2 /lib64/libm.so.6 /lib64/libnspr4.so /lib64/libnss3.so /lib64/libnss_compat-2.17.so /lib64/libnss_compat.so.2 /lib64/libnss_db-2.17.so /lib64/libnss_db.so.2 /lib64/libnss_dns-2.17.so /lib64/libnss_dns.so.2 /lib64/libnss_files-2.17.so /lib64/libnss_files.so.2 /lib64/libnss_hesiod-2.17.so /lib64/libnss_hesiod.so.2 /lib64/libnss_myhostname.so.2 /lib64/libnss_mymachines.so.2 /lib64/libnss_nis-2.17.so /lib64/libnss_nis.so.2 /lib64/libnss_nisplus-2.17.so /lib64/libnss_nisplus.so.2 /lib64/libnssckbi.so /lib64/libnssdbm3.chk /lib64/libnssdbm3.so /lib64/libnsspem.so /lib64/libnsssysinit.so /lib64/libnssutil3.so /lib64/libpcre.so.1 /lib64/libpcre.so.1.2.0 /lib64/libplc4.so /lib64/libplds4.so /lib64/libpthread.so.0 /lib64/libresolv.so.2 /lib64/librt.so.1 /lib64/libsasl2.so.3 /lib64/libsasl2.so.3.0.0 /lib64/libselinux.so.1 /lib64/libsmime3.so /lib64/libssh2.so.1 /lib64/libssh2.so.1.0.1 /lib64/libssl.so.1.0.2k /lib64/libssl.so.10 /lib64/libssl3.so /lib64/libsysfs.so.2 /lib64/libsysfs.so.2.0.1 /lib64/libxml2.so.2 /lib64/libxml2.so.2.9.1 /lib64/libz.so.1 /lib64/libz.so.1.2.7 /mnt/containerfs/.tether/lib64/
cp: error writing '/mnt/containerfs/.tether/lib64/libxml2.so.2.9.1': No space left on device
cp: failed to extend '/mnt/containerfs/.tether/lib64/libxml2.so.2.9.1': No space left on device
cp: error writing '/mnt/containerfs/.tether/lib64/libz.so.1': No space left on device
cp: failed to extend '/mnt/containerfs/.tether/lib64/libz.so.1': No space left on device
cp: error writing '/mnt/containerfs/.tether/lib64/libz.so.1.2.7': No space left on device
cp: failed to extend '/mnt/containerfs/.tether/lib64/libz.so.1.2.7': No space left on device

cat: write error: No space left on device
++ echo 'tether tmpfs size after copying libraries: '
tether tmpfs size after copying libraries:
++ df -k /mnt/containerfs/.tether
Filesystem 1K-blocks Used Available Use% Mounted on
tmpfs 230400 230400 0 100% /mnt/containerfs/.tether

++ echo 'switching to the new mount'
switching to the new mount
++ systemctl
Failed to get D-Bus connection: Operation not permitted
+++ readlink /usr/sbin/switch_root
++ [[ '' == \t\o\y\b\o\x ]]
++ exec switch_root /mnt/containerfs /.tether/tether
all stderr redirected to debug log
Feb 9 2020 08:04:13.390Z INFO Registering tether extension Attach
Feb 9 2020 08:04:13.426Z INFO Registering tether extension Toolbox
Feb 9 2020 08:04:13.426Z INFO Registering tether extension Entropy
Feb 9 2020 08:04:13.428Z DEBUG [BEGIN] [vic/lib/tether.(tether).Start:542] main tether loop
Feb 9 2020 08:04:13.428Z DEBUG [BEGIN] [vic/lib/tether.(tether).setup:140] main tether setup
Feb 9 2020 08:04:13.428Z DEBUG [BEGIN] [main.(operations).Log:57] operations.Log
Feb 9 2020 08:04:13.428Z INFO opening /dev/ttyS1 for debug log
Feb 9 2020 08:04:13.429Z DEBUG [ END ] [main.(operations).Log:57] [364.061µs] operations.Log
Feb 9 2020 08:04:13.433Z INFO Started reaping child processes
Feb 9 2020 08:04:13.443Z DEBUG writing "::1 localhost localhost.localdomain localhost6 localhost6.localdomain6\n" to /.tether/etc/hosts
Feb 9 2020 08:04:13.443Z DEBUG writing "127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4\n" to /.tether/etc/hosts
Feb 9 2020 08:04:13.443Z ERROR Unable to flush file content to /.tether/etc/hosts717335263: write /.tether/etc/hosts717335263: no space left on device
Feb 9 2020 08:04:13.443Z DEBUG writing "fe00:: ip6-localnet\n" to /.tether/etc/hosts
Feb 9 2020 08:04:13.443Z DEBUG writing "ff00:: ip6-mcastprefix\n" to /.tether/etc/hosts
Feb 9 2020 08:04:13.443Z DEBUG writing "ff02::1 ip6-allnodes\n" to /.tether/etc/hosts
Feb 9 2020 08:04:13.444Z DEBUG writing "ff02::2 ip6-allrouters\n" to /.tether/etc/hosts
Feb 9 2020 08:04:13.444Z DEBUG writing "127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4\n" to /.tether/etc/hosts
Feb 9 2020 08:04:13.444Z DEBUG writing "::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 ip6-localhost ip6-loopback\n" to /.tether/etc/hosts
Feb 9 2020 08:04:13.444Z ERROR Unable to flush file content to /.tether/etc/hosts109424818: write /.tether/etc/hosts109424818: no space left on device
Feb 9 2020 08:04:13.444Z ERROR Failed tether setup: write /.tether/etc/hosts109424818: no space left on device
Feb 9 2020 08:04:13.444Z DEBUG [ END ] [vic/lib/tether.(tether).setup:140] [16.23942ms] main tether setup
Feb 9 2020 08:04:13.444Z ERROR Failed to run setup: write /.tether/etc/hosts109424818: no space left on device
Feb 9 2020 08:04:13.445Z DEBUG [ END ] [vic/lib/tether.(tether).Start:542] [18.64275ms] main tether loop
Feb 9 2020 08:04:13.445Z ERROR write /.tether/etc/hosts109424818: no space left on device
Feb 9 2020 08:04:13.445Z INFO Powering off the system
[ 5.878074] Power down.

YanzhaoLi commented 4 years ago

@aviratna It seems that the tmpfs at /mnt/containerfs/.tether was used up. It looks like a bug: the computed size is not large enough in https://github.com/vmware/vic/blob/9becae4ee68cbe807dc5ecdc65d901ce6a4278b3/isos/bootstrap.sh#L157

I'll try to fix it. As a workaround, increase the size (for example, double it) and build the bootstrap.iso again.
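A minimal sketch of the doubling workaround, for illustration only. It assumes the size is passed to tmpfs as a `size=<n>m` mount option (the form that shows up in the boot trace); the function name `scaled_tmpfs_option` and the default factor are hypothetical and are not taken from bootstrap.sh. The 225 MiB input corresponds to the 230400 1K-blocks reported by `df -k` in the log above.

```python
import math


def scaled_tmpfs_option(computed_mib: int, factor: float = 2.0) -> str:
    """Return a tmpfs mount option such as 'size=450m' for a scaled-up size.

    computed_mib: the size (in MiB) the build script computed (hypothetical input).
    factor: multiplier applied as a workaround; 2.0 means "double it".
    """
    if computed_mib <= 0:
        # An empty or zero size would yield a useless option, so fail loudly.
        raise ValueError("computed size must be a positive number of MiB")
    if factor < 1.0:
        raise ValueError("factor must not shrink the tmpfs")
    # Round up so the scaled size is never smaller than requested.
    scaled = math.ceil(computed_mib * factor)
    return f"size={scaled}m"


# 230400 KiB from the df output is exactly 225 MiB.
print(scaled_tmpfs_option(225))  # prints: size=450m
```

Doubling trades memory for safety: every container VM pays for the larger tmpfs, so this only buys time until the size calculation itself is corrected.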

Could you help test this PR: https://github.com/vmware/vic/pull/8618

malikkal commented 4 years ago

Thank you @YanzhaoLi

Making custom bootstrap iso
building rhel-7.7
Preparing systemd for bootstrap
Total tempfs size: 241
Constructing initramfs archive
1760669 blocks
Embedding build version v1.5.5-rc1-0-9becae4 (use BUILD_NUMBER environment variable to override)

Will try the new bootstrap.iso and update here.

aviratna commented 4 years ago

@YanzhaoLi We are getting the error below after building with the changes mentioned above.

Error in file:
[ 4.795077] piix4_smbus 0000:00:07.3: SMBus Host Controller not enabled!

[ 4.921772] Error: Driver 'pcspkr' is already registered, aborting...

++ mount -t tmpfs -o size=m tmpfs /mnt/containerfs/.tether/
++ cp -Ln /lib64/libsysfs.so.2 /lib64/libsysfs.so.2.0.1 /lib64/libm.so.6 /lib64/libm-2.17.so '/lib64/libgcc_s' /lib64/libip4tc.so.0 /lib64/libip4tc.so.0.1.0 /lib64/libip6tc.so.0 /lib64/libip6tc.so.0.1.0 /lib64/libiptc.so.0 /lib64/libiptc.so.0.0.0 /lib64/libxtables.so.10 /lib64/libxtables.so.10.0.0 /lib64/libdl-2.17.so /lib64/libdl.so.2 /lib64/libc.so.6 /lib64/libc-2.17.so /mnt/containerfs/.tether/lib64/
cp: cannot stat '/lib64/libgcc_s': No such file or directory

/bin/repoinit: line 36: cannot create temp file for here-document: No space left on device

Failed to get D-Bus connection: Operation not permitted

Feb 17 2020 02:38:51.685Z ERROR Starting entropy failed with "fork/exec : no such file or directory"

Feb 17 2020 02:38:51.686Z ERROR Failed to start extension Entropy: fork/exec : no such file or directory

Feb 17 2020 02:38:51.694Z ERROR Failed to run setup: fork/exec : no such file or directory

Feb 17 2020 02:38:51.696Z ERROR fork/exec : no such file or directory

Please find the detailed tether.debug log below, not able to upload the file:

@malikkal

YanzhaoLi commented 4 years ago

@aviratna @malikkal It seems the size is still not large enough. BTW, is rhel-7.7 the same as centos-7, or is the kernel the only difference?

malikkal commented 4 years ago

It is the same as CentOS 7.7, including the kernel version. CentOS 7.7 is built from the RHEL 7.7 SRPMs, minus branding.

BTW, the build containers that I am using are non-privileged. Should I be building from privileged containers?

YanzhaoLi commented 4 years ago

@malikkal I don't think it matters. BTW, I tried out centos-7 and it worked. I'll continue to debug with centos-7.7.

malikkal commented 4 years ago

@YanzhaoLi any updates on this, please?

YanzhaoLi commented 4 years ago

@malikkal Still investigating, and finding a solution might not be trivial. The generated CentOS 7.7 ISO image is 354MB, which is much larger than the centos7 image and exhausts the disk space when booting. I don't know the reason, but it might be poor dependency management in the RPMs. So first we have to figure out which extra, unnecessary packages are pulled in for CentOS 7.7.

malikkal commented 4 years ago

@YanzhaoLi okay, thanks mate. In parallel, let me also revert to 7.5 or older and see if it helps reduce the size.

malikkal commented 4 years ago

rhel7.4 (bootstrap.iso is 273 MB)
3.10.0-693.21.1.el7.x86_64 #1 SMP Fri Feb 23 18:54:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

rhel7.5 (bootstrap.iso is 310 MB)
3.10.0-862.14.4.el7.x86_64 #1 SMP Fri Sep 21 09:07:21 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

For rhel7.5, the tmpfs for /.tether mount is remaining at only 5%. :( I will proceed with 7.5 for now.

hickeng commented 4 years ago

Looking at this after the call last week.

"tmpfs for /.tether mount is remaining at only 5%."

The tmpfs is supposed to be very nearly full: its size is calculated based on the size of the files needed. Ideally we'd be able to hit 100% perfectly so as to use the minimum number of memory pages, but some tolerance is useful.
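The sizing idea described here can be sketched as follows. This is an illustration under stated assumptions, not the calculation in bootstrap.sh: the 4096-byte page size, the 5% default tolerance, and the name `tmpfs_size_mib` are all hypothetical, and directory and inode overhead are ignored.

```python
import math
import os
from typing import Iterable

PAGE_SIZE = 4096  # assumed page size in bytes
MIB = 1024 * 1024


def tmpfs_size_mib(paths: Iterable[str], tolerance: float = 0.05) -> int:
    """Estimate a tmpfs size in MiB that holds every file in `paths`.

    Each file is rounded up to whole pages, because tmpfs keeps file data
    in memory pages. `tolerance` adds fractional headroom on top.
    """
    total = 0
    for path in paths:
        # getsize follows symlinks, matching a dereferencing copy (cp -L).
        size = os.path.getsize(path)
        total += math.ceil(size / PAGE_SIZE) * PAGE_SIZE
    padded = total * (1.0 + tolerance)
    # Never return 0, even for an empty list.
    return max(1, math.ceil(padded / MIB))
```

The estimate is only as good as the file list: any file copied at boot but missing from `paths` is not counted, which is how a tmpfs sized this tightly can run out of space.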

As @YanzhaoLi noted, 354MB is way more than I'd expect for a bootstrap iso.

This line looks wrong to me - there's zero reason for us to have curl in the bootstrap, nor the key utils. Certainly not as dependencies of rngd which, in most cases these days, should just be calling RDRAND: https://github.com/vmware/vic/blob/master/isos/base/repos/centos-7/init.sh#L31

I think it likely that a mistake was made when determining the library dependencies for rngd that has bloated this image and any based on it.

The probable reason for the out of space issues is that there are a lot of new libraries introduced by updates for centos7.5 and those libraries are not matched by the list used to calculate the necessary tmpfs size: https://github.com/vmware/vic/blob/1967d0cd68a30ee77752cec0d1de6defa59d79df/isos/bootstrap.sh#L19

This should not be fixed just by adding the new libraries to the list used to calculate the tmpfs size - it should be fixed by determining the actual runtime dependencies used by rngd and iptables for the operations the tether requires.
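One starting point for finding the real dependencies is the output of `ldd` for each binary. The sketch below only parses text in the usual `name => /path (0xaddr)` form; the sample output and its addresses are made up for illustration, and `resolved_libraries` is a hypothetical helper. Note that `ldd` reports link-time dependencies only, so anything loaded later with `dlopen` would need to be found another way.

```python
import re
from typing import List

# Matches "libfoo.so.1 => /lib64/libfoo.so.1 (0x...)" as well as the
# loader line "/lib64/ld-linux-x86-64.so.2 (0x...)". Entries with no
# on-disk path (such as linux-vdso) do not match.
_LDD_LINE = re.compile(r"^\s*(?:\S+\s+=>\s+)?(/\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")


def resolved_libraries(ldd_output: str) -> List[str]:
    """Return the sorted, de-duplicated library paths found in ldd output."""
    libs = set()
    for line in ldd_output.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            libs.add(match.group(1))
    return sorted(libs)


# Illustrative sample only; real output comes from running `ldd /sbin/rngd`.
SAMPLE = (
    "\tlinux-vdso.so.1 =>  (0x00007ffd0b1f2000)\n"
    "\tlibsysfs.so.2 => /lib64/libsysfs.so.2 (0x00007f1c2a3b0000)\n"
    "\tlibc.so.6 => /lib64/libc.so.6 (0x00007f1c29fe2000)\n"
    "\t/lib64/ld-linux-x86-64.so.2 (0x00007f1c2a5c3000)\n"
)

for lib in resolved_libraries(SAMPLE):
    print(lib)
```

Taking the union of the results for rngd and iptables gives a candidate list that can feed both the copy step and the size calculation, so the two stay in sync.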

Additionally, there is a double copy of the iptables libraries into /.tether/usr/lib64: https://github.com/vmware/vic/blob/6729b55bb1e6be859006a2807d0a809fc70a8aca/isos/base/repos/centos-7/init.sh#L82

I assume this is because something has a hardcoded library path suffix, but this would be better addressed with a symlink from /.tether/usr/lib64 to /.tether/lib64, as it is the same set of libraries. Currently this doubles the footprint of those library binaries, which is definitely not accounted for in the tmpfs size calculation.
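The effect of replacing the second copy with a symlink can be illustrated with a small sketch. Both helpers are hypothetical and operate on a staging directory (`root` stands in for the tree that becomes /.tether); this is not the change made to init.sh, only a way to see the difference in footprint on a Linux filesystem.

```python
import os


def regular_file_bytes(root: str) -> int:
    """Sum the sizes of regular files under root, skipping symlinks."""
    total = 0
    # followlinks=False: a symlinked directory is listed but not descended into.
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # a symlink costs almost nothing; do not count its target
            total += os.path.getsize(path)
    return total


def link_usr_lib64(root: str) -> None:
    """Create root/usr/lib64 as a relative symlink to root/lib64."""
    usr = os.path.join(root, "usr")
    os.makedirs(usr, exist_ok=True)
    # Relative target keeps the link valid after the tree is mounted elsewhere.
    # Raises FileExistsError if usr/lib64 is already present.
    os.symlink("../lib64", os.path.join(usr, "lib64"))
```

With a copied `usr/lib64`, `regular_file_bytes(root)` counts every library twice; after `link_usr_lib64(root)` on a tree without the copy, the same call counts each library once while paths under `usr/lib64` still resolve.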