sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

RFS cache feature leads to /lib/modules folder to be empty. #16944

Open liushilongbuaa opened 1 year ago

liushilongbuaa commented 1 year ago

Description

Without the fix in PR https://github.com/sonic-net/sonic-buildimage/pull/16936, the build exits with a depmod error saying that the /lib/modules folder is missing.
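
For illustration, a guard like the one below could run before depmod so the build fails early with a clearer message. This is only a minimal sketch and is not the fix from the PR above; the function name `kernel_module_dirs` and the rootfs path argument are hypothetical.

```python
from pathlib import Path
from typing import List


def kernel_module_dirs(rootfs: str) -> List[str]:
    """Return the kernel version directories found under <rootfs>/lib/modules.

    An empty list means depmod has nothing to work on, which matches the
    symptom described above (missing or empty /lib/modules).
    """
    modules = Path(rootfs) / "lib" / "modules"
    if not modules.is_dir():
        # Folder is missing entirely.
        return []
    # Each subdirectory is expected to be named after a kernel version.
    return sorted(p.name for p in modules.iterdir() if p.is_dir())
```

A caller could abort the build with an explicit error when the returned list is empty instead of letting depmod fail.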

k-v1 commented 1 year ago

After additional tests I have found more possible issues for SPLIT_RFS+DPKG cache:

  1. Need to update different files at the 2nd stage, like /etc/resolv.conf, /etc/apt/sources.list, apt proxy ...
  2. Need to fix files and flags for the RFS DPKG cache (at least add _DEPENDS for the initramfs, kernel and debootstrap packages).
  3. Need to move some actions, like copying config files and creating groups and users, from the 1st stage to the 2nd. Otherwise we need to track all these variables and files as RFS DPKG cache dependencies.
  4. Need to add version files for default and host-image to RFS_DEP_FILES.
  5. Need to track changes in sonic-build-hooks and probably the trusted gpg keys (NOTE: we install them with the sonic-build-hooks deb package; the line in build_debian.sh for the gpg key does nothing). ...

Also I think it's better to create a separate file for the 1st stage: build_rfs.sh or something like this.

Until that is fixed, I don't recommend using the DPKG cache for SPLIT RFS. Maybe I'll create a PR later, but I'm not sure. It needs a lot of time to test all possible cases.

k-v1 commented 1 year ago

@Yakiv-Huryk FYI

liushilongbuaa commented 1 year ago

Maybe we need to collect some data. We should put only one or two long-running jobs into stage 1. Small jobs that run for less than 1 second should not be included.
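
One way to collect that data, as a rough sketch: time each build step and split the steps by a 1 second threshold. The helper names `time_steps` and `split_by_duration` are hypothetical and nothing here is taken from the existing build scripts.

```python
import time
from typing import Callable, Dict, List, Tuple


def time_steps(steps: Dict[str, Callable[[], None]]) -> Dict[str, float]:
    """Run each step once and record its wall-clock duration in seconds."""
    durations: Dict[str, float] = {}
    for name, step in steps.items():
        start = time.perf_counter()
        step()
        durations[name] = time.perf_counter() - start
    return durations


def split_by_duration(
    durations: Dict[str, float], threshold: float = 1.0
) -> Tuple[List[str], List[str]]:
    """Split step names into (stage 1 candidates, steps left for stage 2).

    Steps that take at least `threshold` seconds are candidates for the
    cached 1st stage; shorter steps are left for the 2nd stage.
    """
    stage1 = [name for name, d in durations.items() if d >= threshold]
    stage2 = [name for name, d in durations.items() if d < threshold]
    return stage1, stage2
```

With measured durations in hand, `split_by_duration(durations)` gives a first cut of which steps are worth caching; the final choice would still depend on which steps have inputs that are practical to track as cache dependencies.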

dgsudharsan commented 1 year ago

@xumia Please triage this issue. If you require Nvidia's help please reach out to @Yakiv-Huryk

k-v1 commented 1 year ago

> Maybe we need to collect some data. We should put only one or two long-running jobs into stage 1. Small jobs that run for less than 1 second should not be included.

==== 1st stage ====

  1. mount /proc in sonic-slave, create base directories for rootfs
  2. debootstrap (build_debian_base_system.sh)
  3. install sonic-build-hooks (prepare_debian_image_buildinfo.sh)
  4. setup hosts, fstab, apt, apt proxy, mount /proc
  5. install eatmydata
  6. install base deb package for the next steps
  7. makedev
  8. install initramfs and kernel
     ?. Sign the Linux kernel (maybe move this step to the 2nd stage?)
  9. install docker
     ?. Install Kubernetes (maybe move this step to the 2nd stage?)
  10. install more deb packages from public debian mirror
  11. install the dev deb packages needed to build python packages (move the installation of packages like build-essential from the sonic-debian-extension file here)
  12. install python packages from pypi
  13. clean up
  14. create squashfs

==== 2nd stage ====

  1. base setup
  2. unsquash rootfs from 1st stage
  3. update some files like resolv.conf, apt http proxy, sources.list ?
  4. create users and groups
  5. copy config files for initramfs, dhcp, ssh, docker, etc
  6. .....
  7. collect versions info
  8. remove dev deb packages installed at 1st stage
  9. ....

The list of dependencies (flags and files) for the 1st stage RFS DPKG cache also needs fixing. Add _DEPENDS for RFS ($(DEBOOTSTRAP) $(INITRAMFS_TOOLS) $(INITRAMFS_TOOLS_CORE) $(LINUX_KERNEL)) for the DPKG cache. Probably build sonic-build-hooks as a reproducible deb package (and add the deb file to the RFS DPKG cache dependency files). This package also includes some config options like VERSION_CONTROL_COMPONENTS and the downloaded trusted gpg keys. If the checksum of sonic-build-hooks changes, then the 1st stage RFS should be rebuilt.
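
The checksum idea can be sketched as a cache key derived from the dependency files and flags. This is an illustration only and does not reproduce the actual DPKG cache logic in the repository; `rfs_cache_key` is a hypothetical helper, and the file list and flag names passed to it are up to the caller.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Mapping


def rfs_cache_key(dep_files: Iterable[str], flags: Mapping[str, str]) -> str:
    """Hash dependency file contents and flag values into one hex digest.

    Any change to a file's content (for example a rebuilt sonic-build-hooks
    deb) or to a flag value gives a different key, so a cached 1st stage
    RFS stored under the old key is no longer reused.
    """
    digest = hashlib.sha256()
    for path in sorted(dep_files):
        # The path is part of the key, so moving a file also changes it.
        digest.update(path.encode())
        digest.update(b"\0")
        digest.update(hashlib.sha256(Path(path).read_bytes()).digest())
    for name in sorted(flags):
        digest.update(f"{name}={flags[name]}".encode())
        digest.update(b"\0")
    return digest.hexdigest()
```

This only works as intended if the deb package is built reproducibly, as noted above; otherwise its checksum changes on every build and the cache never hits.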

I think I can try to implement this and open a PR later, or report back if something turns out to be impossible to implement.