vmware / photon

Minimal Linux container host
https://vmware.github.io/photon

Kick start installer. #1391

Open spyroot opened 1 year ago

spyroot commented 1 year ago

Describe the bug

Folks,

Could you please add two features to the installer?

[screenshot: kickstart installer crash]

1) As I understand it, if a package cannot be resolved, the kickstart installer just crashes (see screenshot).

"additional_packages": [
    "vim",
    "gcc",
    "git",
    "wget",
    "linux-rt",
    "linux-rt-devel",
    "wget",
    "make",
    "meson",
    "build-essential",
    "ninja-build",
    "nasm",
    "lshw"
  ],

The root cause here is the unresolved package name "lshw". Note that I put the RPM files in both noarch and x86, but I think the installer doesn't use the file name to resolve an RPM; it uses something else.
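As far as I understand, tdnf (like dnf and yum) resolves names such as "lshw" from the repository metadata, not from RPM file names. That metadata is the primary file referenced from repodata/repomd.xml and is generated by createrepo or createrepo_c. An RPM copied into a directory without regenerating that metadata therefore stays invisible to tdnf.

The sketch below reports which requested names are missing from the primary file. It assumes the ISO's RPMS directory is a standard rpm-md repository and that the primary file is either plain or gzip-compressed XML.

```python
from __future__ import annotations

import gzip
import xml.etree.ElementTree as ET

# Default namespace of rpm-md primary.xml (as written by createrepo/createrepo_c).
COMMON_NS = "{http://linux.duke.edu/metadata/common}"


def names_in_primary(xml_bytes: bytes) -> set[str]:
    """Return the set of package names listed in a primary.xml document."""
    root = ET.fromstring(xml_bytes)
    return {
        pkg.findtext(f"{COMMON_NS}name")
        for pkg in root.iter(f"{COMMON_NS}package")
    }


def load_primary(path: str) -> bytes:
    """Read primary.xml or primary.xml.gz (gzip compression assumed for .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as fh:
        return fh.read()


def missing_packages(requested: list[str], xml_bytes: bytes) -> list[str]:
    """Return the requested names that the repo metadata does not list."""
    known = names_in_primary(xml_bytes)
    return [name for name in requested if name not in known]
```

Usage would look like missing_packages(["meson", "lshw"], load_primary(path)), where path is whatever primary file repodata/repomd.xml points to. Running this before generating the kickstart names the offending entry up front, which is exactly what the crash does not tell you.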

Note that the installer crashes without naming the package, so if you have ten packages, you don't know which one it doesn't like, i.e., which entry in the JSON list caused the crash.
It probably makes sense to wrap that call in try/except and write the error to /var/installer.log.
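Here is a minimal sketch of the try/except idea. It assumes a per-package install callback; install_one and PackageNotFound are hypothetical names, not the installer's API. Each package is attempted separately, and a failure is logged with the package name instead of aborting the whole run.

```python
from __future__ import annotations

import logging
from typing import Callable, Iterable


class PackageNotFound(Exception):
    """Hypothetical error an install callback raises for an unresolvable name."""


def install_each(
    packages: Iterable[str],
    install_one: Callable[[str], None],
    logger: logging.Logger,
) -> list[str]:
    """Try each package separately; log failures by name and keep going."""
    failed: list[str] = []
    for name in packages:
        try:
            install_one(name)
        except PackageNotFound as exc:
            # Naming the package tells the user which JSON entry is the problem.
            logger.error("could not resolve package %r: %s", name, exc)
            failed.append(name)
    return failed


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    available = {"vim", "gcc", "git", "meson"}

    def fake_install(name: str) -> None:
        # Stand-in for the real tdnf call.
        if name not in available:
            raise PackageNotFound("not found in any configured repo")

    skipped = install_each(["vim", "lshw", "git"], fake_install, logging.getLogger("installer"))
    print("skipped:", skipped)
```

In the installer, the logger could be given a logging.FileHandler pointed at a file such as /var/installer.log. Installing one package at a time trades speed and transactional dependency resolution for a clear per-package error. Checking names against the repo metadata first would avoid that trade-off.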

2) Can you provide an example of how to use additional_rpm and explain the logic? In my case, I use meson and a set of other toolchains in the post-install script, i.e., I compile a bunch of stuff.

3) Can you please provide an option to run something after the first reboot? I.e., not post-install but post-reboot.

For example, let's assume I run a kickstart install and upgrade the kernel to "linux-rt" and "linux-rt-devel".

Now, in my post-install script, I need to compile, say, IAVF or any other driver that uses /usr/src/linux, so I need to link the kernel headers to the kernel the system will use after boot.
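Until there is a post-reboot hook, one possible workaround is a standard systemd pattern, not an installer feature. The postinstall script drops a oneshot unit that runs a script on first boot and then writes a marker file so it never runs again.

The sketch below assumes the postinstall commands run chrooted into the target; the root argument lets it write into any target filesystem. The unit name, script path, and marker path are all hypothetical.

```python
from __future__ import annotations

import os


def render_firstboot_unit(script: str, marker: str) -> str:
    """Return a systemd oneshot unit that runs `script` until it succeeds once."""
    return "\n".join([
        "[Unit]",
        "Description=Run post-reboot steps once after the first boot",
        "Wants=network-online.target",
        "After=network-online.target",
        # The unit is skipped entirely once the marker file exists.
        f"ConditionPathExists=!{marker}",
        "",
        "[Service]",
        "Type=oneshot",
        f"ExecStart={script}",
        # Runs only if ExecStart succeeded, so a failed build is retried next boot.
        f"ExecStartPost=/usr/bin/touch {marker}",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
        "",
    ])


def install_firstboot_unit(root: str, name: str, script: str, marker: str) -> str:
    """Write the unit under <root>/etc/systemd/system and enable it."""
    unit_dir = os.path.join(root, "etc", "systemd", "system")
    wants_dir = os.path.join(unit_dir, "multi-user.target.wants")
    os.makedirs(wants_dir, exist_ok=True)
    unit_path = os.path.join(unit_dir, name)
    with open(unit_path, "w") as fh:
        fh.write(render_firstboot_unit(script, marker))
    link = os.path.join(wants_dir, name)
    if not os.path.lexists(link):
        # Roughly what `systemctl enable` does: a symlink in the .wants directory.
        os.symlink(os.path.join("..", name), link)
    return unit_path
```

From the postinstall step, this might be called as install_firstboot_unit("/", "post-reboot.service", "/root/post_reboot.sh", "/var/lib/post-reboot.done"), with post_reboot.sh holding the driver build. ExecStartPost only runs after ExecStart succeeds, so a failed build leaves no marker and is retried on the next boot. By then, uname -r reports the kernel that actually booted.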

You can check the build I do here: https://github.com/spyroot/photon_rt

Reproduction steps

Example JSON I generate.

I believe the package list is the cause.

{
  "hostname": "photon-machine",
  "password": {
    "crypted": false,
    "text": "VMware1!"
  },
  "disk": "/dev/sda",
  "partitions": [
    {
      "mountpoint": "/",
      "size": 0,
      "filesystem": "ext4",
      "lvm": {
        "vg_name": "vg1",
        "lv_name": "rootfs"
      }
    },
    {
      "mountpoint": "/root",
      "size": 8192,
      "filesystem": "ext4",
      "lvm": {
        "vg_name": "vg1",
        "lv_name": "root"
      }
    },
    {
      "mountpoint": "/boot",
      "size": 8192,
      "filesystem": "ext4"
    }
  ],
  "packagelist_file": "packages_rt.json",
  "additional_packages": [
    "vim",
    "gcc",
    "git",
    "wget",
    "linux-rt",
    "linux-rt-devel",
    "wget",
    "make",
    "meson",
    "build-essential",
    "ninja-build",
    "nasm",
    "lshw"
  ],
  "postinstallscripts": [
    "post.sh",
    "overwrite.env"
  ],
  "search_path": [
    "/mnt/media",
    "/tmp",
    "/mnt/media/direct_rpms",
    "/mnt/media/git_images",
    "/mnt/media/direct",
    "/"
  ],
  "postinstall": [
    "#!/bin/sh",
    "sed -i 's/PermitRootLogin.*/PermitRootLogin yes/g' /etc/ssh/sshd_config",
    "yum list installed > /installed.before.log",
    "rpm -qa > /rpm.installed.before.log",
    "ls /mnt/* > list.log",
    "systemctl disable --now systemd-timesyncd",
    "sed -i 's/tx_timestamp_timeout.*/tx_timestamp_timeout    100/g' /etc/ptp4l.conf",
    "sed -i 's/eth0/eth4/g' /etc/sysconfig/ptp4l",
    "systemctl enable ptp4l.service phc2sys.service",
    "v=$(ls /mnt/cdrom/direct_rpms/*.rpm | wc -l); echo \"number of rpms in cdrom $v\"",
    "v=$(ls /mnt/media/direct_rpms/*.rpm | wc -l); echo \"number of rpms in media $v\"",
    "echo \"Installing rpms from media\"; tdnf install -y /mnt/media/direct_rpms/*.rpm",
    "echo \"Installing rpms from cdrom\"; tdnf install -y /mnt/cdrom/direct_rpms/*.rpm",
    "echo \"Installing rpms from tmp\"; tdnf install -y /tmp/direct_rpms/*.rpm",
    "echo \"copy direct_rpms from /mnt/media\"; mkdir -p /direct_rpms; cp /mnt/media/direct_rpms/*.rpm /direct_rpms",
    "echo \"copy direct_rpms from /mnt/cdrom\"; mkdir -p /direct_rpms; cp /mnt/cdrom/direct_rpms/*.rpm /direct_rpms",
    "echo \"copy direct from /mnt/media\"; mkdir -p /direct; cp /mnt/media/direct/* /direct",
    "echo \"copy direct from /mnt/cdrom\"; mkdir -p /direct; cp /mnt/cdrom/direct/* /direct",
    "echo \"copy git_images from /mnt/media\"; mkdir -p /git_images; cp /mnt/media/git_images/* /git_images",
    "echo \"copy git_images from /mnt/cdrom\"; mkdir -p /git_images; cp /mnt/cdrom/git_images/* /git_images",
    "tdnf install dmidecode lshw -y",
    "tdnf update -y",
    "tdnf upgrade -y",
    "yum -y update >> /etc/postinstall",
    "yum -y install gcc meson git wget numactl make curl nasm >> /etc/postinstall",
    "yum -y install python3-pip unzip zip gzip build-essential zlib-devel >> /etc/postinstall",
    "yum -y install lshw findutils vim-extra elfutils-devel cmake cython3 python3-docutils >> /etc/postinstall",
    "yum -y install libbpf-devel libbpf libpcap-devel libpcap libmlx5 libhugetlbfs  >> /etc/postinstall"
  ],
  "linux_flavor": "linux-rt",
  "photon_docker_image": "photon:3.0",
  "network": {
    "type": "dhcp"
  }
}
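Since this JSON is generated, the generator is a cheap place to catch list problems early; for example, "wget" appears twice in additional_packages above. The following sketch drops duplicate names while keeping their order before the file is written. normalize_kickstart and write_kickstart are hypothetical helpers.

```python
from __future__ import annotations

import json


def normalize_kickstart(ks: dict) -> dict:
    """Return a copy of the kickstart dict with duplicate package names removed."""
    out = dict(ks)
    if "additional_packages" in out:
        # dict.fromkeys keeps the first occurrence of each name, in order.
        out["additional_packages"] = list(dict.fromkeys(out["additional_packages"]))
    return out


def write_kickstart(ks: dict, path: str) -> None:
    """Write the normalized kickstart JSON with readable indentation."""
    with open(path, "w") as fh:
        json.dump(normalize_kickstart(ks), fh, indent=2)
```

The input dict is not mutated, so the generator can still compare before and after.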

Expected behavior

a) Expected behavior: the installer doesn't crash; instead, it writes to a log so you can see exactly which package it failed to resolve, with a message that explains why.

b) It would help to document, for additional RPMs such as lshw (noarch or x86), where to put them and how to register them so the installer resolves them.

c) For post-install steps, I think it makes sense to have both post_install and post_first_reboot hooks: if you swap the kernel as I explained and then build a driver in the post-install script, that build actually requires the matching headers under /usr/src.

Right now, I parse the GRUB config to figure out which kernel will boot next, but that is not the right approach.
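One alternative to parsing GRUB starts from the fact that, inside the postinstall chroot, uname -r reports the installer's running kernel rather than the installed one. Each installed kernel package normally creates a /lib/modules/<version> directory in the target root, so a script can pick the version from there instead.

This sketch assumes that the RT flavor's version string contains "-rt". It also assumes that linux-rt-devel provides the kbuild tree for that version; verify both with rpm -ql linux-rt-devel.

```python
from __future__ import annotations

import os


def installed_kernels(root: str = "/") -> list[str]:
    """Kernel versions with a modules directory under <root>/lib/modules."""
    mod_dir = os.path.join(root, "lib", "modules")
    if not os.path.isdir(mod_dir):
        return []
    return sorted(
        entry for entry in os.listdir(mod_dir)
        if os.path.isdir(os.path.join(mod_dir, entry))
    )


def pick_kernel(root: str = "/", flavor_tag: str = "-rt") -> str | None:
    """Return an installed kernel version containing flavor_tag, or None."""
    matches = [v for v in installed_kernels(root) if flavor_tag in v]
    # Plain string order is only a rough stand-in for version order.
    return matches[-1] if matches else None


def kbuild_dir(root: str, version: str) -> str:
    """Conventional kbuild path for a kernel version (usually a symlink to the headers)."""
    return os.path.join(root, "lib", "modules", version, "build")
```

The chosen version can then be passed explicitly to the driver build instead of relying on a /usr/src/linux link. This does not depend on GRUB's default entry.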

Additional context

No response

sshedi commented 1 year ago

Refer: https://vmware.github.io/photon/docs/user-guide/working-with-kickstart/

spyroot commented 1 year ago

Please check what I said: the installer crashes if additional_packages contains a package it cannot resolve. That call needs to be wrapped in try/except; check the ISO installer source code.

Where in the page you linked is there an example of additional_rpms_path? The installer is in /mnt/media, so suppose I have a directory foo on the same CD-ROM, i.e., /mnt/media/foo, and I set additional_rpms_path to it. A package that is listed in additional_packages but not in RPMS still doesn't resolve.

Nothing personal :) but I don't think you've read what I posted.

spyroot commented 1 year ago

I found exactly where this happens in the code. I will probably introduce custom exceptions, raise and catch them, and show the errors without crashing, because this is not a critical error. It is sufficient to write the errors to /var/log/installer.error.log; i.e., if the package list contains "meson" and it is not found, there is no reason to crash the installer.

Regarding additional_rpms_path, I am still trying to understand the logic. As far as I can tell, the RPM needs to be listed in the XML; there is an extra check, so it is not just a path to a file.

dcasota commented 1 year ago

With a later Secure Boot mode in mind, adding arbitrary *.rpm files doesn't add security, hence "crashing" the installer isn't that wrong, IMHO. A comprehensive error message would be nice, though.

spyroot commented 1 year ago

I'm referring to an unnecessary crash: adding an RPM to the list, as below, crashes the installer. IMHO it makes more sense to give a warning and move on, since the package is optional and outside of packagelist_file.json.

"additional_packages": [ "meson" ],

dcasota commented 1 year ago

The meson spec declares the following dependencies:
BuildRequires: gcc
BuildRequires: python3-devel
BuildRequires: python3-setuptools
BuildRequires: ninja-build
BuildRequires: gtest-devel
BuildRequires: gmock-devel
BuildRequires: gettext

Requires: ninja-build
Requires: python3
Requires: python3-setuptools

Just curious, and less related to the log-file content and the suggested recipe: does setup succeed if you also add python3 and python3-setuptools, i.e., with a recursive lookup into their "Requires" entries?
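A recursive lookup like that amounts to a transitive closure over the Requires entries. The sketch below works on plain package names only; real resolution works on Provides capabilities and versions, which it ignores. Only the meson entries in the example come from the spec above; the python3-setuptools edge is an illustrative assumption.

```python
from __future__ import annotations

from collections import deque


def transitive_requires(pkg: str, requires: dict[str, list[str]]) -> set[str]:
    """All names reachable from pkg through Requires edges (pkg itself excluded)."""
    seen: set[str] = set()
    queue = deque(requires.get(pkg, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue  # already expanded; also protects against dependency cycles
        seen.add(dep)
        queue.extend(requires.get(dep, []))
    seen.discard(pkg)
    return seen


def missing_requires(pkg: str, requires: dict[str, list[str]], available: list[str]) -> list[str]:
    """Dependencies of pkg (direct or indirect) that are not in `available`."""
    return sorted(transitive_requires(pkg, requires) - set(available))


if __name__ == "__main__":
    deps = {
        "meson": ["ninja-build", "python3", "python3-setuptools"],  # from the spec
        "python3-setuptools": ["python3"],  # illustrative assumption
    }
    print(missing_requires("meson", deps, ["python3", "python3-setuptools"]))  # ['ninja-build']
```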

spyroot commented 1 year ago

I may be ignorant in that regard, but I added the RPM inside the RPMS dir, and inside noarch since it is not arch-specific. What I don't know is how this relates to the packagelist_file JSON. In essence: if I just copy an RPM into the RPMS dir inside the ISO, is that sufficient for the installer to resolve it by name? What is the correct procedure to add an RPM to the ISO?

Steps I tried:

I copied it to the RPMS dir (inside the ISO); that didn't work. I copied it to the noarch dir; that didn't work either (it is a noarch RPM).

I also added a separate dir to the ISO and added that dir to additional_rpm in the JSON, but that didn't work either. Additional_rpm:

[
"/mnt/media/my_dir",
"/cdrom/media/my_dir"
// here, I basically added a bunch of dirs, just in case.
]

python3 and python3-setuptools are already in the build, since I'm using packages_rt.json.

If you could share the correct procedure, I'll repeat the experiment :) My initial assumption was that the installer does a substring match: it just gets the list of files, and finding "meson" among them is sufficient. Since this is a bit undocumented, I might be on the wrong path here.

dcasota commented 1 year ago

Which meson package version do you use? https://github.com/vmware/photon/tree/5.0/SPECS/meson should work but hasn't been backported to ph3 yet. You could try the ph5 beta. Is python3-setuptools preinstalled as well? I haven't found it in https://github.com/vmware/photon/blob/3.0/common/data/packages_rt.json.

spyroot commented 1 year ago

It is 4.0.2. The RT ISO is smaller, but all the Python packages and other toolchains are there. If you check, meson is packaged as a general package in the full ISO and in the package tree (meson is just one example). I tried lshw and a couple of other tools. None of them actually work, so there must be an extra check somewhere.

I can also install it after the first reboot.

If you mount the ISO, you will see all the Python packages inside noarch. I put in meson from 4.0.2, the same version that installs fine after boot, but it never works if I just put the RPM inside the ISO.

For the sake of sanity, I re-checked with the latest 4.0 Rev2 x86_64 Real-Time flavour ISO.

Inside the noarch dir are all the Python 3.10 packages (pip, setuptools, etc.).

[screenshot: contents of the noarch directory on the ISO]

Also, note that this postinstall line does work: "yum -y install gcc meson git wget numactl make curl nasm >> /etc/postinstall".

My guess:

The installer never actually reads stderr; see line 1022 of https://github.com/vmware/photon-os-installer/blob/master/photon_installer/installer.py

That basically explains why I see a "not found" issue with no detail.

I think that if ret is not 0, it makes sense to read from stderr:

if retval != 0:
    err_message = stderr.decode()
    if "No such file" in err_message:
        if package_optional:
            pass  # OK case: log a warning and continue
        elif package_mandatory:
            raise NotFound()
# My guess is that tdnf reports this through its exit code:
# 0: success; 137: package already installed; 65: package not found in repo.
# Existing code in installer.py:
else:
    stdout, stderr = process.communicate()
    self.logger.info(stdout.decode())
    retval = process.returncode
    # image creation. host's tdnf might not be available or can be outdated (Photon 1.0)
    # retry with docker container
    if retval != 0 and retval != 137:
        self.logger.error(stderr.decode())
        stderr = None
        self.logger.info("Retry 'tdnf install' using docker image")
        retval = self._run_tdnf_in_docker(tdnf_cmd)
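The stderr idea above can be sketched with the standard library: run the command with subprocess.run, capture stderr, and raise a custom exception carrying the return code and stderr text. The caller can then decide per package whether a failure is fatal.

TdnfError and run_checked are hypothetical names. Treating 137 as success only mirrors the check in the quoted code; the exit-code meanings are guesses, not documented tdnf behaviour.

```python
from __future__ import annotations

import subprocess
import sys


class TdnfError(Exception):
    """Hypothetical exception carrying a failed command's details."""

    def __init__(self, cmd: list[str], returncode: int, stderr: str) -> None:
        super().__init__(f"{cmd[0]} exited with {returncode}: {stderr.strip()}")
        self.cmd = cmd
        self.returncode = returncode
        self.stderr = stderr


# Return codes treated as success; 137 mirrors the check in the quoted installer code.
OK_CODES = {0, 137}


def run_checked(cmd: list[str]) -> str:
    """Run cmd and return its stdout, or raise TdnfError with stderr on failure."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode not in OK_CODES:
        raise TdnfError(cmd, proc.returncode, proc.stderr)
    return proc.stdout


if __name__ == "__main__":
    # Stand-in for a failing tdnf call so the sketch runs anywhere.
    fake = [sys.executable, "-c", "import sys; sys.stderr.write('No such package: lshw'); sys.exit(1)"]
    try:
        run_checked(fake)
    except TdnfError as err:
        print("optional package failed, continuing:", err)
```

For an optional package from additional_packages, the caller catches TdnfError, logs err.stderr, and moves on. For a mandatory package from packagelist_file, it can re-raise.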

As for _install_additional_rpms, I checked the code below. Based on its logic, if I put meson in a dir my_rpms and set additional_rpms_path to it, it should find it. Hence, I think either os.path.exists(rpms_path) returns False or cmd.run returns an error.

I can't tell which one, or whether it fails somewhere before even reaching _install_additional_rpms.

Now, which RPMs do I actually put inside the RPMS dir or my_dir?
Answer: exactly the same RPM that rpm -qa reports, plus exactly the same RPMs it depends on.

def _install_additional_rpms(self):
    rpms_path = self.install_config.get('additional_rpms_path', None)

    if not rpms_path or not os.path.exists(rpms_path):
        return

    if self.cmd.run(['rpm', '--root', self.photon_root, '-U', rpms_path + '/*.rpm']) != 0:
        self.logger.info('Failed to install additional_rpms from ' + rpms_path)
        self.exit_gracefully()
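The quoted function treats additional_rpms_path as a single directory string: it passes the value to os.path.exists and concatenates it with '/*.rpm'. That means a JSON list of directories, like the additional_rpm list tried above, would not work with this code. It also installs the RPMs with rpm -U directly into the target root, a step separate from tdnf's resolution of names in additional_packages.

The sketch below is a more defensive variant under the same single-directory assumption; install_additional_rpms is a hypothetical name. It expands the glob in Python so the result does not depend on whether a shell is involved. It also reports a wrong type or a missing directory instead of silently returning.

```python
from __future__ import annotations

import glob
import os
import subprocess


def install_additional_rpms(rpms_path, photon_root: str) -> list[str]:
    """Install every *.rpm found in one directory into photon_root via rpm -U.

    Returns the RPM files passed to rpm (empty if there was nothing to do).
    """
    if rpms_path is None:
        return []
    if not isinstance(rpms_path, str):
        # The quoted code expects one directory string; say so instead of crashing.
        raise TypeError(
            "additional_rpms_path must be a single directory string, "
            f"got {type(rpms_path).__name__}"
        )
    if not os.path.isdir(rpms_path):
        raise FileNotFoundError(f"additional_rpms_path not found: {rpms_path}")
    # Expand the glob in Python so no shell is needed for '*.rpm'.
    rpms = sorted(glob.glob(os.path.join(rpms_path, "*.rpm")))
    if not rpms:
        return []
    proc = subprocess.run(
        ["rpm", "--root", photon_root, "-U", *rpms],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        # Surface rpm's own message (e.g. missing dependencies) in the error.
        raise RuntimeError(f"rpm -U failed ({proc.returncode}): {proc.stderr.strip()}")
    return rpms
```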
dcasota commented 1 year ago

Unfortunately, I haven't learned yet how to write good pull requests; the ones I did write fixed misspelled doc syntax, and sometimes even that correction was wrong. Let's see how the Photon OS team solves it. Here are the findings so far, in my own words:
1) Avoid the kickstart installer crash, follow the spec coding and backporting guidelines, and run test-completeness indicators.
2) Explain the GitOps use case and add context-driven documentation on the usage of additional_packages.
3) Solve the new issues.