systemd / mkosi

💽 Build Bespoke OS Images
https://mkosi.systemd.io/

Gentoo discussion #1661

Closed DaanDeMeyer closed 1 year ago

DaanDeMeyer commented 1 year ago

Now that we have --tools-tree, we shouldn't have to unconditionally download the stage3 snapshot to build a gentoo image. When building from a gentoo host system, we should just be able to use the host's emerge.

To make this work, we have to make sure neither the host nor the tools tree are modified by portage. This means we should not do SYSROOT=/ but instead should do SYSROOT=ROOT. Yes this means that build dependencies are installed into the image, but we can just uninstall them again later (not ideal but it will do the trick).

We will also have to make sure portage's config files are copied into the final image since portage will only look for them there, which we'd prefer not to do, but in this case there's no other way.

This would allow us to build gentoo images directly on a gentoo host system with portage (and maybe crossdev). On non-gentoo systems, either emerge has to be packaged by the distribution, or a stage3 snapshot has to be downloaded and unpacked to be used via --tools-tree which can then be used to build gentoo images.

This was inspired by https://github.com/chewi/cross-boss, which we can also use for inspiration on cross compilation

cc @257

257 commented 1 year ago

Now that we have --tools-tree, we shouldn't have to unconditionally download the stage3 snapshot to build a gentoo image.

why not? stage3 is our --tools-tree no?

When building from a gentoo host system, we should just be able to use the host's emerge.

yes we can but we should not. letting people on a different distro build gentoo is one of the appeals of mkosi. for instance, people on fedora who want a small image for a little embedded board: they don't have to run gentoo to build gentoo.

To make this work, we have to make sure neither the host nor the tools tree are modified by portage. This means we should not do SYSROOT=/ but instead should do SYSROOT=ROOT. Yes this means that build dependencies are installed into the image, but we can just uninstall them again later (not ideal but it will do the trick).

gentoo.py, atm, doesn't install bdeps and makes sure rdeps are installed as needed. it even updates the bdeps if need be. what's the need for the above complexity?

We will also have to make sure portage's config files are copied into the final image since portage will only look for them there, which we'd prefer not to do, but in this case there's no other way.

This would allow us to build gentoo images directly on a gentoo host system with portage (and maybe crossdev). On non-gentoo systems, either emerge has to be packaged by the distribution,

complexity of crossdev is impossible to handle in mkosi. it's probably the main reason Yocto exists.

that, i would think, will never happen. the toolchain is ingrained into portage. stage3 is "packaged-portage" if you like.

or a stage3 snapshot has to be downloaded and unpacked to be used via --tools-tree which can then be used to build gentoo images.

so we would basically be doing all this for gentoo-on-gentoo. i.e. move the current logic into the case where:

  1. all non-gentoo hosts
  2. portage is not installed or is installed but it is not useable (again all non-gentoo hosts)

note that we have tried this in the past. portage will not work without a "tied-in" toolchain. that is exactly what crossdev does btw, i.e. builds a toolchain+portage specific for a target triple/machine.

This was inspired by https://github.com/chewi/cross-boss, which we can also use for inspiration on cross compilation

i have no idea about cross-boss but it looks hackish. plus, do we have the backing of upstream going down this path? upstream, at least from my discussion on #gentoo-llvm, seems to be very much in favour of us switching to llvm. we need upstream's blessing, at least to some extent one would think.

DaanDeMeyer commented 1 year ago

why not? stage3 is our --tools-tree no?

yes we can but we should not. letting people on a different distro build gentoo is one of the appeals of mkosi. for instance, people on fedora who want a small image for a little embedded board: they don't have to run gentoo to build gentoo.

--tools-tree= is about being able to use another tree as the source for build tooling if the host system does not have all the tools required by mkosi. If the host has all the tooling required installed, using --tools-tree= should not be required. Currently, with gentoo, we're basically forcing people to use --tools-tree= via the stage3 snapshot. What I'm proposing is that if the host system is gentoo, users don't need to download the stage3 snapshot and can simply use the tooling from their host system. Of course, if they want to, using the stage3 snapshot should still be an option (by extracting it to mkosi.tools).

So I'm not saying that everyone should run gentoo to build gentoo images. I'm saying that whoever is not on a gentoo system should use --tools-tree= to be able to build gentoo images by extracting a stage3 snapshot and using that as --tools-tree=.

gentoo.py, atm, doesn't install bdeps and makes sure rdeps are installed as needed. it even updates the bdeps if need be. what's the need for the above complexity?

When --tools-tree= is used, /usr from it is mounted read-only (by design) to ensure it isn't modified between builds. This won't work with the current approach of gentoo.py as it depends on being able to install build dependencies in the --tools-tree=. To fix this, we temporarily install build dependencies to ROOT instead and remove them when we're done. This allows the tools tree to stay read-only. This should add a minimal amount of complexity: the only thing we need is one more call to emerge after installing packages to remove any build dependencies that haven't been installed explicitly.
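
A minimal sketch of the install-then-clean-up flow described above. It assumes that emerge picks up `ROOT` and `SYSROOT` from the environment and that `emerge --depclean --with-bdeps=n` is an acceptable way to drop build-only dependencies. The helper names are hypothetical and are not taken from gentoo.py.

```python
from pathlib import Path


def emerge_env(root: Path) -> dict[str, str]:
    """Point both ROOT and SYSROOT at the image so the tools tree is never written to."""
    return {"ROOT": str(root), "SYSROOT": str(root)}


def install_command(packages: list[str]) -> list[str]:
    # Assumption: with SYSROOT == ROOT, dependencies needed for building
    # end up in the image instead of in the read-only tools tree.
    return ["emerge", *packages]


def cleanup_command() -> list[str]:
    # Assumption: the extra emerge call mentioned above is a depclean that
    # ignores build-time dependencies, so anything only needed for building
    # and not requested explicitly becomes removable.
    return ["emerge", "--depclean", "--with-bdeps=n"]
```

Both commands would be run with the same environment, e.g. `subprocess.run(install_command(pkgs), env={**os.environ, **emerge_env(root)}, check=True)` followed by the cleanup call.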

i have no idea about cross-boss but it looks hackish. plus, do we have the backing of upstream going down this path? upstream, at least from my discussion on #gentoo-llvm, seems to be very much in favour of us switching to llvm. we need upstream's blessing, at least to some extent one would think.

I'm all in favor of using llvm instead of crossdev if that makes things easier. I don't particularly care which tools are used to do the cross compilation, only that gentoo integrates with --tools-tree= in the same way that the other distributions do. The only real difference between gentoo and other distributions in mkosi should be that you're more or less forced to use --tools-tree= when building gentoo images on host systems that are not gentoo because no other distro packages portage + toolchain. Aside from that, once you extract a stage3 snapshot into mkosi.tools, it should behave exactly the same as the other distributions, at least that's what I'm going for.

257 commented 1 year ago

sounds good to me so long as we're keeping the current logic of fetching/extracting stage3. so i suppose we want to:

right?

independent of this i'm also experimenting with cross-compiling with llvm. ncurses builds with no problem (that's always a problem in cross-compiling) but glibc is a no go, looking into it.

DaanDeMeyer commented 1 year ago

https://github.com/systemd/mkosi/pull/1667 is a first step in this direction

sounds good to me so long as we're keeping the current logic of fetching/extracting stage3.

What I was thinking of is that if we're not on gentoo and no tools tree is provided, we automatically download a stage 3. If we're on gentoo, we use host portage.
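
The decision above can be sketched roughly as follows. This is only an illustration: the `PortageSource` enum and both function names are hypothetical, host detection via the `ID=` field of os-release is an assumption, and so is the rule that an explicitly provided tools tree takes precedence even on a gentoo host.

```python
from enum import Enum
from pathlib import Path
from typing import Optional


class PortageSource(Enum):
    HOST = "host"              # use the host's emerge
    TOOLS_TREE = "tools-tree"  # use the tree the user passed explicitly
    STAGE3 = "stage3"          # download a stage 3 snapshot automatically


def os_release_id(os_release_text: str) -> str:
    """Return the ID= field of an os-release file, or '' when it is absent."""
    for line in os_release_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "ID":
            return value.strip().strip("\"'")
    return ""


def pick_portage_source(os_release_text: str, tools_tree: Optional[Path]) -> PortageSource:
    # Assumption: an explicit tools tree always wins.
    if tools_tree is not None:
        return PortageSource.TOOLS_TREE
    # On gentoo, use host portage.
    if os_release_id(os_release_text) == "gentoo":
        return PortageSource.HOST
    # Not on gentoo and no tools tree: fall back to downloading a stage 3.
    return PortageSource.STAGE3
```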

right?

Yeah that looks right

independent of this i'm also experimenting with cross-compiling with llvm. ncurses builds with no problem (that's always a problem in cross-compiling) but glibc is a no go, looking into it.

Yeah if we can get cross compilation with llvm working that would be great.

257 commented 1 year ago

Yeah if we can get cross compilation with llvm working that would be great.

there is a SuperBug here: glibc won't build with clang; systemd won't use existing alt libc's, all miss NSS etc. until and unless that's sorted we have to kiss cross-compilation goodbye. i'm reading this atm though.

257 commented 1 year ago

Yeah if we can get cross compilation with llvm working that would be great.

there is a SuperBug here: glibc won't build with clang; systemd won't use existing alt libc's, all miss NSS etc. until and unless that's sorted we have to kiss cross-compilation goodbye. i'm reading this atm though.

not officially supported but apparently it's close

257 commented 1 year ago

not officially supported but apparently it's close

tl;dr: We are planning to have build support for the next GLIBC 2.39 release mid-year

DaanDeMeyer commented 1 year ago

@257 Could you do a patch to have mkosi generate its own minimal /etc instead of using the one from the stage 3 snapshot? It should only do what's necessary to be able to build a default image (so only with baselayout installed). Then we copy state.pkgmngr into that minimal /etc and mount it over /etc when invoking emerge.

The repository config we generate in the minimal /etc should be configured to write directly to `state.cache / "repos"` so we don't have to mount that to /var/db/repos anymore.

With this in place, we can then only use /usr from the stage3 snapshot and ignore the rest.
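
A rough sketch of what generating that repository config could look like. The function names and the `cache_dir` parameter (standing in for `state.cache`) are hypothetical. The `main-repo` and `location` keys are standard repos.conf settings, but the exact set of keys mkosi would need is an assumption.

```python
import configparser
import io
from pathlib import Path


def render_repos_conf(cache_dir: Path) -> str:
    """Render a repos.conf whose gentoo repository lives under the cache directory."""
    config = configparser.ConfigParser()
    config["DEFAULT"] = {"main-repo": "gentoo"}
    # Point the repository at the cache so nothing has to be mounted
    # over /var/db/repos when emerge is invoked.
    config["gentoo"] = {"location": str(cache_dir / "repos" / "gentoo")}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


def write_minimal_etc(etc: Path, cache_dir: Path) -> Path:
    """Create portage/repos.conf under the given /etc directory and return its path."""
    target = etc / "portage" / "repos.conf"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(render_repos_conf(cache_dir))
    return target
```

The directory passed as `etc` would then be the one mounted over /etc when invoking emerge.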

257 commented 1 year ago

portage would at least need the following besides /etc:

       /var/cache/edb/
              misc internal cache files

       /var/db/pkg/
              database to track installed packages

257 commented 1 year ago

portage installation comes with some defaults:

       /usr/share/portage/config/
              make.globals
              repos.conf
              sets

see man 5 portage for more.

so we might be able to get away with only setting make.profile under /etc. but then we're all the way back to the early days when i proposed a config item for this. would you accept distro-specific flags for mkosi?

DaanDeMeyer commented 1 year ago

/var/db/pkg can be configured using PKGDIR and we might be able to use PORTAGE_DEPCACHEDIR for /var/cache/edb

DaanDeMeyer commented 1 year ago

so we might be able to get away with only setting make.profile under /etc. but then we're all the way back to the early days when i proposed a config item for this. would you accept distro-specific flags for mkosi?

We generate a minimal config ourselves and all other distro specific package manager configuration can be provided using --package-manager-tree. I'm not sure why we would need a distro specific option here?

257 commented 1 year ago

distro-flag is for setting the profile, instead of hard-coding it like we do now: make.profile -> ../../var/db/repos/l/profiles/no-multilib/systemd/merged-usr

lots of defaults are set in that profile. for amd64, for example, gentoo provides profiles:

  [1]   default/linux/amd64/17.1 (stable)
  [2]   default/linux/amd64/17.1/selinux (stable)
  [3]   default/linux/amd64/17.1/hardened (stable)
  [4]   default/linux/amd64/17.1/hardened/selinux (stable)
  [5]   default/linux/amd64/17.1/desktop (stable)
  [6]   default/linux/amd64/17.1/desktop/gnome (stable)
  [7]   default/linux/amd64/17.1/desktop/gnome/systemd (stable)
  [8]   default/linux/amd64/17.1/desktop/gnome/systemd/merged-usr (stable)
  [9]   default/linux/amd64/17.1/desktop/plasma (stable)
  [10]  default/linux/amd64/17.1/desktop/plasma/systemd (stable)
  [11]  default/linux/amd64/17.1/desktop/plasma/systemd/merged-usr (stable)
  [12]  default/linux/amd64/17.1/desktop/systemd (stable)
  [13]  default/linux/amd64/17.1/desktop/systemd/merged-usr (stable)
  [14]  default/linux/amd64/17.1/developer (exp)
  [15]  default/linux/amd64/17.1/no-multilib (stable)
  [16]  default/linux/amd64/17.1/no-multilib/hardened (stable)
  [17]  default/linux/amd64/17.1/no-multilib/hardened/selinux (stable)
  [18]  default/linux/amd64/17.1/no-multilib/systemd (dev)
  [19]  default/linux/amd64/17.1/no-multilib/systemd/merged-usr (dev)
  [20]  default/linux/amd64/17.1/no-multilib/systemd/selinux (exp)
  [21]  default/linux/amd64/17.1/no-multilib/systemd/selinux/merged-usr (exp)
  [22]  default/linux/amd64/17.1/systemd (stable)
  [23]  default/linux/amd64/17.1/systemd/merged-usr (stable)
  [24]  default/linux/amd64/17.1/systemd/selinux (exp)
  [25]  default/linux/amd64/17.1/systemd/selinux/merged-usr (exp)
  [26]  default/linux/amd64/17.1/clang (exp)
  [27]  default/linux/amd64/17.1/systemd/clang (exp)
  [28]  default/linux/amd64/17.1/systemd/clang/merged-usr (exp)
  [29]  default/linux/amd64/17.0/x32 (dev)
  [30]  default/linux/amd64/17.0/x32/systemd (exp)
  [31]  default/linux/amd64/17.0/x32/systemd/merged-usr (exp)
  [32]  default/linux/amd64/17.0/musl (dev)
  [33]  default/linux/amd64/17.0/musl/clang (exp)
  [34]  default/linux/amd64/17.0/musl/hardened (exp)
  [35]  default/linux/amd64/17.0/musl/hardened/selinux (exp)
  [36]  icinga:default/linux/amd64/17.1/icinga (stable)
  [37]  icinga:default/linux/amd64/17.1/no-multilib/icinga (stable)

same for arm, ppc, risc-v etc. some profiles are exotic; musl for one is out of the question for systemd unless someone writes a compat layer with glibc.

this is something gentoo users usually need to pick, instead of being forced to a particular profile.
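
To illustrate how a listing like the one above could be narrowed down, here is a small sketch that parses the lines and keeps only systemd profiles that are not musl-based. The `Profile` type and both function names are hypothetical, the line format is taken from the listing in this comment, and treating an unprefixed entry as belonging to the `gentoo` repository is an assumption.

```python
import re
from typing import NamedTuple


class Profile(NamedTuple):
    repo: str    # repository prefix such as "icinga"; "gentoo" when absent (assumption)
    path: str    # e.g. "default/linux/amd64/17.1/systemd/merged-usr"
    status: str  # "stable", "dev" or "exp"


_LINE = re.compile(
    r"^\s*\[\d+\]\s+(?:(?P<repo>[\w-]+):)?(?P<path>\S+)\s+\((?P<status>\w+)\)"
)


def parse_profiles(listing: str) -> list[Profile]:
    """Parse lines of the form '[N]  [repo:]path (status)'; other lines are skipped."""
    profiles = []
    for line in listing.splitlines():
        match = _LINE.match(line)
        if match:
            profiles.append(Profile(match["repo"] or "gentoo", match["path"], match["status"]))
    return profiles


def systemd_candidates(profiles: list[Profile]) -> list[Profile]:
    """Keep systemd profiles and drop musl ones, as argued in the comment above."""
    return [
        p for p in profiles
        if "systemd" in p.path.split("/") and "musl" not in p.path.split("/")
    ]
```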

DaanDeMeyer commented 1 year ago

So users should provide the profile to use by adding a make.profile symlink to the package manager tree. If no symlink is provided, we pick a default one and use that.
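
A sketch of that fallback, under stated assumptions: the function name, the `DEFAULT_PROFILE` constant and the relative repository path are hypothetical choices for illustration, and the location `etc/portage/make.profile` inside the package manager tree is assumed rather than confirmed by mkosi.

```python
import os
from pathlib import Path

# Assumption: a default along the lines of the profile mentioned earlier in the thread.
DEFAULT_PROFILE = "default/linux/amd64/17.1/no-multilib/systemd/merged-usr"


def profile_link_target(
    pkgmngr_tree: Path,
    repo_profiles: str = "../../var/db/repos/gentoo/profiles",  # hypothetical location
) -> str:
    """Return the make.profile target: the user's symlink if present, else a default."""
    link = pkgmngr_tree / "etc/portage/make.profile"
    # is_symlink() is also true for a dangling link, which matters here because
    # the link only resolves once the ebuild repository has been populated.
    if link.is_symlink():
        return os.readlink(link)
    return f"{repo_profiles}/{DEFAULT_PROFILE}"
```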

257 commented 1 year ago

and that actually determines which stage3 we fetch.

257 commented 1 year ago

So users should provide the profile to use by adding a make.profile symlink to the package manager tree. If no symlink is provided, we pick a default one and use that.

wait, this won't be a good idea. things under pkgmngr are not included in final image. skeleton is where this should go, what do you think?

DaanDeMeyer commented 1 year ago

wait, this won't be a good idea. things under pkgmngr are not included in final image. skeleton is where this should go, what do you think?

It depends on whether you want the final image to be a "golden" image or not. If you want to be able to run emerge in the final image after building it, then indeed it should go into mkosi.skeleton/. But state.pkgmngr automatically defaults to mkosi.skeleton if --package-manager-tree is not set explicitly, so users can use either one and it should just work.

257 commented 1 year ago

so websync should then be run before any of this. that would let us validate the profile (remember that the symlink would be dangling unless repo/gentoo is populated) before attempting anything else, fetching stage3 for instance.

DaanDeMeyer commented 1 year ago

You already have to download latest-stage3.txt anyway to figure out the latest stage 3 snapshot, can't we use the information in that to figure out valid profiles as well?

257 commented 1 year ago

yes, that would be parsing lines like this:

20230716T164653Z/stage3-amd64-systemd-mergedusr-20230716T164653Z.tar.xz

lighter than websync and then looking under repo/gentoo/profile

on the other hand the actual source of truth is in the repo.
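
For illustration, parsing such a line could look like the sketch below. The `Stage3` type and the function name are hypothetical, and the pattern is derived only from the single example line above; the assumption that a line may carry extra whitespace-separated fields or be a `#` comment is mine.

```python
import re
from typing import NamedTuple, Optional


class Stage3(NamedTuple):
    timestamp: str           # e.g. "20230716T164653Z"
    arch: str                # e.g. "amd64"
    flavor: tuple[str, ...]  # e.g. ("systemd", "mergedusr")
    path: str                # path relative to the directory holding the listing


_STAGE3 = re.compile(
    r"^(?P<dir>\d{8}T\d{6}Z)/stage3-(?P<arch>[^-]+)-(?P<flavor>.+)"
    r"-(?P<stamp>\d{8}T\d{6}Z)\.tar\.xz$"
)


def parse_stage3_line(line: str) -> Optional[Stage3]:
    """Parse one listing line; return None for comments and unrecognised lines."""
    fields = line.split()
    if not fields or fields[0].startswith("#"):
        return None
    match = _STAGE3.match(fields[0])
    # The directory name and the timestamp in the file name are expected to agree.
    if match is None or match["dir"] != match["stamp"]:
        return None
    return Stage3(match["stamp"], match["arch"], tuple(match["flavor"].split("-")), fields[0])
```

The flavor fields only hint at the profile (`mergedusr` versus `merged-usr`, for example), so mapping them to a profile path would still need a lookup that this sketch does not attempt.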

DaanDeMeyer commented 1 year ago

Does it matter which stage 3 snapshot we use at all? We're emerging everything from scratch anyway so unless the compilers are configured differently it shouldn't matter too much?

257 commented 1 year ago

Does it matter which stage 3 snapshot we use at all?

very much so. i wouldn't mix profiles.

257 commented 1 year ago

Does it matter which stage 3 snapshot we use at all? We're emerging everything from scratch anyway so unless the compilers are configured differently it shouldn't matter too much?

not just the compiler, the whole toolchain.

DaanDeMeyer commented 1 year ago

Does it matter which stage 3 snapshot we use at all? We're emerging everything from scratch anyway so unless the compilers are configured differently it shouldn't matter too much?

not just the compiler, the whole toolchain.

Well yeah, but a lot of the profiles seem to be about kernel, desktop environment, init system and such, does the actual toolchain differ between profiles?

257 commented 1 year ago

yes, libc besides compiler

DaanDeMeyer commented 1 year ago

yes, libc besides compiler

How does the libc of the stage 3 snapshot affect the image that we'll be producing? Especially if we're going to be cross compiling, portage should compile libc from scratch for the root we're building, no? (I would think this would be done even if we're not cross compiling.)

257 commented 1 year ago

regardless of arch (cross-compiling), would we want to deal with building 'gcc+glibc' using profile 'musl-llvm'? i would say no. we want to get the closest profile to what we want to build.

257 commented 1 year ago

although many verbs in mkosi require systemd in the final image and that would narrow it down to 'compiler+glibc', unless, someone writes a compat layer for say musl so systemd can be built against it.

DaanDeMeyer commented 1 year ago

regardless of arch (cross-compiling), would we want to deal with building 'gcc+glibc' using profile 'musl-llvm'? i would say no. we want to get the closest profile to what we want to build.

What actually happens if one tries to do this? Is it first going to build gcc with llvm and then use gcc to build the rest of the profile?

257 commented 1 year ago

no, it will always use what's in stage3 to build everything. think of stage3 as gentoo's SDK if you like.

DaanDeMeyer commented 1 year ago

no, it will always use what's in stage3 to build everything. think of stage3 as gentoo's SDK if you like.

So we can't use one SDK to build every profile? I would prefer if we have a default stage 3 that we use to build things, and if users don't like that default stage 3, they can use --tools-tree= to provide a different stage 3.

257 commented 1 year ago

So we can't use one SDK to build every profile? I would prefer if we have a default stage 3 that we use to build things,

that's what we do now.

and if users don't like that default stage 3, they can use --tools-tree= to provide a different stage 3.

said flag would automate this bit, we already have the logic to fetch the stage3, why not give it to them.

DaanDeMeyer commented 1 year ago

I just noticed the base profile installs a lot of packages by default. That goes against mkosi's policy of not installing any packages (except baselayout) by default, so we're going to have to either override that or generate our own profile.

EDIT: Seems we can do this with /etc/portage/profile/packages, so we should just write an empty file there (or just baselayout).

257 commented 1 year ago

look carefully where it installs though. SYSROOT or ROOT. removing useless flags we're passing to emerge should reduce the updates on SYSROOT.

DaanDeMeyer commented 1 year ago

I'm confident it's in ROOT as gcc gets installed in the image we build for CI and it's listed in the base packages.

257 commented 1 year ago

binrepo is populated by flag profile, that could be the reason. on the other hand I think it would be swell for us to have our own profile, with a minimal @system set.

257 commented 1 year ago

let's run it by upstream first though.

DaanDeMeyer commented 1 year ago

@257 So I've spent a lot of time playing around with the gentoo implementation and I don't think this is ever going to work smoothly. The main problem is that to build gentoo images from source, the host system needs to closely match the image we're building so we'll always be forced to download a stage 3 tar and use everything from it.

What I'm now starting to lean towards is that the gentoo support in mkosi needs to be binary package based so that we don't depend on the host system that's used to build the image. What this means is that the binary packages are always built somewhere else, put into a binary package repository which is then supplied to mkosi via the Mirror= option. We then invoke emerge with --usepkgonly=y --getbinpkg=y to force use of binary packages only, which means the entire need for a toolchain goes away. With this approach, it might actually be possible to use portage installed on the host system to build the image without needing a stage 3 tarball.
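
As a sketch, the binary-only invocation could be assembled like this. The two flags come straight from the paragraph above; the helper names are hypothetical, and passing the mirror through `PORTAGE_BINHOST` and the image through `ROOT` is an assumption about how the Mirror= value would reach emerge.

```python
def binpkg_only_command(packages: list[str]) -> list[str]:
    """Build an emerge command line that refuses to build anything from source."""
    return ["emerge", "--usepkgonly=y", "--getbinpkg=y", *packages]


def binpkg_env(mirror: str, root: str) -> dict[str, str]:
    # Assumption: the binary package repository is handed to portage via
    # PORTAGE_BINHOST and the image directory via ROOT.
    return {"PORTAGE_BINHOST": mirror, "ROOT": root}
```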

I realize this might not be what you want from the gentoo support in mkosi but I don't see us ever having a good gentoo implementation that matches how the other distros work otherwise.

I started playing around with an implementation of this using your binpkgs repo but it seems there's a binpkg missing from the repo:

emerge: there are no binary packages to satisfy "sys-apps/locale-gen".
(dependency required by "sys-libs/glibc-2.37-r3::gentoo" [binary])
(dependency required by "dev-libs/libgcrypt-1.10.1-r3::gentoo" [binary])
(dependency required by "sys-apps/systemd-9999::gentoo" [binary])
(dependency required by "sys-apps/dbus-1.15.6::gentoo" [binary])

Any chance you could add a locale-gen binary package to the binpkgs repo?
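
For checking a binpkgs repository against failures like the one above, a small parser along these lines might help. It is only a sketch: the function name is hypothetical and the patterns are derived from the log shown in this comment, so other emerge versions may word the message differently.

```python
import re

_MISSING = re.compile(r'there are no binary packages to satisfy "(?P<atom>[^"]+)"')
_REQUIRED_BY = re.compile(r'\(dependency required by "(?P<atom>[^"]+)"')


def missing_binpkgs(output: str) -> dict[str, list[str]]:
    """Map each missing atom to the chain of packages that pulled it in."""
    result: dict[str, list[str]] = {}
    current = None
    for line in output.splitlines():
        missing = _MISSING.search(line)
        if missing:
            current = missing["atom"]
            result.setdefault(current, [])
            continue
        required_by = _REQUIRED_BY.search(line)
        # Dependency lines are attributed to the most recent missing atom.
        if required_by and current is not None:
            result[current].append(required_by["atom"])
    return result
```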

257 commented 1 year ago

any binrepo will eventually drift off. it seems to me that we're sacrificing the actual support for the sake of CI. maybe we can disable Gentoo in tests and only enable it for cases where either Gentoo module is directly touched or whenever we suspect the change will have an impact on Gentoo.

I might have more to say about this tomorrow.

257 commented 1 year ago

@257 So I've spent a lot of time playing around with the gentoo implementation and I don't think this is ever going to work smoothly. The main problem is that to build gentoo images from source, the host system needs to closely match the image we're building so we'll always be forced to download a stage 3 tar and use everything from it.

this is true for now. once clang can build glibc, or if ever musl has a gnu compat layer written for it, we're home free. an llvm based stage3 that can run on the host machine can build pretty much everything.

What I'm now starting to lean towards is that the gentoo support in mkosi needs to be binary package based so that we don't depend on the host system that's used to build the image. What this means is that the binary packages are always built somewhere else, put into a binary package repository which is then supplied to mkosi via the Mirror= option.

mkosi has been doing that for me since day one. it simply works!

btw, once we start cross-compiling (with llvm), the initial binary creation becomes ever more crucial. nobody out there is maintaining binrepos for all different combinations of toolchains. we shouldn't either. but to give developers a tool to easily generate their own is what's appealing about mkosi.

We then invoke emerge with --usepkgonly=y --getbinpkg=y to force use of binary packages only, which means the entire need for a toolchain goes away. With this approach, it might actually be possible to use portage installed on the host system to build the image without needing a stage 3 tarball.

nobody knows, simply because portage was never intended to be used like that, primarily speaking. that's going to be yet another territory to explore which is costly, what's the benefit here?

in any case, it seems to me that we're avoiding the problem rather than solving it. and in fact, what is the problem actually? code-complexity of fetching a stage3? its verification? CI taking too long? maintenance overhead? i honestly don't know so i'm asking.

I realize this might not be what you want from the gentoo support in mkosi but I don't see us ever having a good gentoo implementation that matches how the other distros work otherwise.

i never thought that's the goal here; rpm-based and apt-based distros resemble each other in that respect, source-based distros, gentoo being one of them, don't. maybe there is no elegant solution here.

I started playing around with an implementation of this using your binpkgs repo but it seems there's a binpkg missing from the repo:

emerge: there are no binary packages to satisfy "sys-apps/locale-gen".
(dependency required by "sys-libs/glibc-2.37-r3::gentoo" [binary])
(dependency required by "dev-libs/libgcrypt-1.10.1-r3::gentoo" [binary])
(dependency required by "sys-apps/systemd-9999::gentoo" [binary])
(dependency required by "sys-apps/dbus-1.15.6::gentoo" [binary])

Any chance you could add a locale-gen binary package to the binpkgs repo?

yes, not a problem, but like i mentioned any repo will drift away, either deps change (less common) or they actually become outdated. this deps-hell will never stop, we might as well learn to live with it :)

DaanDeMeyer commented 1 year ago

this is true for now. once clang can build glibc, or if ever musl has a gnu compat layer written for it, we're home free. an llvm based stage3 that can run on the host machine can build pretty much everything.

stage3 in that sentence is the problem. I don't want to download arbitrary gentoo stages in mkosi to build the system from. I want to build the system from either the actual host system or the tree specified in --tools-tree. As soon as portage can be installed as a package on fedora and bootstrap a gentoo system from fedora with minimal complexity involved, we can look into supporting source-based builds for gentoo again.

yes, not a problem, but like i mentioned any repo will drift away, either deps change (less common) or they actually become outdated. this deps-hell will never stop, we might as well learn to live with it :)

This is only if you actually use the ebuild repositories. If --usepkgonly is enabled, portage is perfectly capable of only using information from the binpkgs host so this is not actually an issue. I've managed to get mkosi building using only binpkgs using the information from https://bugs.gentoo.org/470006. The issue I'm now running into is that sys-apps/baselayout in your binhost is built with +split-usr so systemd refuses to install.

Because we don't care about a sysroot or toolchain when using only binary packages, this allows us to run portage straight from the tools tree without needing to download the stage 3 tarball and having to worry about toolchain issues. The binpkgs repository will simply have to be built from a host running gentoo until portage can bootstrap itself from arbitrary distros with a toolchain installed without needing a stage3 tarball.

DaanDeMeyer commented 1 year ago

Also the missing sys-apps/locale-gen is not because of repo drift. It's an IDEPEND of glibc so it needs to be in the binpkg repo as well.

DaanDeMeyer commented 1 year ago

Implemented binary package only support in https://github.com/systemd/mkosi/pull/1699. Still need to get rid of the snapshot though.

257 commented 1 year ago

Also the missing sys-apps/locale-gen is not because of repo drift. It's an IDEPEND of glibc so it needs to be in the binpkg repo as well.

it was not required when i generated the repo. so yes, it is a drift already. things change upstream and the repo needs to be updated; that's what i mean by drift, and locale-gen in this case is a good example of that.

257 commented 1 year ago

this is true for now. once clang can build glibc, or if ever musl has a gnu compat layer written for it, we're home free. an llvm based stage3 that can run on the host machine can build pretty much everything.

stage3 in that sentence is the problem. I don't want to download arbitrary gentoo stages in mkosi to build the system from. I want to build the system from either the actual host system or the tree specified in --tools-tree. As soon as portage can be installed as a package on fedora and bootstrap a gentoo system from fedora with minimal complexity involved, we can look into supporting source-based builds for gentoo again.

that "arbitrary" stage3 is the latest official one from upstream. and again, once we switch to an llvm based stage3 we don't need to change stage3 at all, unless, as discussed earlier, the user provides a tools tree.

257 commented 1 year ago

stage3 in that sentence is the problem. I don't want to download arbitrary gentoo stages in mkosi to build the system from. I want to build the system from either the actual host system or the tree specified in --tools-tree. As soon as portage can be installed as a package on fedora and bootstrap a gentoo system from fedora with minimal complexity involved

how many fedora users build gentoo images? maybe we can disable gentoo images on every other distro and just support gentoo-on-gentoo?

DaanDeMeyer commented 1 year ago

Let's close this one as I'm not planning to work on the gentoo support anymore