rpm-software-management / rpm

The RPM package manager
http://rpm.org

RFE: Automatic (sub)package generators #329

Open Conan-Kudo opened 7 years ago

Conan-Kudo commented 7 years ago

Today in RPM, we have dependency generators that allow us to automatically match dependencies based on the content of the install tree. However, it's still a lot of work for people to split out things into subpackages so that the dependencies are matched up with the components they actually belong to.

It'd be awesome if we had a system for package generators similar to the dependency generator system.

As a conceptual inspiration, Solus has an interesting pattern-matching mechanism in their ypkg system, which is very similar to a YAML-based RPM spec (and somewhat inspired by rpm spec in some ways), but offers some interesting advantages that drastically simplify the effort to package software.

We already have a pattern matching mechanism for dependency generation with the fileattr system, and we sort of have an automatic package generation mechanism for the debuginfo subpackage stuff. I imagine these two mechanisms can be extended to add the ability to set up rules for generating subpackages arbitrarily for various things (Perl & Python binding packages, library packages, devel packages, etc.).

ikeydoherty commented 7 years ago

In ypkg we have a priority-based pattern system, which will also happily accept absolute paths. Internally we define subpackages by way of distribution policy, which, as well as automating much of the packaging process, ensures distribution policy is respected for file and subpackage placement.

We place dbginfo packages at a significantly higher priority than stock built-in patterns, and the package.yml priority is considered the ultimate priority, enabling the package maintainer to override default considerations (i.e. soname patterns where the upstream has abused .so versioning)

Internally this is handled through a PackageGenerator concept and is initially modeled around the Clear Linux autospec project, and in combination with our dependency system ensures subpackages have natural relationships between one-another and the parent package.

At this point you're already operating on the notion of generators - with either patterns or cheap absolute paths. This allows a global file table which ensures that files do not conflict within a package set, something seen quite commonly with .src.rpm inter-file-conflicts. A path's position is mutually exclusive within the set. At this point it becomes trivial to extend these generators with support for selinux policies, or indeed, simple .in substitution (patterns could conceivably be extended to support %libdir% style macros and expanded at runtime).
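
The priority-based matching and the global file table described above can be sketched roughly as follows. This is an illustration in Python, not ypkg's actual code: the `Rule` type, the `assign_files` helper, the priority numbers and the sample paths are all assumptions made up for the example.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple


class Rule(NamedTuple):
    pattern: str      # glob pattern or plain absolute path
    subpackage: str   # subpackage that claims matching paths
    priority: int     # higher number wins


def assign_files(paths, rules):
    """Map every path to exactly one subpackage; unmatched paths go to 'main'."""
    # Sort once so the first matching rule is the highest-priority one.
    ordered = sorted(rules, key=lambda r: r.priority, reverse=True)
    table = {}
    for path in paths:
        table[path] = next(
            (r.subpackage for r in ordered if fnmatchcase(path, r.pattern)),
            "main",
        )
    return table


# Hypothetical distribution-policy rules plus one maintainer override.
RULES = [
    Rule("/usr/lib64/lib*.so.*", "libs", 10),
    Rule("/usr/lib64/lib*.so", "devel", 10),
    Rule("/usr/include/*", "devel", 10),
    Rule("/usr/lib/debug/*", "dbginfo", 50),
    # Maintainer override for an upstream that misuses .so versioning.
    Rule("/usr/lib64/libplugin.so.1", "main", 100),
]

PATHS = [
    "/usr/bin/foo",
    "/usr/lib64/libfoo.so",
    "/usr/lib64/libfoo.so.1",
    "/usr/lib64/libplugin.so.1",
    "/usr/include/foo/foo.h",
    "/usr/lib/debug/usr/bin/foo.debug",
]

if __name__ == "__main__":
    for path, pkg in assign_files(PATHS, RULES).items():
        print(f"{pkg:8} {path}")
```

Because the result is a single path-to-subpackage table, a path cannot end up in two subpackages, which is the mutual-exclusion property mentioned above.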

nim-nim commented 6 years ago

The problem with automated (sub)package generators is that

  1. you have upstreams from hell like TeXLive that bundle many many unrelated parts of the same kind. So you need some way to tell rpm "autocreate subpackages but split them around those fileset lines"
  2. you still need to declare package summary and description somewhere (not all descriptions can be autocomputed)

Conan-Kudo commented 6 years ago

  1. you have upstreams from hell like TeXLive that bundle many many unrelated parts of the same kind. So you need some way to tell rpm "autocreate subpackages but split them around those fileset lines"

Sure. And even in Solus' packaging mechanism, the user can override automatic pattern matching to do their own patterns.

  2. you still need to declare package summary and description somewhere (not all descriptions can be autocomputed)

Absolutely, and they can be manually specified as needed. That being said, in a large number of cases, it isn't necessary to do so.

Think, for example, of how we package Python modules in Fedora. The summary and description are basically repeated with an additional statement on which version of Python it targets based on the flavor.

In my view, implementing this feature would make things like proper multilibbing automatic, as well as doing libs/devel splits automatically, Python package flavor splits, Ruby flavor splits, and so on.

Yes, there are exceptions: TeXLive and the Linux kernel are probably not going to benefit much from this, for example. But this is a huge improvement for the overwhelming majority of cases.

ignatenkobrain commented 4 years ago

So I guess this is waiting for me to put my thoughts here…

Features (extras)

For both cases:


There are basically these cases where I think autosubpackages are needed:

Things to not forget:


I guess we need something like @ikeydoherty is describing about file pattern matching (which we already have thanks to dependency generators), so probably if we just extend that syntax to make them output some specially formatted attributes (json? some subset of specs?) and then merge it.

So let's take the case of some rust package which has binaries, shared libraries, devel stuff (with multiple features) and some custom utils subpkg. That would mean that in buildroot there will be:

The full spec would have something like the following (the auto: prefix marks what the user should not write but would be auto-generated; anything that dependency generators would generate is omitted):

auto: # Any files which will not be used in other subpackages will go into the main package
auto: %global _unmatched_files_in_main_package 1
…
Name: rust-foo
Summary: Something very useful
…
auto: %files
auto: %{_bindir}/foo
…
auto: %package -n libfoo-1
auto: Summary: %{summary} - libfoo.so.1
auto: %files -n libfoo-1
auto: %{_libdir}/libfoo.so.1
auto: %{_libdir}/libfoo.so.1.0.0
…
auto: %package -n rust-foo-devel
auto: Summary: %{summary} - Rust development files
auto: %files -n rust-foo-devel
auto: %{_datadir}/cargo/registry/foo-1.0.0/
…
auto: %package -n rust-foo+default-devel
auto: Summary: %{summary} - Rust development files for "default" feature
auto: %files -n rust-foo+default-devel
auto: %ghost %{_datadir}/cargo/registry/foo-1.0.0/Cargo.toml
…
auto: %package -n rust-foo+a-devel
auto: Summary: %{summary} - Rust development files for "a" feature
auto: %files -n rust-foo+a-devel
auto: %ghost %{_datadir}/cargo/registry/foo-1.0.0/Cargo.toml
…
%package doc
%files doc
%doc html
…

Now comes the customization part. I need to add %license to the rust-foo-devel package, so:

-auto: %files -n rust-foo-devel
+%files -n rust-foo-devel
+%license LICENSE

should not throw an error that the package was not defined, but should rather be checked at the end of the build, after the packages were generated.

Same if I decide to override summary of some subpackage, it should simply merge them, keeping user-written changes with highest priority:

-auto: %package -n libfoo-1
-auto: Summary: %{summary} - libfoo.so.1
+%package -n libfoo-1
+Summary: Custom summary - libfoo.so.1

This means that for text fields we should override, but for arrays (like files) we should append.
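
A minimal sketch of that merge rule, in Python for illustration only. The dictionary layout and the `merge_package` helper are assumptions for the example and not an existing rpm interface.

```python
def merge_package(auto, user):
    """Merge a user-written package stanza over an auto-generated one.

    Text fields: the user's value replaces the generated one.
    List fields: the user's entries are appended, skipping duplicates.
    """
    merged = dict(auto)
    for key, value in user.items():
        if isinstance(value, list):
            existing = list(merged.get(key, []))
            existing.extend(v for v in value if v not in existing)
            merged[key] = existing
        else:
            merged[key] = value
    return merged


# What a generator might have produced (hypothetical representation).
AUTO = {
    "name": "rust-foo-devel",
    "summary": "Something very useful - Rust development files",
    "files": ["%{_datadir}/cargo/registry/foo-1.0.0/"],
}
# What the packager wrote by hand in the spec.
USER = {
    "summary": "Custom summary",
    "files": ["%license LICENSE"],
}

if __name__ == "__main__":
    print(merge_package(AUTO, USER))
```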

OTOH we probably should not allow such customizations in this way, but rather have each generator output everything needed in lua or to the filesystem, and create a special %subpackages section where there will be files like $pkgname.(summary|files) generated by the new generators. At that point it would be very similar to the %generate_buildrequires section, so that you can do whatever you need with those files. I think this would be my preferred idea.
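
To make the file-based idea concrete, here is a rough Python sketch of how a directory of `$pkgname.summary` and `$pkgname.files` entries could be read back into package definitions. Nothing like this exists in rpm today. The directory layout, the `load_subpackages` helper and the sample content are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def load_subpackages(directory):
    """Collect '<pkgname>.summary' and '<pkgname>.files' into a dict."""
    packages = {}
    for path in sorted(Path(directory).iterdir()):
        # rpartition keeps any dots that are part of the package name.
        name, _, kind = path.name.rpartition(".")
        if not name or kind not in ("summary", "files"):
            continue
        text = path.read_text()
        pkg = packages.setdefault(name, {"summary": "", "files": []})
        if kind == "summary":
            pkg["summary"] = text.strip()
        else:
            pkg["files"] = [line for line in text.splitlines() if line.strip()]
    return packages


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        # Files a hypothetical generator would have written.
        (d / "libfoo-1.summary").write_text("Something very useful - libfoo.so.1\n")
        (d / "libfoo-1.files").write_text(
            "/usr/lib64/libfoo.so.1\n/usr/lib64/libfoo.so.1.0.0\nuseless-thing.pdf\n"
        )
        # Packager tweak: drop an unwanted entry before the list is read back.
        files = d / "libfoo-1.files"
        kept = [l for l in files.read_text().splitlines() if l != "useless-thing.pdf"]
        files.write_text("\n".join(kept) + "\n")
        return load_subpackages(d)


if __name__ == "__main__":
    print(demo())
```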

Thoughts?

ignatenkobrain commented 4 years ago

Forgot to mention that the %subpackages section should store files in the %{buildroot} too, so that it is possible to write generators which depend on the whole state of the subpackages (a current problem with dependency generators).

ignatenkobrain commented 4 years ago

Oh yeah, this way we can solve the problem described in #1073 by having some script which will put the license thing into $pkgname.license.

nim-nim commented 4 years ago

So I guess this is waiting for me to put my thoughts here…

A lot of those things are already handled Fedora-side in our fonts and go packaging macros.

  1. you define a pivot %{fooX} variable, with X a suffixed index à la %{SOURCEX}. If it is present in the spec file, that means you need to generate the foo set of packages with foo rules (you can mix subpackages of different types in a spec file, ie fonts + go in golang-x-image, you can have multiple subpackage of the same type in the spec file, and a single control variable may generate multiple kinds of subpackages, for example for Go we want to generate go source module packages, gopath source packages, and eventually dynlib packages. %{gomodX} means generating the three above subpackages for each %{gomodX} line in the spec)

  2. for each %{fooX} pivot variable, you define a set of optional sub-variables that allow controlling the generation. For example %{gomodsummaryX} will contain the summary of the Go module subpackage associated with %{gomodX}, and if not set by the packager will be autocomputed to 'The %{gomodX} Go module'

You end up with a huge list of subvariables, that are only set in special cases, so the average spec is kept small and maintainable.

Thus, we have a %{fontlicenseX} that allows setting the license of a font subpackage (with a fallback to %{license}). Font files bundled and distributed with other stuff are usually subject to their own licensing. We have a %{foolicensesX} that sets the licensing files associated with a subpackage. We have a %{foodocsX} that sets the documentation files associated with a subpackage.

  3. And lastly, we have %{fooheaderX} that lets the packager inject manual header elements in generated subpackages, typically manual Provides/Obsoletes, because that’s downstream stuff that can not be guessed from upstream metadata.
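
The pivot-plus-subvariables scheme with fallbacks can be sketched like this. It is a Python illustration, not the Fedora macros themselves: the plain dict stands in for the rpm macro table, the module names are placeholders, and `gomodlicenseX` is a hypothetical variable invented here to mirror the %{fontlicenseX} fallback to %{license}.

```python
import re


def suffixes(macros, pivot):
    """Return the suffixes X for which pivot+X is defined, in numeric order."""
    found = []
    for name in macros:
        m = re.fullmatch(re.escape(pivot) + r"(\d*)", name)
        if m:
            found.append(m.group(1))
    # An empty suffix (bare pivot name) sorts before any numbered one.
    return sorted(found, key=lambda s: int(s) if s else -1)


def resolve(macros, suffix):
    """Fill in per-subpackage values, falling back to computed defaults."""
    mod = macros["gomod" + suffix]
    return {
        "module": mod,
        # Default summary as described above: 'The %{gomodX} Go module'.
        "summary": macros.get("gomodsummary" + suffix, f"The {mod} Go module"),
        # Hypothetical per-subpackage license with a spec-wide fallback.
        "license": macros.get("gomodlicense" + suffix, macros.get("license", "")),
    }


# Stand-in for the macros a packager would set in the spec preamble.
MACROS = {
    "license": "BSD-3-Clause",
    "gomod0": "example.com/foo",
    "gomod1": "example.com/bar",
    "gomodsummary1": "A hand-written summary",
    "gomodlicense1": "MIT",
}

if __name__ == "__main__":
    for x in suffixes(MACROS, "gomod"):
        print(x, resolve(MACROS, x))
```

Only the second block overrides anything, so the first one is generated entirely from defaults, which is what keeps the average spec small.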

nim-nim commented 4 years ago

After lots of refactorings, I’ve reduced the complexity of fonts/go (not published yet) header generation to the trivial

https://pagure.io/fonts-rpm-macros/blob/009ccace3f337f3410cf0b4b789af692fce766d7/f/rpm/lua/srpm/fonts.lua#_135

And setting the rpm variables it uses, in a way that is safe in the presence of multiple subpackages, to

https://pagure.io/fonts-rpm-macros/blob/009ccace3f337f3410cf0b4b789af692fce766d7/f/rpm/lua/srpm/fonts.lua#_118

The bulk of the complexity is identifying all the subvariables a domain will need, and defining the fallback rules between those variables (the thing that makes the "magic" work for users of the macros because most of the subvariables have sane default values that almost never need overriding). There’s a lot of domain knowledge at this stage, even if some things like licensing files and documentation are needed for every domain.

nim-nim commented 4 years ago

And, you absolutely need the pivot and subvariables set spec-wide, in the preamble or some early section, because a lot of the domain info will be used in several spec sections, not just in %files, %build, %whatever.

For example Free Desktop people invented the idiotic appstream descriptor for font files, which is so ass backwards even Gnome’s own Bitstream Vera or Cantarell do not ship one.

So one of the fonts subvariables is %{fontappstreamX}. If set, the automation will use the packager-specified appstream files (a couple of font projects were sucked into writing those before everyone noticed it was a waste of time). If not set, font package automation will compute those in %install from the files actually installed at that point, validate the (computed or provided) files in %check, and use them in %files (it would be cleaner to do it as %build %install %check %files, but at the %build stage you do not know yet which files will be installed and need appstreaming).

ignatenkobrain commented 4 years ago

A lot of those things are already handled Fedora-side in our fonts and go packaging macros.

Sorry, I'm not interested in this black magic which nobody except you understands. I am interested in a user-friendly solution which is supposed to be implemented in RPM.

You end up with a huge list of subvariables, that are only set in special cases, so the average spec is kept small and maintainable.

Exactly because of this. I don't want to have overcomplicated macros, I want simple configuration which I can tune to support different kinds of behaviors for different ecosystems.

https://pagure.io/fonts-rpm-macros/blob/009ccace3f337f3410cf0b4b789af692fce766d7/f/rpm/lua/srpm/fonts.lua#_135

I did not open a link, but I see that it points to line 135. I don't want to have anything in lua which is more than 10 lines. Better to not have lua at all involved here.

nim-nim commented 4 years ago

It is user friendly. It is not maintenance friendly because it works around rpm deficiencies. A lot of the complexity is simulating arrays from individual suffixed variables when rpm does not expose array primitives. That’s not specific to my macros, rpm had to hand-code %sources because the array of source files is not exposed to the spec.

If you remove the parts that simulate hashtables and arrays, the rest of the code is very simple 'if foo has not been set, use this default value.'

Exactly because of this. I don't want to have overcomplicated macros, I want simple configuration which I can tune

You can turn it every way round: configuration means conf variables. The thing that scares people in the fonts and go macros is reading and setting those variables using rpm primitives. If you do not have a large set of conf variables, you end up with inflexible generation that no one uses, because there is always one little bit the generation gets wrong for actual packaged projects (and of course the little bit is different for every project).

I did not open a link,

Then perhaps you should do so before commenting on stuff you did not read

Conan-Kudo commented 4 years ago

@ignatenkobrain: Unlike the Go stuff, the fonts Lua macros are considerably simpler to understand, just there's a lot of functions.

But @nim-nim, I agree that we need this functionality natively in RPM. The contortions that openSUSE goes through to generate flavor subpackages for Ruby and Python with their %rubygem_subpackages/%python_subpackages macros really indicate we need this functionality built-in to RPM. That does not obviate the need for macros to manage this behavior, but it does mean that we'd have better primitives for managing this.

ignatenkobrain commented 4 years ago

It is user friendly. It is not maintenance friendly because it works around rpm deficiencies. A lot of the complexity is simulating arrays from individual suffixed variables when rpm does not expose an array element.

That is exactly why I suggested having a new section like %subpackages, where anybody can do something like:

echo "MIT" > subpkg1.license
sed -i -e "/^useless-thing.pdf$/d" libfoo.files

instead of dealing with arrays in lua or any other macros.

No?

nim-nim commented 4 years ago

echo "MIT" > subpkg1.license
sed -i -e "/^useless-thing.pdf$/d" libfoo.files

That’s actually much worse than what the go and fonts macro do. It’s only simple because you’re thinking small with a single conf variable. And did not code reading back, overriding and fallbacking those variables (common configuration needs).

Multiply by the amount of variables correct domain generation needs, and that a srpm can generate dozens of separate subpackages, and we’ll see who is writing cryptic black magic code. Some of the "horrible" Go macro code actually does that and I already decided to rip those parts in the next implementation.

Having real rpm variables is so much easier on packagers and macro writers than reading from magic files. By the 10th variable, cat-ing, grep-ing and sed-ing gets old fast. Should you insist on the conf file idea, you will want a real conf file parser, which rpm does not provide. And you will realise that 99% of what you wanted from the conf file in the first place is things that you end up declaring as rpm variables to control your rpm macros, so you may as well dispense with the conf file and declare variables directly in the spec file.

Also, idiots will insist the rpm shell is /bin/sh and force you to use obsolete limited time-wasting shell syntax instead of modern bash capabilities; use as little shell as you can to avoid those problems, and certainly not for complex things like reading user-provided conf files.

You can always cat spec variables to a file should you need to generate domain files (as I do for font appstream files). Doing the reverse in rpm does not work well, because of the lack of conf parser, and because you do not have the arrays and tables to put the parsed result in.

Lastly, dealing with magic files is a major mess in presence of multiple source archives (that rpm permits and people want supported), that all get extracted in different ways by %setup. I could deal with those safely now that I changed %forgemeta to actually set the directory each archive is extracted to in a separate %{extractdirX} global variable, but file and path collisions is not something I want to deal with if I can avoid it.

@ignatenkobrain: Unlike the Go stuff, the fonts Lua macros are considerably simpler to understand, just there's a lot of functions.

It is simpler and cleaner because it’s a 5th gen implementation of the concept. The forge (2 gen) and go macros (likewise) had to go first to help identify the generic patterns needed for generation. And code all the small helpers that keep the fonts code small and easy to read. The next set of go macros will start from the factorisation achieved by the fonts macros (of course, go is intrinsically harder, so it will probably stay more complex).

But @nim-nim, I agree that we need this functionality natively in RPM.

I don’t disagree, that’s why I am commenting here with feedback after implementing all this without rpm help Fedora-side for two different application domains.

If you want autogeneration to work, the hard nut to crack is simple setting and reading of the mass of configuration variables required to adjust automated generation to real-world project quirks, on a subpackage level (each subpackage may have its own quirks). It’s not the sole thing needed, but the other parts are easy compared to this one.

Once you have good variable handling, generation becomes the trivial code I posted https://pagure.io/fonts-rpm-macros/blob/009ccace3f337f3410cf0b4b789af692fce766d7/f/rpm/lua/srpm/fonts.lua#_135

I did not begin three years ago with this kind of code. I tried to set and read as few rpm variables as possible, and as late as possible JIT-way, because it was so painful in current rpm. I tried to work with the rpm argument parser, and ended up with a wall of per-macro options that were never parsed as you wanted them to be, and options that needed tedious repeating in pretty much every section of the spec (in my defense, people were actually clamouring for a wall of options, %autosetup-like, till they actually got it and proceeded to declare their own global control variables to avoid repeating the same stuff in all parts of the spec).

So I was forced by reality to bite the bullet: use global control variables everywhere, set them early in the preamble (lastly, because of the rpm 4.15 Source: declaration clusterf*, but everything was already pointing towards generic general global early setting of control variables), and emulate arrays, because looping over subpackages in a spec requires looping over the same variables for every subpackage.

The only reason the fonts macros are easy to read and maintain now is because they benefit from a strong design framework where things like %{currentfontlicense} just exist at the point the automation logic needs them to exist.

ffesti commented 4 years ago

Looking at this there are a couple of separate issues. I wonder if the reason this has not been getting anywhere has been that we try to solve all these different things at once. Maybe we should split this into separate features and just start with one - solving only a few - but at least a few - real-world cases.

The current font and go macros are a pain to implement but - obviously - don't require something that can't be done. But they rely on all the actual data being punched into the spec file by the user. As long as we still want to do that this is basically a question of improving the macro language or offering a nicer template language. One improvement would be to allow multiple sections for the build scripts. That way things would not need to be distributed all over the spec file but macros could create everything they need in one place. This ofc needs some way of ordering the sections - probably with a priority similar to file triggers.
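
As a rough illustration of the "multiple sections with a priority" idea, the following Python sketch assembles one build script from several fragments. rpm does not support this today. The `assemble_section` helper and the ordering rule (higher priority first, ties in declaration order) are assumptions chosen for the example.

```python
def assemble_section(fragments):
    """Concatenate script fragments for one section, ordered by priority.

    Assumption for this sketch: a higher priority runs earlier, and
    fragments with equal priority keep their declaration order.
    """
    # sorted() is stable, so equal priorities stay in declaration order.
    ordered = sorted(fragments, key=lambda f: -f[0])
    return "\n".join(body for _, body in ordered)


# Hypothetical %install fragments contributed by different macro sets.
INSTALL_FRAGMENTS = [
    (10, "echo 'install fonts'"),
    (50, "echo 'install go sources'"),
    (10, "echo 'install docs'"),
]

if __name__ == "__main__":
    print(assemble_section(INSTALL_FRAGMENTS))
```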

Using data from the buildroot is currently not possible at all - with the exception of globs in file lists and dependency generators. To work around this we need to be able to create (sub) packages after the build. I contemplated template packages that get their attributes expanded after the build but I think this is too restrictive and too complicated to implement. I guess the easiest way to provide this is a spec file section that is not evaluated at parse time but is parsed after the build. We might want to disallow some things there but it will basically allow declaring sub packages. These could also be created by macros or by scripts processing the build root.

Generating sub packages raises the question of how they interact with other packages in the spec. One way regarding files is below. Looked at more generally, this is a complicated issue. Do we want to make the already parsed packages available in the spec file or in lua? And even if we do, should we offer a way to alter them? For now I would probably postpone these requirements and concentrate on the easier use cases. But this is something that's necessary to be able to implement the debuginfo packages and it might not even be enough...

Then there is the question of making the package declarations smarter. Right now the file attributes and generated dependencies are calculated late - after the files have already been distributed to the packages. This can be changed but requires quite some re-factoring. After that one could use file attributes in %files. At the same time we could add priorities which allow "stealing" files from other packages. For me this looks very interesting but probably a second - or third - step.

I guess the trick is to choose one or two of those and just implement them and see where this leads us.

ignatenkobrain commented 4 years ago

I guess the easiest way to provide this is a spec file section that is not evaluated at parse time but is parsed after the build. We might want to disallow some things there but it will basically allow declaring sub packages. These could also be created by macros or by scripts processing the build root.

I think this is the most straightforward way and also the most useful, because in order to generate subpackages you need to do it after %install, when all files are already installed.

nim-nim commented 4 years ago

@ffesti Thank you for sharing a different analysis and point of view. I’ll correct some things here (I don’t fundamentally disagree with what you wrote, but you made some shortcuts that would block a real-world design)

The current font and go macros are a pain to implement but - obviously - don't require something that can't be done.

Actually this is slowly getting to the point I’ve written enough helpers for common needs in redhat-rpm-config that implementing a new macro set is easy. I will make a new dump of common helpers after the %new_package part is processed. That will get us to the point where a macro implementor can write things like: (rpm macro side)

# Run tests in the check section for a font (sub)package. Arguments:
# -z <number>         read the zth block of definitions, for example
#                     %{fontfamily<number>}
# -v                  be verbose
%fontcheck(z:v) %{lua:
local      fedora =  require "fedora.common"
local       fonts =  require "fedora.srpm.fonts"
-- note the underscore: a hyphen is not valid in a Lua identifier
local   fonts_rpm =  require "fedora.rpm.fonts"
local      suffix =  rpm.expand("%{?-z*}")
local     verbose = (rpm.expand("%{-v}") ~= "")
fedora.suffixloop(fonts_rpm.check, suffix, fonts.suffixes(), {verbose})
}

(and lua side)

-- Core of %fontcheck
local function check(suffix, verbose)
  fonts.env(suffix, verbose)
  print(rpm.expand([[
grep -E '^"%{_fontconfig_templatedir}/.+\.conf"' '%{currentfontfiles}' \
  | xargs -I{} -- sh -c "xmllint --loaddtd --valid     --nonet '%{buildroot}{}' \
  >/dev/null && echo %{buildroot}{}: OK"
grep -E '^"%{_datadir}/metainfo/.+\.xml"'        '%{currentfontfiles}' \
  | xargs -I{} --        appstream-util validate-relax --nonet '%{buildroot}{}'
]]))
end

and don’t worry at all about the heavy lifting done by the helpers to make that just work in presence of multiple subpackages. All the ugliness here is pure domain-specific code, the rpm-induced templating ugliness is hidden from the macro writer.

But they rely on all the actual data being punched into the spec file by the user.

If that was the case, they would need much longer specs packager side. A huge part of the complexity in the forge, go and fonts macros is computing domain-specific sane defaults from partial packager information (that’s why %forgemeta, %gometa and %fontmeta are complex. They fill in the blanks using complex domain-specific rules so the rest of the macro code and the packager in its spec do not have to worry if info X was filled or not.)

I don’t see this blank filling need going away. Even assuming upstream provided perfect metadata that does not need correction or overriding Fedora-side (and, we all know upstreams are not perfect), there will always be additional metadata that Fedora requires, but a domain-specific component system forgot to handle. The legal (licensing) aspect was already given in example. That’s not the only one.

Now, it is true that currently rpm constrains things in such a way, it is not possible to feed upstream info to this blank filling process, even when it is present as upstream metadata in the source archives.

Using data from the buildroot is currently not possible at all - with the exception of globs in file lists and dependency generators. To work around this we need to be able to create (sub) packages after the build.

Good automatic package generation would require moving the evaluation point of things used to construct headers (and sources!) at least after %prep, the same way dynamic build requires had to move to a section that follows %prep. Not sure how far it needs to move to be of some use. If you want the maximum automation benefit, that would be just before %files.

In a fully automated mode,

  1. the preamble is optional

  2. a first logic pass computes upstream sources in %sourcelist (since everything is moving to git nowadays, I have more specs that use the %forge macros now than specs that do not)

  3. then you have explicit Fedora patches in %patchlist

  4. starting from %prep, you need domain-specific processing, either via packager-specified explicit %fooprep calls, or via some automated detection process.

    For the reasons I exposed before, I prefer explicit calls till we understand the ordering requirements better. At this point you’re already in domain-specific generation logic but you may not know the components that will be created by this process yet.

    Thus %prep, %generate_buildrequires %build %install using explicit domain-specific macro calls, mixing the domain calls if necessary.

  5. Before %files, however, you need to decide how to distribute the installed files between subpackages, which means you can not defer defining the corresponding subpackages any longer.

  6. That means you may not know the final install package list, naming and versioning before a section between %install and %files, and you can not evaluate this section without executing the previous build sections (including, dynamic buildrequires)

  7. And, you also may not know the final srpm name and versioning before this section. Because if you want to keep sane, the general case will be to name your srpm after the most critical of the generated install packages, unless the packager deliberately demands to use a srpm-specific name. And, if the packager does not demand a srpm-specific name, and forgets to tell which of the generated subpackages is the important one, the only sane default is to take the first of them as most important.

    That means you use a temporary srpm filename (probably just the spec name, changing the file extension) till the build progresses to this point.

  8. As explained, as the domain logic progresses, it can result in computing additional fedora-provided source files. And, those need to be present before the section that will use them, and after the section that computed them. So, lots of optional %sourcelists between %prep, %generate_buildrequires %build %install (some would add %check here)

  9. And, lastly, that also requires evaluating lua code section by section, not at preamble end. It’s not much use moving automated header generation later in the spec if the automation language rpm uses is evaluated at preamble time only. At a minimum, to be useful, the lua code in the new section should be able to do things with the results of the previous sections (read files created during those sections, have a way for the shell logic in those sections to export variables to rpm and lua logic)
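
The srpm naming fallback from point 7 can be written down as a small decision function. This is a Python sketch under assumptions: `srpm_name` and its parameters are hypothetical names, and the temporary name is taken to be the spec file name without its extension, as suggested above.

```python
from pathlib import PurePath


def srpm_name(spec_file, generated=(), explicit=None, main=None):
    """Pick the source package name once the install packages are known."""
    if explicit:
        # The packager deliberately demanded an srpm-specific name.
        return explicit
    if main is not None:
        # The packager told us which generated package is the important one.
        if main not in generated:
            raise ValueError(f"{main!r} is not a generated package")
        return main
    if generated:
        # Default: the first generated package is taken as most important.
        return generated[0]
    # Nothing generated yet: temporary name derived from the spec file name.
    return PurePath(spec_file).stem


if __name__ == "__main__":
    print(srpm_name("golang-x-image.spec"))
    print(srpm_name("golang-x-image.spec", ["pkg-a", "pkg-b"]))
```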

So, all of this is doable, adding more build phases and sections, and we proved with %generate_buildrequires that could work great as long as rpm and mock people worked together.

However that will require careful communication, since removing the roadblocks this way also removes the side-effects of those roadblocks, which have existed for so long that people have started to take them as the natural and eternal rpm state.

nim-nim commented 4 years ago

@ffesti Thank you for sharing a different analysis and point of view. I’ll correct some things here (I don’t fundamentally disagree with what you wrote, but you made some shortcuts that would block a real-world design)

The current font and go macros are a pain to implement but - obviously - don't require something that can't be done.

Actually this is slowly getting to the point I’ve written enough helpers for common needs in redhat-rpm-config that implementing a new macro set is easy. I will make a new dump of common helpers after the %new_package part is processed. That will get us to the point where a macro implementor can write things like: (rpm macro side)

# Run tests in the check section for a font (sub)package. Arguments:
# -z <number>         read the zth block of definitions, for example
#                     %{fontfamily<number>}
# -v                  be verbose
%fontcheck(z:v) %{lua:
local      fedora =  require "fedora.common"
local       fonts =  require "fedora.srpm.fonts"
local   fonts-rpm =  require "fedora.rpm.fonts"
local      suffix =  rpm.expand("%{?-z*}")
local     verbose = (rpm.expand("%{-v}") ~= "")
fedora.suffixloop(fonts-rpm.check, suffix, fonts.suffixes(), {verbose})
}

(and lua side)

-- Core of %fontcheck
local function check(suffix, verbose)
  fonts.env(suffix, verbose)
  print(rpm.expand([[
grep -E '^"%{_fontconfig_templatedir}/.+\.conf"' '%{currentfontfiles}' \
  | xargs -I{} -- sh -c "xmllint --loaddtd --valid     --nonet '%{buildroot}{}' \
  >/dev/null && echo %{buildroot}{}: OK"
grep -E '^"%{_datadir}/metainfo/.+\.xml"'        '%{currentfontfiles}' \
  | xargs -I{} --        appstream-util validate-relax --nonet '%{buildroot}{}'
]]))
end

and don’t worry at all about the heavy lifting done by the helpers to make that just work in presence of multiple subpackages. All the ugliness here is pure domain-specific code, the rpm-induced templating ugliness is hidden from the macro writer.

But they rely on all the actual data being punched into the spec file by the user.

If that was the case, they would need much longer specs packager side. A huge part of the complexity in the forge, go and fonts macros is computing domain-specific sane defaults from partial packager information (that’s why %forgemeta, %gometa and %fontmeta are complex. They fill in the blanks using complex domain-specific rules so the rest of the macro code and the packager in its spec do not have to worry if info X was filled or not.)

I don’t see this blank-filling need going away. Even assuming upstream provided perfect metadata that does not need correction or overriding Fedora-side (and we all know upstreams are not perfect), there will always be additional metadata that Fedora requires but that a domain-specific component system forgot to handle. The legal (licensing) aspect was already given as an example. That’s not the only one.
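As a rough illustration of that blank filling, the Python sketch below resolves each field from layered sources. The precedence (packager value, then upstream metadata, then a distribution default) and every field name and value are assumptions for the example; the real %forgemeta, %gometa and %fontmeta rules are far more domain-specific.

```python
from collections import ChainMap


def fill_blanks(packager: dict, upstream: dict, distro_defaults: dict) -> dict:
    """Resolve each field with the assumed order packager > upstream > default."""
    # Drop unset values so an empty field falls through to the next layer.
    layers = [
        {key: value for key, value in layer.items() if value not in (None, "")}
        for layer in (packager, upstream, distro_defaults)
    ]
    # ChainMap looks keys up in the first mapping that defines them.
    return dict(ChainMap(*layers))


# Hypothetical inputs: upstream forgot the license, the packager left the
# summary blank, and nobody set a URL.
packager = {"license": "OFL-1.1", "summary": ""}
upstream = {"name": "examplefont", "summary": "An example font family", "license": None}
distro_defaults = {"url": "https://example.org/", "summary": "Font package"}

meta = fill_blanks(packager, upstream, distro_defaults)
print(meta["name"], "|", meta["license"], "|", meta["summary"], "|", meta["url"])
```

Code that runs after this step can read `meta["license"]` without caring which layer supplied it, which is the property the macros rely on.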

Now, it is true that rpm currently constrains things in such a way that it is not possible to feed upstream info to this blank-filling process, even when it is present as upstream metadata in the source archives.

Using data from the buildroot is currently not possible at all - with the exception of globs in file lists and dependency generators. To work around this we need to be able to create (sub) packages after the build.

Good automatic package generation would require moving the evaluation point of things used to construct headers (and sources!) at least after %prep, the same way dynamic build requires had to move to a section that follows %prep. Not sure how far it needs to move to be of some use. If you want the maximum automation benefit, that would be just before %files.

In a fully automated mode,

1. the preamble is optional

2. a first logic pass computes upstream sources in `%sourcelist` (since everything is moving to git nowadays, I have more specs that use the `%forge` macros now than specs that do not)

3. then you have explicit Fedora patches in `%patchlist`

4. starting from `%prep`, you need domain-specific processing, either via packager-specified explicit `%fooprep` calls, or via some automated detection process.
   For the reasons I explained before, I prefer explicit calls until we understand the ordering requirements better. At this point you’re already in domain-specific generation logic, but you may not yet know the components that will be created by this process.
   Thus `%prep`, `%generate_buildrequires`, `%build` and `%install` use explicit domain-specific macro calls, mixing domain calls if necessary.

5. Before `%files`, however, you need to decide how to distribute the installed files between subpackages, which means you cannot defer defining the corresponding subpackages any longer.

6. That means you may not know the final install package list, naming and versioning before a section between `%install` and `%files`, and you cannot evaluate this section without executing the previous build sections (including dynamic buildrequires).

7. You also may not know the final srpm name and versioning before this section. If you want to stay sane, the general case will be to name your srpm after the most important of the generated install packages, unless the packager deliberately demands an srpm-specific name. And if the packager does not demand an srpm-specific name, and forgets to say which of the generated subpackages is the important one, the only sane default is to treat the first of them as the most important.
   That means you use a temporary srpm filename (probably just the spec name with a different file extension) until the build progresses to this point.

8. As explained, as the domain logic progresses, it can end up computing additional Fedora-provided source files. Those need to be present before the section that will use them, and after the section that computed them. So, lots of optional `%sourcelist` sections between `%prep`, `%generate_buildrequires`, `%build` and `%install` (some would add `%check` here).

9. Lastly, that also requires evaluating lua code section by section, not at preamble end. It’s not much use moving automated header generation later in the spec if the automation language rpm uses is evaluated at preamble time only. At a minimum, to be useful, the lua code in the new section should be able to do things with the results of the previous sections (read files created during those sections, and have a way for the shell logic in those sections to export variables to rpm and lua logic).
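The srpm naming default from point 7 can be sketched in a few lines of Python. This is only an illustration of the fallback order described above; `srpm_name` and its parameters are hypothetical and do not correspond to any rpm API.

```python
from typing import List, Optional


def srpm_name(generated: List[str], explicit: Optional[str] = None,
              main: Optional[str] = None, spec_stem: str = "package") -> str:
    """Pick the srpm name using the fallback order from point 7."""
    if explicit:
        # The packager deliberately asked for an srpm-specific name.
        return explicit
    if main is not None:
        # The packager said which generated package is the important one.
        if main not in generated:
            raise ValueError(main + " is not one of the generated packages")
        return main
    if generated:
        # Nothing was specified: treat the first generated package as the main one.
        return generated[0]
    # Nothing generated yet: keep the temporary name derived from the spec file.
    return spec_stem


print(srpm_name([], spec_stem="golang-example"))
print(srpm_name(["golang-example-devel", "golang-example"]))
print(srpm_name(["golang-example-devel", "golang-example"], main="golang-example"))
```

The first call shows the temporary name used while the build has not yet reached the point where packages are known; the other two show the name settling once the package list exists.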

So, all of this is doable by adding more build phases and sections, and we proved with %generate_buildrequires that this can work great as long as rpm and mock people work together.

However, that will require careful communication, since removing the roadblocks this way also removes their side effects, which have existed for so long that people have come to take them as the natural and eternal state of rpm.

dcermak commented 4 years ago

cc: @scarabeusiv @darix @coolo This could be relevant for subpackage generation for Ruby and Python in openSUSE/SLE.

Conan-Kudo commented 4 years ago

cc: @hroncok This is something we should look toward for next-gen Python packaging stuff.

darix commented 4 years ago

Isn't this basically the template idea that was presented at the opensuse conference in 2019?

With that you could even skip most of the spec file.

Conan-Kudo commented 4 years ago

It is the thing that @ignatenkobrain and I were talking about last year, yes.

darix commented 4 years ago

No, Florian gave a talk about it. https://media.ccc.de/v/2501-re-thinking-spec-files

Conan-Kudo commented 4 years ago

Ah, I forgot that he talked about it too.

nim-nim commented 4 years ago

Anyway, I needed to solve quite a lot of the problems involved in automated packages to prepare the switch of Fedora Go packages to Go modules.

I will push soonish the result to redhat-rpm-config (not because the Go automation is finished, I’d say it’s 90% done but in need of lots of testing, but because the infra is shared with font packages and Fedora i18n is getting nervous now that Fedora 33 change deadlines loom and they need the font automation finished for their own changes, which are getting critical now that apps have started rejecting some legacy font formats).

Of course, it won’t be able to depend on %prep and post-%prep processing for a lot of things, since that is locked down in current rpm, but this is just a little part of what automated generation needs, and I now know where that limited my implementation. (for the curious, https://src.fedoraproject.org/fork/nim/rpms/redhat-rpm-config/commits/forge-with-patches though that is still changing a lot, not because the core logic needs fixing, but because refactorings and naming cleanups tend to touch a lot of lines)

Things I learnt doing full package automation:

  1. it completely inverts the functional tag/variable inheritance logic: you want processing to set variables, which are then pushed to %package sections (as tags), which are then turned into SRPM tags (the SRPM tags are computed from the %package tag values, not the reverse). That’s the normal case when your automated generation picks up a single thing to auto-package: you want the SRPM tags to reflect the single package you are auto-producing. That’s also the normal case when you are generating multiple auto-packages: you want to compose the SRPM license from all the package licenses, and not the reverse.

  2. that means SRPM tags need to be yanked from the preamble, or the preamble turned into a proper %package section, that may occur very late in the spec file (the preamble needs to be just another %package section, with a marker that says it’s the source package section, and nothing else, and not forced to occur at the start of the spec).

  3. that also means you need global state variable computation and re-computation at every stage of the pipeline, not just a lua dump at preamble time. So, basically, evaluate lua macros at the start of each section, after the previous section’s shell code has finished executing (with recursion when the result of the previous processing is a whole new section).

  4. you will compute and re-compute a lot of rpm variables, and the current infra to read/set/inherit variables is crap. I managed to streamline pretty much every step of the auto-packaging process and turn it into nice, easily maintained macros (including for horribly complex stuff like go modules), except for the variable initialisation part (that was already the case for %forgemeta 3 years ago: the logic in %prep is dead simple, computing the variables it needs is definitely not dead simple).

    Most of my additions to redhat-rpm-config are just helpers and routines to read and set rpm variables and make it less painful.

    I know that variables are just macros, but they are special macros without any argument. And that is critical when you are composing auto-generators. In an auto-generator world, you pass state from one stage to the next as variables; you do not serialise it to CLI arguments that need de-serialisation at the next step (you could serialise/deserialise at all steps, but that’s completely needless complexity, and the rpm argument parser will drive you crazy before you finish implementing).

  5. right now, I’ve not found a case where I needed to interleave generators (i.e. apply %build generation for a Go component, then %build generation for a font component, then return to %build generation for a Go component). I did find cases where the generation order mattered, so right now my automation implementation permits declaring that %go_prep MUST occur after %forge_prep, for example. So far, a %foo_after variable that contains the list of things %foo must be executed after is sufficient for my needs.
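The `%foo_after` idea from point 5 amounts to a small dependency sort. Below is a minimal Python sketch of one way to resolve such constraints while otherwise keeping the declared order; it is an assumption about how this could work, not the redhat-rpm-config code, and `order_generators` and the `font_prep` name are made up for the example.

```python
from typing import Dict, List


def order_generators(declared: List[str], after: Dict[str, List[str]]) -> List[str]:
    """Order generator calls so each runs after everything in its *_after list.

    Calls without constraints keep their declared order.
    """
    ordered: List[str] = []
    pending = list(declared)
    while pending:
        for name in pending:
            # Only prerequisites that are actually in use constrain the order.
            deps = [dep for dep in after.get(name, []) if dep in declared]
            if all(dep in ordered for dep in deps):
                ordered.append(name)
                pending.remove(name)
                break
        else:
            # No pending call could be scheduled: the constraints form a cycle.
            raise ValueError("circular ordering constraint among: " + ", ".join(pending))
    return ordered


# go_prep declares it must run after forge_prep (the %go_prep_after variable).
print(order_generators(["go_prep", "font_prep", "forge_prep"],
                       {"go_prep": ["forge_prep"]}))
```

Because no interleaving is needed, a plain ordering like this is enough; a cycle is reported as an error instead of being silently resolved.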

martinezjavier commented 3 years ago

Having some of this support built into RPM (or provided as a set of standard macros) would be very useful for packages like grub2, which are not only split into different subpackages but also have architecture-specific and firmware-interface-specific subpackages.

Fedora's grub2.spec file is relatively simple, but that's because all the complexity is hidden in grub.macros. It would be much better if RPM provided features that could be used instead of custom macros that only the package maintainers are able to follow.