osbuild / otk

A proof of concept for a new `osbuild-mpp`.
https://osbuild.org/
Apache License 2.0

Sources generator #103

Open achilleas-k opened 5 months ago

achilleas-k commented 5 months ago

During our discussions and brainstorming we talked a lot about source generation and I'm still not completely convinced we have it figured out, or that the implicit solution we landed on is great.

For reference, my initial thought was to have the user list all their collected sources and define them under each source type, almost entirely by hand:

otk.define:
  build-packages:
    otk.external.osbuild-depsolve-dnf:
      include:
        - <build-packages>
      repos:
        - <build-repos>
  os-packages:
    otk.external.osbuild-depsolve-dnf:
      include:
        - <os-packages>
      repos:
        - <os-repos>
  some-file-embed:
    otk.external.file-embed:
      path: /path/to/file

pipelines:
  - name: build
    stages:
      - type: org.osbuild.rpm
        inputs:
           - build-packages.rpms
  - name: os
    stages:
      - type: org.osbuild.rpm
        inputs:
           - os-packages.rpms
      - type: org.osbuild.grub2
        options:
          kernel-version: kernel-ver

sources:
  org.osbuild.curl:
    items:
      otk.join:
        - os-packages.sources
        - build-packages.sources
        - somefiles.sources
  org.osbuild.inline:
    items:
      otk.join:
        - some-file-embed.source
        - kickstart-file-embed.source

This was rejected for being too cumbersome to write, which is understandable.

The alternative idea was to have externals/generators which can implicitly define sources that get added to the sources section automatically:

otk.define:
  build-packages:
    include:
      - <build-packages>
    repos:
      - <build-repos>
  os-packages:
    include:
      - <os-packages>
    repos:
      - <os-repos>

pipelines:
  - name: build
    stages:
      - otk.external.depsolve:
          packages: ${build-packages}
  - name: os
    stages:
      # generates "sources:" implicitly
      - otk.external.depsolve:
          packages: ${os-packages}
          namespace:
      - type: org.osbuild.grub2
        options:
          kernel-version: ${kernel-ver}
      - org.external.osbuild.file-from-path:
          path: ./file

I'm not entirely comfortable with this idea. While I agree that the first one is cumbersome and requires a lot of knowledge of osbuild and osbuild manifests, I feel that the second one is too automagical.

I think there's a middle solution somewhere that would be perfect.

For one, I think needing a little bit of manifest knowledge to work with otk is acceptable (I'd even say desirable; we don't want to hide so much of osbuild manifests behind otk that users won't know how to work with them). Secondly, I like things being explicit. Perhaps we can't reach the ideal of having only pure functions, but we can chase it as far as we can.

I think the middle solution could look something like the GenSources() function in osbuild/images: https://github.com/osbuild/images/blob/b002d250372ff468a2250ba0e44ed7e45a501e54/pkg/osbuild/source.go#L57-L121

Basically, an external that consumes all the external resources and produces the whole sources section:

otk.define:
  build-packages:
    otk.external.osbuild-depsolve-dnf:
      include:
        - <build-packages>
      repos:
        - <build-repos>
  os-packages:
    otk.external.osbuild-depsolve-dnf:
      include:
        - <os-packages>
      repos:
        - <os-repos>
  some-file-embed:
    otk.external.file-embed:
      path: /path/to/file

pipelines:
  ...

sources:
  otk.external.gen-sources:
    urls:
      - os-packages
      - build-packages
      - somefiles
    inline-files:
      - some-file-embed
      - kickstart-file-embed

This still requires listing every source that gets resolved or defined, but I don't think it's that cumbersome. A name always needs to be defined to hold the output of an external call (e.g. os-packages) which will presumably be used in multiple locations (stage inputs, external generator arguments), so I don't think needing to add it to one more place (the sources section) in an existing external call is too much to expect.

It requires a bit of knowledge of osbuild manifests, but only as far as knowing that when a stage requires an external resource, that resource should be defined in the sources. It doesn't require knowing much about the types of sources or what the source name is (org.osbuild.curl, org.osbuild.inline, etc.).
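
To make the idea concrete, here is a rough Python sketch of what a gen-sources external could do with the names it is given. Everything about it is an assumption for illustration: the function name `gen_sources`, the shape of the resolved external outputs (a `sources` list with `checksum` and `url`, or raw `data` bytes), and the exact item layout under each source type. It is not how otk or osbuild/images implements it.

```python
import base64
import hashlib


def gen_sources(resolved, urls, inline_files):
    """Build a manifest "sources" section from named external results.

    resolved:     name -> output of an earlier external call (hypothetical shape)
    urls:         names whose results carry downloadable items
    inline_files: names whose results carry file contents to embed
    """
    curl_items = {}
    for name in urls:
        # Assumed shape: {"sources": [{"checksum": "sha256:...", "url": "..."}]}
        for item in resolved[name]["sources"]:
            # Keying by checksum deduplicates items shared between pipelines.
            curl_items[item["checksum"]] = {"url": item["url"]}

    inline_items = {}
    for name in inline_files:
        # Assumed shape: {"data": b"..."}
        data = resolved[name]["data"]
        checksum = "sha256:" + hashlib.sha256(data).hexdigest()
        inline_items[checksum] = {
            "encoding": "base64",
            "data": base64.b64encode(data).decode("ascii"),
        }

    # Only emit a source type when something was actually listed for it.
    sources = {}
    if curl_items:
        sources["org.osbuild.curl"] = {"items": curl_items}
    if inline_items:
        sources["org.osbuild.inline"] = {"items": inline_items}
    return sources


# Hypothetical resolved values, standing in for the otk.define entries above.
resolved = {
    "os-packages": {"sources": [
        {"checksum": "sha256:aaa", "url": "https://example.org/a.rpm"},
    ]},
    "build-packages": {"sources": [
        {"checksum": "sha256:aaa", "url": "https://example.org/a.rpm"},
        {"checksum": "sha256:bbb", "url": "https://example.org/b.rpm"},
    ]},
    "some-file-embed": {"data": b"hello\n"},
}

sources = gen_sources(
    resolved,
    urls=["os-packages", "build-packages"],
    inline_files=["some-file-embed"],
)
```

The omnifest author only lists names; the mapping from "urls" to org.osbuild.curl and from "inline-files" to org.osbuild.inline lives in the external.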

supakeen commented 5 months ago

Your suggestion matches how I understand it will be implemented in #87. However, as we are discussing it here (and I'm "against" the explicitness here) I have a question:

What does someone writing an omnifest gain by (semi-) explicitly listing sources?

achilleas-k commented 4 months ago

> What does someone writing an omnifest gain by (semi-) explicitly listing sources?

Not much, if anything. Perhaps a tiny bit more visibility into how osbuild manifests are laid out, fewer abstractions, more explicitness. Whether that's good or bad is debatable (but I generally lean more towards preferring explicitness).

The decision here, I think, mostly affects maintainability and troubleshooting of externals and source generation. The implicit source generation makes me worry that it will become difficult for us (developers of otk and most externals) to easily answer the question "where did this particular source come from?" when several externals can be generating, for example, inline sources (an explicit inline source file, a kickstart file, the first-boot activation key blob that we generate internally, etc.). Seeing a list of variables at the bottom of the omnifest makes it much easier to trace the origin of a value that appears in the final manifest. It also makes it very straightforward to reason about the construction of the manifest, because object replacements (through externals or variables) are localised: there are no "remote actions".
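
As a small illustration of the tracing argument, here is a Python sketch that answers "where did this source come from?" when the contributing names are listed explicitly. The function name `source_origins` and the shape of the resolved values (a `sources` list of items with a `checksum`) are hypothetical, chosen only for this example.

```python
def source_origins(resolved, listed_names):
    """Map each source checksum to the omnifest names that contributed it."""
    origins = {}
    for name in listed_names:
        # Assumed shape: {"sources": [{"checksum": "sha256:...", ...}]}
        for item in resolved[name]["sources"]:
            origins.setdefault(item["checksum"], []).append(name)
    return origins


# Hypothetical resolved values for two named externals.
resolved = {
    "os-packages": {"sources": [{"checksum": "sha256:aaa"}]},
    "build-packages": {"sources": [
        {"checksum": "sha256:aaa"},
        {"checksum": "sha256:bbb"},
    ]},
}

origins = source_origins(resolved, ["os-packages", "build-packages"])
```

Because the only inputs are the names written in the sources section, the lookup never has to search through every external call in the omnifest.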

supakeen commented 4 weeks ago

We now have sources generators. There has been some discussion about whether they have to generate the full map (including the source name) or just the list.
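
The two options can be sketched in Python as follows. This is only an illustration under assumptions: "the list" is read here as the items of a single source type, and the helper names (`as_full_map`, `merge_full_maps`, `wrap_items`) are made up for the example rather than taken from otk.

```python
def as_full_map(source_type, items):
    """Generator output that includes the source name (the "full map")."""
    return {source_type: {"items": dict(items)}}


def merge_full_maps(maps):
    """Combine several full-map outputs into one sources section."""
    sources = {}
    for full_map in maps:
        for source_type, body in full_map.items():
            merged = sources.setdefault(source_type, {"items": {}})
            merged["items"].update(body["items"])
    return sources


def wrap_items(source_type, item_sets):
    """Caller-side wrapping when generators return only their items."""
    merged = {}
    for items in item_sets:
        merged.update(items)
    return {source_type: {"items": merged}}


# Hypothetical items from two generators of the same source type.
os_items = {"sha256:aaa": {"url": "https://example.org/a.rpm"}}
build_items = {"sha256:bbb": {"url": "https://example.org/b.rpm"}}

# Option 1: each generator names its source type; the results are merged.
from_maps = merge_full_maps([
    as_full_map("org.osbuild.curl", os_items),
    as_full_map("org.osbuild.curl", build_items),
])

# Option 2: generators return items only; the omnifest names the source type.
from_items = wrap_items("org.osbuild.curl", [os_items, build_items])
```

In this sketch both routes end up with the same sources section; the difference is whether the generator or the omnifest author has to know the source name.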