Generate build pipeline YML to avoid unclear template usage

dagood commented 12 months ago

@gdams mentioned yml generation in a meeting (another team uses this general approach), and I really like it. So, I wrote it up as a pseudo-ADR, picking MADR in particular, to try out that structure. (Although there are some "I" and clear opinions that I think I'd swap out if I were writing this to check in.)

Context and Problem Statement

We use AzDO Pipeline templates a lot. To overcome reusability limitations in the pipeline yml language, we have templates that accept an "inner" template's filename as a parameter and call the "inner" template with some additional parameters. This can be quite hard to unravel when reading straight through without already knowing the end goal.

This works ok once you're used to it, and sticking to template-time logic avoids a lot of other odd AzDO Pipelines evaluation behavior. However, adding additional layers of parameters-based-on-parameters is a significant mental tax for both implementers and readers, encouraging copy-paste and quick hacks over reuse, and resulting in unclear template name distinctions like pool.yml and pool-core.yml that only exist to create two template contexts.

This applies to microsoft/go for build/test/sign/publish, but also https://github.com/microsoft/go-infra for release automation pipelines. Those also use complicated pipeline yml patterns that would benefit from generation, perhaps even more than microsoft/go pipelines.

Decision Drivers

We want the pipelines to be maintainable.
People new to the team should be able to follow how each pipeline works with minimal AzDO pipeline experience.

Considered Options

Leave it as it is. It's worked so far.
Generate build pipeline YML using Go code.

Decision Outcome

Generate build pipeline YML.

Consequences

Go is a much more featureful language than AzDO pipeline yml. We can more concisely and more clearly express the list of builders and the tweaks necessary for each platform/configuration.
Someone who looks at our repo can easily read through the runnable pipeline yml and see the steps that are taken for a particular builder.
We can have more confidence about how a change to core logic will affect (or not affect) every builder. This helps confidence with small fixes that are only intended to affect one platform.
- It is technically possible to download AzDO-evaluated yml from an AzDO build, but that requires URL manipulation to get the data and the output is difficult to correlate to the actual templates (in my experience).
We can include builder-specific details in yml comments to guide the reader, without compromising the maintainability of the yml.
- With yml, you can e.g. document yml parameters, but when the same parameters are used in many files (simply for passthrough) it is a burden to document them in each place rather than having the reader look for the original source of the data themselves.
- Large comments in the middle of a yml template make it harder to match up the indentation used for template if, each, and other nested structures, so scanning through the broad logic becomes more difficult.
We can test the builder matrix and yml outputs themselves.
I have some more complaints about pipeline yaml at https://github.com/microsoft/go-infra/blob/main/docs/pipeline-yml-style.md, and abstracting away from that language would likely let us avoid some of these quirks as well.
In PR, reviewing Go code and evaluated yml at the same time can clarify intent, making it easier for the reviewer to spot problems and ignore non-problems.

However:

When you discover a problem in a running pipeline, it may be more difficult to look at the generated yml and make a small fix, because the Go code that generated that code is what needs to change.
- This can be mitigated by including references to Go code in the generated yml, but ultimately the fixer needs to understand to some degree how the Go code works.
We would need to add CI to ensure the yml output always matches what the Go code would produce, to keep it in sync. For the same reason, devs submitting PRs that change the pipelines would also now need to run a tool to update the yml.
- Ideally we would not check in any yml, and generate it inside the pipeline. As far as I know, this is not possible in AzDO pipelines. If it is possible, we will need to consider it separately, because it would also make it harder to spot pipeline yml quirks and increase the impact if our yml generator has a bug causing non-reproducible output.

Confirmation

Some templates should still be used to make the resulting yml readable, and keep sharing functionality with .NET Arcade. However, a reasonable complexity metric would be: no template accepts a template as a parameter.

More Information

Example template accepting a template, used to make parameters that depend on other parameters. Templates are from eng/pipeline/stages. Our pipeline entrypoint calls a "matrix" template to set up the list of jobs:

stages:
  - template: stages/go-builder-matrix-stages.yml
    parameters:
      innerloop: true

The go-builder-matrix-stages template calls a "wrapper" template that will expand the objects in the shorthandBuilders array, telling the wrapper template what template to run internally once each object has additional properties added to it:

stages:
  - template: shorthand-builders-to-builders.yml
    parameters:
      jobsTemplate: builders-to-stages.yml
      jobsParameters:
        sign: ${{ parameters.sign }}
        createSourceArchive: ${{ parameters.createSourceArchive }}
        releaseVersion: ${{ parameters.releaseVersion }}
      shorthandBuilders:
        - ${{ if eq(parameters.innerloop, true) }}:
          - { os: linux, arch: amd64, config: buildandpack }
          - { os: linux, arch: amd64, config: devscript }
          - { os: linux, arch: amd64, config: test }
[...]

shorthand-builders-to-builders exists to create values that can be reused by the inner jobsTemplate. Without adding another layer of templates, you can't refer to one yml element's value from another one to build upon an existing value.

stages:
  - template: ${{ parameters.jobsTemplate }}
    parameters:
      ${{ insert }}: ${{ parameters.jobsParameters }}
      builders:
        - ${{ each builder in parameters.shorthandBuilders }}:
          - ${{ insert }}: ${{ builder }}
            # Use 'default' in place of null to define ID. This value just needs to be unique and
            # only contain "[A-z_]+".
            id: ${{ builder.os }}_${{ coalesce(builder.distro, 'default') }}_${{ coalesce(builder.hostArch, 'default') }}_${{ builder.arch }}_${{ builder.config }}_${{ coalesce(builder.experiment, 'default') }}_${{ coalesce(builder.fips, false) }}
            ${{ if not(builder.hostArch) }}:
              hostArch: ${{ builder.arch }}
[...]
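
For comparison, a rough Go equivalent of the expansion above fits in one function, with no wrapper layer needed to reuse a computed value. This is a sketch under assumptions: the Builder struct, orDefault, and expand are hypothetical names, empty strings stand in for yml nulls, and the boolean is formatted with strconv.FormatBool, whose casing may not match what AzDO renders (the ID only needs to be unique).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Builder is a hypothetical Go counterpart to the objects in shorthandBuilders.
// Empty strings stand in for null values in the yml.
type Builder struct {
	OS, Distro, HostArch, Arch, Config, Experiment string
	FIPS                                           bool
	ID                                             string
}

// orDefault plays the role of coalesce(value, 'default') in the template.
func orDefault(s string) string {
	if s == "" {
		return "default"
	}
	return s
}

// expand fills in the derived fields of a shorthand builder.
func expand(b Builder) Builder {
	// The ID uses the shorthand values, so it is computed before HostArch
	// is defaulted (my reading of how the template evaluates it).
	b.ID = strings.Join([]string{
		b.OS,
		orDefault(b.Distro),
		orDefault(b.HostArch),
		b.Arch,
		b.Config,
		orDefault(b.Experiment),
		strconv.FormatBool(b.FIPS),
	}, "_")
	if b.HostArch == "" {
		b.HostArch = b.Arch
	}
	return b
}

func main() {
	b := expand(Builder{OS: "linux", Arch: "amd64", Config: "test"})
	fmt.Println(b.ID, b.HostArch)
	// prints: linux_default_default_amd64_test_default_false amd64
}
```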

In this case, jobsTemplate is builders-to-stages, which takes the expanded builder objects and passes them into the actual job template, adding more pipeline-specific parameters on top.

stages:
  - ${{ each builder in parameters.builders }}:
    - template: pool.yml
      parameters:
        inner:
          template: run-stage.yml
          parameters:
            builder: ${{ builder }}
            createSourceArchive: ${{ parameters.createSourceArchive }}
            releaseVersion: ${{ parameters.releaseVersion }}
[...]

gdams commented 11 months ago

@dagood agree with all the suggestions here. This is a great ADR writeup! Do you want to split this up into sub tasks and we can look at implementing this?

dagood commented 11 months ago

Sure! Here are some parts that we went over in the Go sync, with some additional thoughts:

Add tests for eng/_util/cmd/updatelinktable. This tool generates a file that lives inside this repo, like the yml will be, so getting tests implemented for this will establish the infra and workflow that will work the same for yml generation tests.
- cmd/updatelinktable could be split into:
- cmd/updatelinktable, the user-runnable main package
- updatelinktable, the inner implementation, containing updatelinktable_test.go.
- (It's a relatively common Go pattern to have a cmd and non-cmd package working together like this.)
- These _util tests should run in GitHub Actions for speed, for fast response on devs' PRs.
Write a tool that generates an equivalent to eng/pipeline/pr-pipeline.yml. Focus on one pipeline first to get one end-to-end done.
- A test should check that the generated yml matches what should be generated by the current Go code. (Effectively a golden file test.)
- This step starts to involve opinion: how much yml should be written inline vs. continue to use shared template references? We could inline the entire thing in a single yml file to start with, to avoid opinions at the beginning.
Generate rolling-internal-pipeline.yml next. This should share a lot of code with the PR build yml, so I'd expect refactoring.
Generate the other pipelines in microsoft/go.
Selectively migrate shared yml generation code into microsoft/go-infra.
- The aim is to be able to make shared fixes to certain parts of yml generation, but we should keep enough code in microsoft/go so upgrading the dependency doesn't change yml in a way that breaks compatibility across Go major versions.
- Having the checkout step in go-infra makes sense so we can update it in response to an AzDO breaking change.
- Having the list of platforms in go-infra makes less sense because we may add more platforms in future Go versions that we aren't able to build for old Go versions.
Generate https://github.com/microsoft/go-infra/tree/main/eng/pipelines. These can also be split up.
- There are some particularly interesting patterns here that would be good to generate, like the nesting at https://github.com/microsoft/go-infra/blob/b97993839e48d5153f9ff3fe79b276a314a93357/eng/pipelines/release-build-pipeline.yml#L86
- Work on these pipelines would be split up. I figure we'll need to think about it when we get here.
- Solidifying how to perform test runs of full release automation is a likely prereq.
I think we don't need to generate yml for https://github.com/microsoft/go-images, as we reuse .NET infra. It mirrors how .NET sets it up, so changing it would be confusing when we need to take updates.

Related opportunity:

https://github.com/microsoft/go/issues/402. Having test infra prepared makes this easier to do.
Add scenario tests! These would run in AzDO rather than GitHub Actions because they depend on the Go toolset build. We can build on the infra and use _util as the entrypoint for scenario tests.

dagood commented 1 week ago

Changing the go-infra release pipelines to use a release agent (https://github.com/microsoft/go-lab/issues/122) removes a significant amount of the complex templates in go-infra. This has made me think about this issue again from a more critical perspective.

I still agree with this goal:

People new to the team should be able to follow how each pipeline works with minimal AzDO pipeline experience.

However, I think we can improve our pipelines to be understandable without writing an abstraction in Go. My concerns are generally that the pipelines would then be brittle, harder to debug, and tedious to work on for new reasons. You'd need to know a lot about AzDO Pipelines and also our abstraction. Security changes, for example, will be shown to us as AzDO pipeline yml. The dev work is then not only to update our pipeline, but also to edit the abstraction so it emits the right yml.

Something that's coming to mind that I think I like is actually just improving/adding some relatively small AzDO yml preprocessing features. So, we would still edit AzDO yml (an augmented version of it), and our tool would preprocess the extra features into ordinary AzDO yml.

Feature ideas that could address current pain points:

Better value and expression reuse

A major reason for the deeply nested templates is simply to share some simple values at template-evaluation-time.

We can subtly improve the - template: syntax to support more types of templates that copy values into places we can't with AzDO's limited types of templates. This could fill in small values, or the template could e.g. write an inlined loop of our set of platforms in a more intuitive way.

Perhaps each file could have a top-level expressionliterals: key that contains common expressions that can then be reused later in the same file like ${{ expressionliterals.IsWindows }}. This would be useful to clarify the purpose of an expression in a centralized location while keeping the runtime logic (e.g. steps) concise.
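
A minimal sketch of how the expressionliterals idea might be preprocessed, assuming the top-level key has already been parsed into a map and that references always appear as a whole ${{ expressionliterals.Name }} expression. The function name expandLiterals and the expression given for IsWindows are hypothetical; nothing here is an existing tool or AzDO feature.

```go
package main

import (
	"fmt"
	"regexp"
)

// literalRef matches a reference such as ${{ expressionliterals.IsWindows }}.
var literalRef = regexp.MustCompile(`\$\{\{\s*expressionliterals\.(\w+)\s*\}\}`)

// expandLiterals replaces each reference with the named expression, wrapped
// in ${{ }} so AzDO still evaluates it. The first unknown name is an error.
func expandLiterals(yml string, literals map[string]string) (string, error) {
	var missing string
	out := literalRef.ReplaceAllStringFunc(yml, func(ref string) string {
		name := literalRef.FindStringSubmatch(ref)[1]
		expr, ok := literals[name]
		if !ok {
			if missing == "" {
				missing = name
			}
			return ref
		}
		return "${{ " + expr + " }}"
	})
	if missing != "" {
		return "", fmt.Errorf("undefined expression literal %q", missing)
	}
	return out, nil
}

func main() {
	literals := map[string]string{
		// Hypothetical definition; only the name IsWindows comes from the idea above.
		"IsWindows": "eq(parameters.os, 'windows')",
	}
	out, err := expandLiterals("enabled: ${{ expressionliterals.IsWindows }}\n", literals)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

The output is ordinary AzDO yml with the expression inlined, so the file we edit stays concise while AzDO still does the evaluation.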

To share more logic in microsoft/go pipelines with a source of truth in microsoft/go-infra, we can embed some templates into the tool and share via the eng/_util dependency.

Better computed parameter reuse

Sometimes deeply nested templates are there to simply reuse a calculation, like ${{ startsWith(variables['Build.SourceBranch'], 'refs/heads/microsoft/release-branch') }}.

I think we could inline the expression into any place it's used without calculating the result ourselves (which we can't always do).

We could generate nested AzDO yml templates to create actually-shared values. (This may not be feasible because AzDO pipelines have nesting limits.)

Better looping

This nested hierarchy used to be worse, but it still screams for a feature that simplifies it and makes it less fragile when the number of steps needs to change:

https://github.com/microsoft/go-infra/blob/a1e18487bf2bb90a38eee940a095decb0e60cac6/eng/pipelines/release-build-pipeline.yml#L108-L136

The release agent work will remove the need for this pattern, so I think the justification here for loop features is slim. But, worth mentioning because the release agent is why I'm posting this comment. 🙂

Another place we use loops is the retry logic in microsoft/go, and it could potentially be improved, but it's honestly not all that fragile or hard to read IMO.
