rerun-io / rerun

Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
https://rerun.io/
Apache License 2.0

Local-first + generated CI #4148

Open jprochazk opened 10 months ago

jprochazk commented 10 months ago

We do a lot of work on CI, and it's extremely difficult to keep track of how it all fits together. We've also had to deal with a lot of pain arising from the fact that:

[^1]: Not because it must run on CI, we just haven't put in the work to make it run locally

There are two big changes we can make to drastically improve the situation:

  1. Make everything on CI runnable locally
  2. Design a custom DSL and transpile it to GHA YAML files

Local-first CI

Every job first installs its dependencies, and then it runs some code[^2]. We want to ensure that all of that code is also runnable locally on every developer machine. To achieve this, we have to refactor every CI job to be a thin wrapper over this basic two-step process (install + run).

[^2]: Some jobs may install additional dependencies later in their lifecycle, but that's more of a consequence of our job sequencing, and not a requirement for the job to work.

The sync release assets workflow is a great example of what we want all of our CI to look like, especially because the inputs to the script are passed in explicitly.
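As a rough illustration of the install + run shape with explicit inputs, here is a minimal sketch. It is not one of our actual scripts: the flag names (`--target`, `--skip-install`, `--dry-run`) and both commands are hypothetical placeholders, and a real job would substitute its own dependency installation and build/test/lint step.

```python
import argparse
import shlex
import subprocess
import sys


def parse_args(argv):
    # Every input is an explicit flag, so nothing depends on GHA-only context.
    parser = argparse.ArgumentParser(description="Hypothetical local-first CI job")
    parser.add_argument("--target", required=True, help="what to build or test")
    parser.add_argument("--skip-install", action="store_true",
                        help="reuse dependencies that are already installed")
    parser.add_argument("--dry-run", action="store_true",
                        help="print the commands instead of running them")
    return parser.parse_args(argv)


def plan(args):
    """Return the commands for the two-step process: install, then run."""
    commands = []
    if not args.skip_install:
        # Placeholder for the real dependency installation step.
        commands.append([sys.executable, "-m", "pip", "--version"])
    # Placeholder for the real build/test/lint step; the target is passed
    # as an argument rather than read from the environment.
    commands.append([
        sys.executable, "-c",
        "import sys; print('running job for', sys.argv[1])",
        args.target,
    ])
    return commands


def main(argv=None):
    args = parse_args(sys.argv[1:] if argv is None else argv)
    for command in plan(args):
        print("$", shlex.join(command))
        if not args.dry_run:
            subprocess.run(command, check=True)
    return 0
```

With this shape, the GHA job would only need to call the script with the same flags a developer would type locally, for example `main(["--target", "docs", "--dry-run"])` to inspect the commands without running them.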

Some notes for the process of extracting a CI job to run locally:

Codegen GHA away

As long as every job is not much more complex than install + run, it should be possible to ditch the hand-written GHA YAML files entirely, and instead use a custom DSL as the input to a GHA YAML file generator. Even if all this code generator did was use a different configuration file format and transpile it to YAML, it would still be a big improvement in developer experience, but we can do much more than that.

Some (unordered, tentative) goals for this DSL and code generator:

We don't have to meet all of the above goals. The only strict requirements are that the DSL is not YAML, and that it's possible to author the files without deep knowledge of GHA.

We will likely continue to hand-author some workflows with very specific requirements, but this should be usable for all jobs that perform builds/tests/linting.

teh-cmc commented 10 months ago

> Design a custom DSL and transpile it to GHA YAML files

I would even prefer no DSL at all: just define a bunch of classes for Workflows/Jobs/Steps/etc and simply work with actual Python code. Build lists and graphs using good old code then dump everything as YAML.

In fact we don't even have to define these classes... they already exist.

Pushing this logic even further: do we even need to go through an intermediate YAML representation at all? Can't PyGithub configure workflows straight from in-memory objects?
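To make the "plain classes, then dump YAML" idea concrete, here is a minimal sketch using only the standard library. The `Step`, `Job`, and `Workflow` classes, the tiny emitter, and the script paths are all hypothetical and written just for this example; they are not the existing classes mentioned above, and a real generator would more likely use a proper YAML library.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    run: str


@dataclass
class Job:
    runs_on: str
    steps: list[Step]
    needs: list[str] = field(default_factory=list)


@dataclass
class Workflow:
    name: str
    on: list[str]
    jobs: dict[str, Job]


def to_tree(workflow):
    """Lower the objects to plain dicts/lists using GHA's key names."""
    jobs = {}
    for job_id, job in workflow.jobs.items():
        entry = {"runs-on": job.runs_on}
        if job.needs:
            entry["needs"] = job.needs
        entry["steps"] = [{"name": s.name, "run": s.run} for s in job.steps]
        jobs[job_id] = entry
    return {"name": workflow.name, "on": workflow.on, "jobs": jobs}


def dump_yaml(node, indent=0):
    """Tiny block-style emitter; handles non-empty dicts, lists, and strings only."""
    pad = "  " * indent
    lines = []
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, (dict, list)):
                lines.append(f"{pad}{key}:")
                lines.extend(dump_yaml(value, indent + 1))
            else:
                # JSON string quoting is also valid YAML double-quoted syntax.
                lines.append(f"{pad}{key}: {json.dumps(value)}")
    elif isinstance(node, list):
        for item in node:
            if isinstance(item, dict):
                nested = dump_yaml(item, indent + 1)
                # The first key of a mapping shares its line with the "- " marker.
                lines.append(f"{pad}- {nested[0].lstrip()}")
                lines.extend(nested[1:])
            else:
                lines.append(f"{pad}- {json.dumps(item)}")
    return lines


# Hypothetical workflow: "test" only runs after "lint" has passed.
workflow = Workflow(
    name="checks",
    on=["pull_request"],
    jobs={
        "lint": Job("ubuntu-latest", [Step("Lint", "python scripts/lint.py")]),
        "test": Job("ubuntu-latest", [Step("Test", "python scripts/test.py")],
                    needs=["lint"]),
    },
)
print("\n".join(dump_yaml(to_tree(workflow))))
```

Because jobs are ordinary objects here, lists and dependency graphs can be built with loops and functions before anything is serialized.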

jprochazk commented 10 months ago

> Build lists and graphs using good old code then dump everything as YAML.

I think that qualifies as a DSL 😄. But I agree that it should be "good old code" as much as possible.

> Pushing this logic even further: do we even need to go through an intermediate YAML representation at all? Can't PyGithub configure workflows straight from in-memory objects?

GHA requires the intermediate YAML files. There's no way to dispatch a job for a workflow that doesn't have a workflow ID, and workflow IDs are only given out to workflow files.

jprochazk commented 10 months ago

I have a bit of a crazy proposal: I think we should use TypeScript for the "DSL" part. The code generator would use deno to create a barebones JS environment in which we'd execute .ts files to produce the high-level definition of each workflow. The high-level definition would then be compiled to the GHA YAML files.

I know, JS does not spark joy. But here's why I think it makes sense:

I definitely want to be careful not to overengineer this, but I would also like it if we didn't have to use JSON, TOML, or YAML for the DSL, and it doesn't seem like there's a "middle ground" option that wouldn't either have sub-par tooling (such as building a custom language) or be annoying to use (such as Python).

jprochazk commented 10 months ago

We had a lengthy call about this:

It's clear that the valuable part of codegen is the ability to:

TypeScript is one option for the high-level configuration language, but before we decide on that we need to do more design work to determine what we actually need to be able to specify in the workflow definitions.

For now, we'll be focusing on refactoring our CI to be local-first.