prefix-dev / rattler-build

rattler-build is a universal package builder for Windows, macOS and Linux
https://prefix-dev.github.io/rattler-build
BSD 3-Clause "New" or "Revised" License
218 stars, 48 forks

Idea for better (non-redundant, more compact) branching syntax for selectors? #1155

Open h-vetinari opened 3 weeks ago

h-vetinari commented 3 weeks ago

While thinking about #1154, it occurred to me that it would be even nicer if we could do a match-style syntax that doesn't have to choose a single branch but could apply several!

        files:
          - match: linux
              - something_linux
            match: osx
              - something_osx
            match: unix         # e.g. on linux, this should be taken
              - something_unix  # _in addition_ to the linux-only case
            match: win
              - something_win
            match: unix or win
              - something_that_applies_to_everything
            # no more `else:`, only explicit matches!

I guess it might be a bit late for such deep syntactic changes, but IMO this would make it much nicer to write recipes, rather than having to constantly restart if: branches to avoid exhaustivity-problems.

It would allow using a single match: block (or whatever you want to call it, like the old sel: before it turned into if:) for every section of the recipe, which would be much nicer IMO if we're looking at translating things like

Examples: [this](https://github.com/conda-forge/mayavi-feedstock/blob/main/recipe/meta.yaml)

```yaml
- cython  # [build_platform != target_platform]
- vtk =*=osmesa_*  # [build_platform != target_platform and build_variant == "noqt"]
- vtk =*=qt_*  # [build_platform != target_platform and build_variant != "noqt"]
- qt-main  # [build_platform != target_platform and build_variant == "pyqt"]
- qt6-main  # [build_platform != target_platform and build_variant == "pyside6"]
```

or [this](https://github.com/conda-forge/polars-feedstock/blob/main/recipe/meta.yaml)

```yaml
- crossenv  # [build_platform != target_platform]
- maturin >=1.3.2,<2  # [build_platform != target_platform]
- {{ stdlib("c") }}
- {{ compiler('c') }}
- {{ compiler('cxx') }}  # [win]
- {{ compiler('rust') }}
- posix  # [build_platform == "win-64"]
- cmake
- make  # [unix]
- cargo-feature  # [build_platform != target_platform and target_platform == "win-64"]
```

etc.

The more dimensions (unix vs win, native vs. cross, python version, CUDA or not, implementation variants for openmpi/BLAS/QT/etc.) we need to cover in a given recipe, the harder it gets to do that with simple if: else: branches. Consider something like

        build:
          - match: build_platform != target_platform
              # cross-compilation
            match: build_platform != target_platform and cuda_version
              # cross-compiled CUDA
            match: match(cuda, "<12")
              # CUDA 11.x
            match: match(cuda, ">=12.0")
              # CUDA 12.x
            match: linux
              # linux
            match: osx
              # osx
            match: unix
              # unix
            match: win
              # win
            match: blas_impl == "mkl"
              # mkl

It would be a much lower mental overhead to be able to freely shuffle around lines based on which sections they apply to (e.g. not having to restructure the following into correctly (non-)overlapping if: statements), only adding lines to the selector where they are supposed to be applied.

wolfv commented 3 weeks ago

It's completely impossible because duplicate keys are forbidden :(

h-vetinari commented 3 weeks ago

> It's completely impossible because duplicate keys are forbidden :(

Well, how about making each - match: a separate list entry? That's also how multiple if:'s work IIUC.

In other words, the following (note the extra - everywhere)

        build:
          - match: build_platform != target_platform
              # cross-compilation
          - match: build_platform != target_platform and cuda_version
              # cross-compiled CUDA
          - match: match(cuda, "<12")
              # CUDA 11.x
          - match: match(cuda, ">=12.0")
              # CUDA 12.x
          - match: linux
              # linux
          - match: osx
              # osx
          - match: unix
              # unix
          - match: win
              # win
          - match: blas_impl == "mkl"
              # mkl

... should be desugared to the same list as the impossible variant that has multiple match:. The only thing the desugaring would need to learn is to ignore empty lists (for a match: that doesn't apply), though I guess that's already covered by the existing infra (e.g. for an if: without else:)
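To make the desugaring idea concrete, here is a minimal Python sketch. It is not rattler-build's actual implementation. It assumes each list entry is either a plain string or a mapping with hypothetical `match` and `then` keys, and that selectors are plain names looked up in a dict of booleans (real selectors are expressions). Every entry whose selector is truthy contributes its items; a non-matching entry contributes an empty list and disappears when the result is flattened.

```python
from typing import Union

Entry = Union[str, dict]


def desugar(entries: list[Entry], selectors: dict[str, bool]) -> list[str]:
    """Flatten a list of plain items and match-entries into a list of items."""
    result: list[str] = []
    for entry in entries:
        if isinstance(entry, str):
            # Unconditional item: always kept.
            result.append(entry)
        elif selectors.get(entry["match"], False):
            # Truthy selector: splice in all of its items.
            result.extend(entry["then"])
        # Falsy selector: contributes an empty list, i.e. nothing.
    return result


# Hypothetical parsed form of the `files:` example from the first comment.
files = [
    {"match": "linux", "then": ["something_linux"]},
    {"match": "osx", "then": ["something_osx"]},
    {"match": "unix", "then": ["something_unix"]},
    {"match": "win", "then": ["something_win"]},
    "something_that_applies_to_everything",
]

on_linux = {"linux": True, "osx": False, "unix": True, "win": False}
print(desugar(files, on_linux))
# ['something_linux', 'something_unix', 'something_that_applies_to_everything']
```

On linux, both the `linux` and the `unix` entries apply, which is the overlapping behaviour the proposal asks for.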

baszalmstra commented 3 weeks ago

Why not use a top level match with nested cases?

h-vetinari commented 3 weeks ago

> Why not use a top level match with nested cases?

I'd be happy with that too!

wolfv commented 3 weeks ago

But how is it different from multiple if statements?

baszalmstra commented 3 weeks ago

You don't need to repeat the negation of the if clauses.

You can more easily write something specific for win, macos, unix without having to repeat not win, etc.

Sorry, I'm on a phone so it's a bit hard to type a proper example.

wolfv commented 3 weeks ago

For the record, it's actually possible to use arbitrary Jinja in multi-line YAML strings. I am not sure if we want to keep it, but the following works (after a small bug fix from #1156):

        build:
          script: |
            echo "Is ${{ target_platform == "linux-64" }}"
            {% if target_platform == "linux-64" %}
              echo "I AM A LINUX"
            {% elif osx %}
              echo "I AM A MAC"
            {% endif %}

And of course, you could just as well use bash.

So I am not sure if it's worth it to make our syntax more complicated or not ...

Right now it's also possible to do arbitrary things with set and for loops btw. - it just has to be encapsulated in a multi-line string :)

baszalmstra commented 3 weeks ago

One of the design goals was to make the recipe easier to parse for machines. I think what you are suggesting is taking a step back. Having a match-kinda statement seems like a great (and logical) building block.

wolfv commented 3 weeks ago

Yeah but we've never semantically understood the contents of the scripts themselves so I'm not sure if we need that. This doesn't affect the "everything is pure YAML" property.

baszalmstra commented 3 weeks ago

Sure but just as you mentioned in another issue you just closed, we can validate the control flow statements using the schema. That would become harder and harder the more jinja we start using.

wolfv commented 3 weeks ago

If we build our own LSP though .. :) I think I do get your point, but I don't see it super different from e.g. people having issues with their bash scripts inside the script tag.

MementoRC commented 1 day ago

> For the record, it's actually possible to use arbitrary Jinja in multi-line YAML strings. I am not sure if we want to keep it, but the following works (after a small bug fix from #1156):
>
>     build:
>       script: |
>         echo "Is ${{ target_platform == "linux-64" }}"
>         {% if target_platform == "linux-64" %}
>           echo "I AM A LINUX"
>         {% elif osx %}
>           echo "I AM A MAC"
>         {% endif %}
>
> And of course, you could just as well use bash.
>
> So I am not sure if it's worth it to make our syntax more complicated or not ...
>
> Right now it's also possible to do arbitrary things with set and for loops btw. - it just has to be encapsulated in a multi-line string :)

This does not seem to work:

          {% for bin in qemu_bins %}
            test -f ${QEMU_LD_PREFIX}/{{ bin }}
          {% endfor %}
 │ │ │ + test -f $PREFIX/aarch64-conda-linux-gnu/sysroot/lib64/ld-2.17.so
 │ │ │ + test -f '$PREFIX/aarch64-conda-linux-gnu/sysroot/lib64/{{' bin '}}'
 │ │ │ $PREFIX/etc/conda/test-files/qemu-user-execve-aarch64/0/conda_build.sh: line 14: test: too many arguments

The issue with passing to a bash -c ... is that ${{ xxx }} expands as ["...", "...",], which trips up bash, as it expects array literals to look like (xxx xxx), so this also fails: bash -c 'l=${{ qemu_bins | split(" ") | list }}; for bin in "\${l[@]}"; do test -f ${QEMU_LD_PREFIX}/${bin}; done'
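For illustration, the mismatch can be reproduced outside rattler-build. The Python sketch below assumes the list is rendered roughly like Python's own list repr (the thread only shows the `["...", "...",]` shape, not the exact rendering) and shows what a Bash array literal needs instead: whitespace-separated, shell-quoted words in parentheses. `to_bash_array` is a hypothetical helper, not part of any tool mentioned here.

```python
import shlex


def to_bash_array(items: list[str]) -> str:
    """Render a list of strings as a Bash array literal, e.g. (a b 'c d')."""
    return "(" + " ".join(shlex.quote(item) for item in items) + ")"


bins = ["bios.bin", "vgabios.bin", "name with space"]

# A bracketed, comma-separated rendering is not valid Bash array syntax.
print(str(bins))
# Bash separates array elements with whitespace, not commas.
print("l=" + to_bash_array(bins))
# l=(bios.bin vgabios.bin 'name with space')
```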

wolfv commented 1 day ago

Where did you set qemu_bins? The context has a behavior where it casts everything to a string.

But if you do:

          {% set qemu_bins = ["foo", "bar"] %}
          {% for bin in qemu_bins %}
            test -f ${QEMU_LD_PREFIX}/{{ bin }}
          {% endfor %}

It has a chance of working :)

h-vetinari commented 1 day ago

> Why not use a top level match with nested cases?

Actually, one reason why I do have a preference for bare match: is that match: - case ... - case suggests disjoint conditions again (at least based on how pattern matching works in most languages), whereas I'd like to not match one specific thing to various disjoint cases, but rather freely match conditions for different quantities that may overlap. Using a different syntax there would avoid triggering some unhelpful muscle memory.

wolfv commented 1 day ago

@h-vetinari sorry, not following. What's the difference between match and a number of if clauses then, if the match clauses are disjoint? I think what we had in mind so far would be a match that breaks after it finds the first matching entry, e.g.

        match:
          - case: unix
            then:
              - bla
          - case: win and arm64
            then:
              - foobar
          - case: win
            then:
              - foobar-win64 # NOT happening on arm64

wolfv commented 1 day ago

Also your initial example is not at all valid YAML if I see that correctly.

wolfv commented 1 day ago

Attached two screenshots highlighting the issues (from https://yamlchecker.com/)


h-vetinari commented 1 day ago

> I think what we had in mind so far would be a match that breaks after it finds the first matching entry, e.g.

I explained why that's not desirable in the OP (or at least that there are cases where branches shouldn't be exclusive).

> Also your initial example is not at all valid YAML if I see that correctly.

Yes, I hate YAML. 🙃

wolfv commented 1 day ago

The way I understand it is that you want "if"-syntax without then, correct?

You want falsey branches to be skipped, but every truthy branch to be taken, right?

In e.g. Rust, when using match, only the first truthy branch is taken. In C/C++ you would have to add a break for this behavior. In Python also only the first truthy value is matched.

But if I understand your proposal correctly, you want every truthy branch to "match"?

Can you add corrected YAML to your proposal? Because I think if you write out proper YAML, it will look just like the current if statements.
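The two semantics being contrasted here can be sketched in a few lines of Python. This is only an illustration under assumptions: `evaluate` and its `first_only` flag are hypothetical names, and conditions are modelled as Python callables over a context dict rather than real selector expressions.

```python
from typing import Callable

Context = dict[str, bool]
Branch = tuple[Callable[[Context], bool], list[str]]


def evaluate(branches: list[Branch], ctx: Context, first_only: bool) -> list[str]:
    """Collect items from matching branches.

    first_only=True stops at the first truthy branch (like Rust's `match`);
    first_only=False takes every truthy branch (the behaviour proposed here).
    """
    items: list[str] = []
    for condition, then in branches:
        if condition(ctx):
            items.extend(then)
            if first_only:
                break
    return items


branches: list[Branch] = [
    (lambda c: c["unix"], ["bla"]),
    (lambda c: c["win"] and c["arm64"], ["foobar"]),
    (lambda c: c["win"], ["foobar-win64"]),
]

win_arm64 = {"unix": False, "win": True, "arm64": True}
print(evaluate(branches, win_arm64, first_only=True))   # ['foobar']
print(evaluate(branches, win_arm64, first_only=False))  # ['foobar', 'foobar-win64']
```

With first-match semantics the third branch never fires on arm64, so no explicit `win and not arm64` is needed. With every-match semantics overlapping branches stack, and any exclusivity has to be spelled out in the condition itself.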

MementoRC commented 1 day ago

> It has a chance of working :)

Right! Silly me, I forgot I needed to split (I did know that, just got caught in the Jinja mixin). The variable is in the context section, it is a ${{ [x, y, z] | join() }}

For this particular case, it seems that the rattler/jinja is more complex. The array needs to be joined, then split in script where the {%%} is accepted. Am I missing something?

Also, it seems that comments are not completely ignored: I had mistakenly put a "xx' in a comment and it seemed to generate a syntax error that went away after correction (not 100% sure, though).

Update: Not to be difficult, but the initial point was to have the list be defined once for the several packages that use it. I don't think it will work: {% for bin in {{ qemu_bins | split(' ') | list }} %} fails to be parsed and a global {% fails with:

Error:   × Parsing: failed to parse YAML: unexpected character: `%'
   ╭─[1:2]
 1 │ {% qemu_bins_j = [
   ·  ────────┬────────
   ·          ╰── here
 2 │       "bios.bin", "bios-256k.bin", "bios-microvm.bin", "qboot.rom", "vgabios.bin", "vgabios-cirrus.bin"
   ╰────

So I will use python -c ... ${{ qemu_bins }}.split() ..., which is neat enough I concede
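That approach might look roughly like the sketch below. It assumes `qemu_bins` reaches the script as a single space-joined string and that the files live under some prefix directory; `missing_files` and the demo paths are made up for illustration.

```python
import tempfile
from pathlib import Path


def missing_files(prefix: Path, joined_names: str) -> list[str]:
    """Return the names from a space-joined string that are absent under prefix."""
    return [name for name in joined_names.split() if not (prefix / name).is_file()]


# Demo with a throwaway directory standing in for the real prefix.
with tempfile.TemporaryDirectory() as tmp:
    prefix = Path(tmp)
    (prefix / "bios.bin").touch()
    qemu_bins = "bios.bin vgabios.bin"  # stands in for the rendered ${{ qemu_bins }}
    print(missing_files(prefix, qemu_bins))  # ['vgabios.bin']
```

Splitting on whitespace in Python sidesteps the list-versus-Bash-array rendering problem entirely, at the cost of not supporting names that contain spaces.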