rapidsai / build-planning

Tracking for RAPIDS-wide build tasks
https://github.com/rapidsai
0 stars 4 forks source link

Use stricter pattern for reading data from other files in conda recipes #72

Closed jameslamb closed 5 months ago

jameslamb commented 5 months ago

Description

conda-build supports a function load_file_data() which can be used to load data from a file that is then referenced in other Jinja2 template statements throughout the recipes.

See "Loading data from other files" (conda-build docs).

Across RAPIDS, this is used in a few places to read data out of pyproject.toml files and re-use it in conda recipes. For example, in dask-cuda:

{% set data = load_file_data("pyproject.toml") %}

build:
  ...
  entry_points:
    {% for e in data.get("project", {}).get("scripts", {}).items() %}
    - {{ e|join(" = ") }}
    {% endfor %}

(dask-cuda/conda/recipes/dask-cuda/meta.yaml)

That function returns a Python dictionary. Because of the use of .get() in those statements, it's possible for data that should be present to be silently ignored.

Those should be switched to a stricter access pattern that'd result in a loud build-time error if any expected data is missing, like this:

build:
  ...
  entry_points:
    {% for e in data["project")["scripts"].items() %}
    - {{ e|join(" = ") }}
    {% endfor %}

Benefits of this work

Acceptance Criteria

Approach

The supported functions are:

Use GitHub search to double-check that the task list attached to this issue captures all uses of them across RAPIDS.

Whenever those are used, switch to a stricter pattern using [] subsetting instead of .get().

Notes

Inspired by my experience in https://github.com/rapidsai/ucx-py/pull/1044, moving dependency data around in pyproject.toml and needing to adjust a corresponding conda recipe.

### Tasks
- [x] dask-cuda (https://github.com/rapidsai/dask-cuda/pull/1349)
- [x] numba-cuda (https://github.com/rapidsai/numba-cuda/pull/3)
- [x] pynvjitlink (https://github.com/rapidsai/pynvjitlink/pull/90)
- [x] rapids-build-backend (https://github.com/rapidsai/rapids-build-backend/pull/44)
- [x] ucx-py (https://github.com/rapidsai/ucx-py/pull/1044)
jameslamb commented 5 months ago

"always use .get()" might be too strict / not possible.

See these discussions:

And this failing PR: https://github.com/rapidsai/dask-cuda/pull/1349

jameslamb commented 5 months ago

There was an interesting case I ran into in dask-cuda, where it seems like this is not supported:

{% for e in data.get("project", {}).get("scripts", {}).items() %}
  - {{ e|join(" = ") }}
{% endfor %}

but this is:

{% for k in data["project"]["scripts"] %}
  - {{ k ~ " = " ~ data["project"]["scripts"][k] }}
{% endfor %}

ref: https://github.com/rapidsai/dask-cuda/pull/1349#discussion_r1636912955

@jakirkham if you look at that and think "that's unexpected and looks like a bug in conda-build", let me know and I'd be happy to put together a bug report. I'm just not sure if it's a bug or just a gap in my understanding of Jinja2.

jameslamb commented 5 months ago

This is complete 🎉