rapidsai / dask-build-environment

Build environments for various dask related projects on gpuCI
4 stars 11 forks source link

reduce config duplication, update third-party actions #101

Closed jameslamb closed 2 months ago

jameslamb commented 3 months ago

Closes #100

Follow-up to #97

Contributes to https://github.com/rapidsai/build-planning/issues/40

Proposes the following changes:

Notes for Reviewers

How do I test this?

I can't figure out where these image builds actually happen. At https://github.com/rapidsai/dask-build-environment/actions, I only see the "Check for gpuCI Updates" job running.

Could someone please tell me where docker build based on these Dockerfiles is run and how I could test these changes?

jakirkham commented 3 months ago

Thanks James! πŸ™

Generally this seems reasonable to me. Asking Charles to take a closer look as he is the expert here πŸ˜‰

jameslamb commented 3 months ago

I just reverted the Python 3.12 part of this. Realized that some of these images are actually installing RAPIDS packages, and we don't yet have packages for Python 3.12.

e.g.: https://github.com/rapidsai/dask-build-environment/blob/e121aa88f172c7cfe83c2774125fd9d33a145dc6/dask/environment.yml#L1-L14

The work to add Python 3.12 here is tracked in https://github.com/rapidsai/build-planning/issues/40

charlesbluca commented 2 months ago

Thanks @jameslamb! A lot of long-overdue improvements here.

Could someone please tell me where docker build based on these Dockerfiles is run and how I could test these changes?

Yeah this is definitely something that could use more visibility - this is echoed as part of the build process, but this is only visible in the Jenkins runs, which I think are only visible to @rapidsai/dask-build-environment-write. Looking at one of these logs, the build command is:

docker build --pull -t gpuci/dask:24.10-cuda11.8.0-devel-ubuntu20.04-py3.10 --squash --build-arg RAPIDS_VER=24.10 --build-arg UCX_PY_VER=0.40 --build-arg CUDA_VER=11.8.0 --build-arg LINUX_VER=ubuntu20.04 --build-arg PYTHON_VER=3.10 /jenkins/workspace/dask/dask-build-environment/branch/dask-build-env-main/BUILD_NAME/dask/CUDA_VER/11.8.0/LINUX_VER/ubuntu20.04/PYTHON_VER/3.10/RAPIDS_VER/24.10/dask

Modifying this for local testing on this work (picking a tag arbitrarily):

docker build \
    --pull \
    -t gpuci/$PROJECT_NAME:$USER-dev \
    --squash \
    --build-arg RAPIDS_VER=24.10 \
    --build-arg UCX_PY_VER=0.40 \
    --build-arg CUDA_VER=11.8.0 \
    --build-arg LINUX_VER=ubuntu20.04 \
    --build-arg PYTHON_VER=3.10 \
    ./$PROJECT_NAME

Where PROJECT_NAME can be filled in with any of the subdirectories of this repo containing Dockerfiles.

jameslamb commented 2 months ago

Ah got it, thanks for that!

Yep, can confirm that I'm not able to view those Jenkins logs.

  1. Could you tell me if it looks like these changes are passing CI?
  2. If they are, could you approve and merge this?
  3. Would you support folks from RAPIDS build-infra team being added with write access here, so we can help with these sorts of things like modifying the set of supported CUDA / Python verisons?
charlesbluca commented 2 months ago

Could you tell me if it looks like these changes are passing CI?

Unfortunately, there isn't a PR builder job, so checking that these changes pass CI can't be done until this is merged in πŸ˜• and typically I've relied on local testing for more extensive changes.

Locally, I can verify that not specifying all of the expected build args causes a failure:

docker build \
    --pull \
    -t gpuci/dask:charlesb-dev \
    ./dask
...
ERROR: failed to solve: rapidsai/miniforge-cuda:cudaunset-base-unset-pyunset: failed to resolve source metadata for docker.io/rapidsai/miniforge-cuda:cudaunset-base-unset-pyunset: docker.io/rapidsai/miniforge-cuda:cudaunset-base-unset-pyunset: not found

Then confirmed for each project that including those build arguments allowed the builds to proceed and pass.

If they are, could you approve and merge this?

I noticed that we missed this explicit setting of RAPIDS_VER in the dask-image dockerfile, which I think we can actually remove altogether since we don't actually install RAPIDS in that image - this would look like:

Would you mind applying this change?

charlesbluca commented 2 months ago

Would you support folks from RAPIDS build-infra team being added with write access here, so we can help with these sorts of things like modifying the set of supported CUDA / Python verisons?

Followed up with an internal request to do this for now, if that spins off into a relevant GitHub issue can follow up here

jameslamb commented 2 months ago

Unfortunately, there isn't a PR builder job, so checking that these changes pass CI can't be done until this is merged in πŸ˜• and typically I've relied on local testing

Ohhh ok, thanks for that. I thought that maybe I just couldn't see the PR builds because I didn't have sufficient permissions.

Thanks for testing this one locally.

think we can actually remove altogether since we don't actually install RAPIDS in that image

Oh cool yeah, looks like that would be safe to do. I dropped RAPIDS_VER from dask_image in https://github.com/rapidsai/dask-build-environment/pull/101/commits/ddfd1e721cb6dbf3ee9bffff60975fc0b7a36a5c