rapidsai / build-planning

Tracking for RAPIDS-wide build tasks
https://github.com/rapidsai
0 stars 4 forks source link

Treat clobber warnings as errors for conda builds #54

Open jameslamb opened 6 months ago

jameslamb commented 6 months ago

(Apr 23, 2024) moved from an internal tracking board. The description here is @ajschmidt8 's original write-up of the issue.

Problem

Some of our conda builds suffer from ClobberWarnings (e.g. here and here).

Ignoring these clobber warnings can sometimes result in packaging issues. Therefore, we'd like to begin treating these clobber warnings as errors.

Ideally, this setting should go in the .condarc file of our CI images (i.e. here) so that it applies to every RAPIDS repository.

However, since many repositories already suffer from clobber warnings, this would break CI for a lot of people.

Therefore, we need to plan the rollout carefully.

Solution

To roll out this new configuration setting safely, we should do the following:

For each of the repositories in the section below:

a. Open a PR and add the following line to any build scripts (e.g. ci/build_{cpp,python}.sh): conda config --set path_conflict prevent b. Resolve any build issues that result from this change c. If build issues occur, fix them and merge the PR (leaving in the new conda config line, that will be cleaned up later) d. If no build issues occur, simply close the PR e. Link the PR in this issue (do not link this issue in the PR, since this issue is in a private repository)

Once all of the repositories are successfully using the new conda config setting, add the setting to https://github.com/rapidsai/ci-imgs/blob/main/context/condarc.tmpl

Go back and clean up all of the extraneous conda config --set path_conflict prevent lines in each repository

Repositories

Look at the repos at this search query.

jameslamb commented 6 months ago

Adding an example for reference on how to investigate the contents of these files.

# choose a conda package from https://anaconda.org/rapidsai/librmm/files
RMM_CONDA_PACKAGE=https://anaconda.org/rapidsai/librmm/24.04.00/download/linux-64/librmm-24.04.00-cuda12_240410_g8f19c9c3_0.tar.bz2

# download it
wget \
    "${RMM_CONDA_PACKAGE}" \
    -O librmm.tar.bz2

# list its contents, filtering down for paths mentioned in the clobber errors
tar jtf ./librmm.tar.bz2 \
| grep -E 'fmt|spdlog'

For a recent version of librmm, this shows the following:

fmt and spdlog headers and static libraries (click me) ```text include/spdlog/details/windows_include.h include/spdlog/fwd.h include/spdlog/version.h include/spdlog/formatter.h include/spdlog/fmt/xchar.h include/spdlog/fmt/ranges.h include/spdlog/fmt/chrono.h include/spdlog/fmt/ostr.h include/spdlog/fmt/compile.h include/spdlog/details/console_globals.h include/spdlog/fmt/std.h include/spdlog/details/periodic_worker-inl.h include/spdlog/cfg/helpers.h include/spdlog/sinks/sink-inl.h include/spdlog/details/synchronous_factory.h include/spdlog/sinks/sink.h include/spdlog/details/log_msg_buffer.h include/spdlog/details/null_mutex.h include/spdlog/cfg/env.h include/spdlog/fmt/fmt.h include/spdlog/sinks/basic_file_sink-inl.h include/spdlog/details/log_msg-inl.h include/spdlog/details/log_msg.h include/spdlog/details/backtracer.h include/spdlog/sinks/stdout_color_sinks-inl.h include/spdlog/sinks/null_sink.h include/spdlog/cfg/argv.h include/spdlog/sinks/ostream_sink.h include/spdlog/sinks/base_sink.h include/spdlog/sinks/callback_sink.h include/spdlog/sinks/stdout_color_sinks.h include/spdlog/details/log_msg_buffer-inl.h include/spdlog/stopwatch.h include/spdlog/details/file_helper.h include/spdlog/details/periodic_worker.h include/spdlog/sinks/base_sink-inl.h include/spdlog/sinks/udp_sink.h include/spdlog/sinks/basic_file_sink.h include/spdlog/details/backtracer-inl.h include/spdlog/sinks/msvc_sink.h include/spdlog/common-inl.h include/spdlog/sinks/ringbuffer_sink.h include/spdlog/sinks/tcp_sink.h include/spdlog/async_logger.h include/spdlog/details/udp_client.h include/spdlog/sinks/stdout_sinks.h include/spdlog/sinks/dist_sink.h include/spdlog/async_logger-inl.h include/spdlog/sinks/rotating_file_sink.h include/spdlog/sinks/wincolor_sink.h include/spdlog/spdlog-inl.h include/spdlog/details/udp_client-windows.h include/spdlog/sinks/dup_filter_sink.h include/spdlog/cfg/helpers-inl.h include/spdlog/details/thread_pool.h include/spdlog/details/circular_q.h include/spdlog/details/thread_pool-inl.h include/spdlog/async.h include/spdlog/pattern_formatter.h include/spdlog/sinks/syslog_sink.h include/spdlog/sinks/mongo_sink.h include/spdlog/details/os.h include/spdlog/sinks/ansicolor_sink.h include/spdlog/details/tcp_client.h include/spdlog/details/registry.h include/spdlog/sinks/stdout_sinks-inl.h include/spdlog/sinks/kafka_sink.h include/spdlog/details/mpmc_blocking_q.h include/spdlog/sinks/systemd_sink.h include/spdlog/details/tcp_client-windows.h include/spdlog/details/fmt_helper.h include/spdlog/sinks/ansicolor_sink-inl.h include/spdlog/sinks/android_sink.h include/spdlog/sinks/rotating_file_sink-inl.h include/spdlog/details/file_helper-inl.h include/spdlog/sinks/wincolor_sink-inl.h include/spdlog/tweakme.h include/fmt/ostream.h include/spdlog/sinks/hourly_file_sink.h include/spdlog/logger-inl.h include/spdlog/fmt/bin_to_hex.h include/fmt/args.h include/spdlog/sinks/daily_file_sink.h include/spdlog/details/registry-inl.h include/spdlog/sinks/win_eventlog_sink.h include/fmt/xchar.h include/spdlog/sinks/qt_sinks.h include/spdlog/spdlog.h include/fmt/os.h include/fmt/std.h include/spdlog/common.h include/spdlog/logger.h include/spdlog/details/os-inl.h include/fmt/compile.h include/fmt/printf.h include/fmt/ranges.h include/fmt/color.h include/spdlog/pattern_formatter-inl.h include/fmt/chrono.h include/fmt/format-inl.h include/fmt/core.h include/fmt/format.h lib/libfmt.a lib/libspdlog.a lib/pkgconfig/spdlog.pc lib/pkgconfig/fmt.pc lib/cmake/fmt/fmt-targets-release.cmake lib/cmake/spdlog/spdlogConfigTargets-release.cmake lib/cmake/fmt/fmt-config.cmake lib/cmake/spdlog/spdlogConfig.cmake lib/cmake/fmt/fmt-config-version.cmake lib/cmake/spdlog/spdlogConfigVersion.cmake lib/cmake/fmt/fmt-targets.cmake lib/cmake/spdlog/spdlogConfigTargets.cmake ```
jameslamb commented 6 months ago

Open PRs:

jameslamb commented 6 months ago

Short Summary

It looks to me like the root cause of the fmt and spdlog clobbering warnings specifically is a combination of the following:

And I think there are 2 classes of fixes we could pursue.

Details

I'm able to reproduce the errors about clobbering fmt and spdlog by running the following on the branch from https://github.com/rapidsai/rmm/pull/1508.

docker run \
    --rm \
    -v $(pwd):/opt/rmm \
    -w /opt/rmm \
    -it rapidsai/ci-conda:cuda12.2.2-ubuntu22.04-py3.9-amd64 \
    bash

# do a 'librmm' conda build
ci/build_cpp.sh

I removed the fmt and spdlog patches in my fork of rapids-cmake and ran rmm's conda builds again, found that that resolved the clobber errors. (link to successful builds).

image

(link to diff on my rapids-cmake fork)

We could immediately resolve these fmt and spdlog clobber issues specifically by upgrading to the latest versions of those libraries and removing the patches in rapids-cmake.

But separately, we should try to permanently fix this so the clobbering issues don't re-appear in similar situations in the future.

If we want to preserve the ability to substitute in sources with not-yet-upstreamed patches in our conda builds, I can think of 2 options:

Downloading to a project-specific path would be fine for headers and source files, but I'm not sure how to handle other types of files that we're getting clobbering warnings for like:

(https://github.com/rapidsai/rmm/pull/1508#issuecomment-2067300280)

Note I'm saying "project-specific" here because every RAPIDS package using rapids-cmake and depending on something it decides to download is going to end up pulling these files in.

For example, look at the conda builds on https://github.com/rapidsai/cuml/pull/5821. It's getting 3 packages all trying to install the same fmt headers to the same places: fmt itself, librmm, and libraft-headers-only.

This transaction has incompatible packages due to a shared path.
  packages: conda-forge/linux-64::fmt-10.2.1-h00ab1b0_0,
                    rapidsai-nightly/linux-64::librmm-24.04.00a38-cuda11_240326_ga98931b9_38,
                    rapidsai-nightly/linux-64::libraft-headers-only-24.04.00a93-cuda11_240326_g9637b3c2_93
  path: 'include/fmt/args.h'

(build link)

So if rapids_cpm_find() were to write to the same path regardless of project, e.g. include/rapids/, that'd fix conflicts with the fmt package but still leave a risk of conflicts between multiple RAPIDS packages.

Notes

There are other clobber warnings, just starting with rmm because it's the furthest upstream across RAPIDS projects.

jameslamb commented 6 months ago

Linking these related things (thanks @vyasr for the pointer).

There is already a rapids_core_dependencies package that we're ~publishing~ building ~to~ which could be used avoid every RAPIDS conda package re-vendoring the same things:

robertmaynard commented 6 months ago

Linking these related things (thanks @vyasr for the pointer).

There is already a rapids_core_dependencies package that we're publishing to avoid every RAPIDS conda package re-vendoring the same things:

We don't publish rapids_core_dependencies currently, it was proposed but never done.

jameslamb commented 6 months ago

Ah ok. I missed the "as CI artifacts" part of the description in https://github.com/rapidsai/rapids-cmake/pull/414. Thank you, edited my original post.

jameslamb commented 6 months ago

I just put up a PR to build the latest version of spdlog on conda-forge: https://github.com/conda-forge/spdlog-feedstock/pull/61.

That'll be helpful whenever we decide to upgrade and drop the patch in rapids-cmake (link).

jameslamb commented 6 months ago

I've looked across all the open PRs that add this setting:

conda config --set path_conflict prevent

And see the following common issues.

1: fmt and spdlog headers packaged in librmm

This transaction has incompatible packages due to a shared path.
  packages: conda-forge/linux-aarch64::fmt-10.2.1-h2a328a1_0, file:///tmp/conda-bld-output/linux-aarch64::librmm-24.06.00a16-cuda12_240419_g9dfd9070_16
  path: 'include/fmt/chrono.h'

2: librmm headers packaged in libraft-headers-only

Example:

This transaction has incompatible packages due to a shared path.
  packages: rapidsai-nightly/linux-64::librmm-24.04.00a38-cuda11_240326_ga98931b9_38, rapidsai-nightly/linux-64::libraft-headers-only-24.04.00a93-cuda11_240326_g9637b3c2_93
  path: 'include/rmm/mr/device/arena_memory_resource.hpp'

references:

3: CCCL (cub, libcudacxx, thrust) packages in librmm and libraft-headers-only

This transaction has incompatible packages due to a shared path.
 packages: rapidsai-nightly/linux-64::librmm-24.04.00a38-cuda11_240326_ga98931b9_38, rapidsai-nightly/linux-64::libraft-headers-only-24.04.00a93-cuda11_240326_g9637b3c2_93
 path: 'include/rapids/cub/agent/agent_radix_sort_upsweep.cuh'

4: (CUDA 11.x only) getting the same lib{component}.so from conda-forge/cudatoolkit and nvidia/{component}

This transaction has incompatible packages due to a shared path.
  packages: nvidia/linux-64::cuda-nvtx-11.8.86-0, conda-forge/linux-64::cudatoolkit-11.8.0-h4ba93d1_13
  path: 'lib/libnvToolsExt.so'

5: (CUDA 11.x only) getting the same lib{component}.so from nvidia/{component} and nvidia/{component}-dev

This transaction has incompatible packages due to a shared path.
  packages: nvidia/linux-64::libcusparse-11.7.5.86-0, nvidia/linux-64::libcusparse-dev-11.7.5.86-0, conda-forge/linux-64::cudatoolkit-11.8.0-h4ba93d1_13
  path: 'lib/libcusparse.so.11'
jameslamb commented 6 months ago

I'm pausing on this for now to help out with some of the other initiatives (like #31 and #33).

I believe https://github.com/rapidsai/raft/pull/2284 fixed the issues with raft vendoring rmm headers. Think the right long-term solution for the CCCL + fmt + spdlog stuff is to distribute rapids-core-dependencies with those things in it, which I'd love to help with (cc @robertmaynard) but probably couldn't get done before we enter burndown for 24.06.

vyasr commented 6 months ago

Thanks for the update James! I agree that those two are more immediate priorities, but the progress you've already made on here is great.

jameslamb commented 3 months ago

This work is paused right now. See https://github.com/rapidsai/build-planning/issues/56#issuecomment-2087365946 and the things linked to it.

Closing these open PRs:

And a few others in private repos.

jakirkham commented 3 weeks ago

I believe rapidsai/raft#2284 fixed the issues with raft vendoring rmm headers. Think the right long-term solution for the CCCL + fmt + spdlog stuff is to distribute rapids-core-dependencies with those things in it, which I'd love to help with (cc @robertmaynard) but probably couldn't get done before we enter burndown for 24.06.

Created a subissue for this task: https://github.com/rapidsai/build-planning/issues/109