Open jameslamb opened 6 months ago
Adding an example for reference on how to investigate the contents of these files.
# choose a conda package from https://anaconda.org/rapidsai/librmm/files
RMM_CONDA_PACKAGE=https://anaconda.org/rapidsai/librmm/24.04.00/download/linux-64/librmm-24.04.00-cuda12_240410_g8f19c9c3_0.tar.bz2
# download it
wget \
"${RMM_CONDA_PACKAGE}" \
-O librmm.tar.bz2
# list its contents, filtering down for paths mentioned in the clobber errors
tar jtf ./librmm.tar.bz2 \
| grep -E 'fmt|spdlog'
For a recent version of librmm, this shows the following:
Open PRs:
- rapids-cmake (https://github.com/rapidsai/rapids-cmake/pull/571)
- rmm
- cudf
  - fmt and spdlog (from fmt, spdlog, and librmm)
  - nvidia channel vs. conda-forge
- raft
  - fmt and spdlog (from fmt, spdlog, librmm, and libraft-headers-only)
  - nvidia channel vs. conda-forge
  - (libcusparse.so) being pulled in by the {component} and {component}-dev packages from the nvidia channel
  - (cub, libcudacxx, thrust) (from librmm and libraft-headers-only)
- dask-cuda (https://github.com/rapidsai/dask-cuda/pull/1325)
- cuml
  - fmt and spdlog (from fmt, spdlog, librmm, and libraft-headers-only)
  - nvidia channel vs. conda-forge
  - (libcusparse.so) being pulled in by the {component} and {component}-dev packages from the nvidia channel
  - (cub, libcudacxx, thrust) (from librmm and libraft-headers-only)
  - librmm (from librmm and libraft-headers-only)
- cugraph
  - fmt and spdlog (from fmt, spdlog, librmm, and libraft-headers-only)
  - nvidia channel vs. conda-forge
  - (libcusparse.so) being pulled in by the {component} and {component}-dev packages from the nvidia channel
  - (cub, libcudacxx, thrust) (from librmm and libraft-headers-only)
  - librmm (from librmm and libraft-headers-only)
- ucx-py (https://github.com/rapidsai/ucx-py/pull/1035)
- cucim (https://github.com/rapidsai/cucim/pull/713)
- cuspatial
  - fmt and spdlog (from fmt, spdlog, and librmm)
- cuxfilter
- kvikio (https://github.com/rapidsai/kvikio/pull/356)
  - fmt and spdlog (from fmt, spdlog, librmm, and libraft-headers-only)
  - nvidia channel vs. conda-forge
  - (libcusparse.so) being pulled in by the {component} and {component}-dev packages from the nvidia channel
  - (cub, libcudacxx, thrust) (from librmm and libraft-headers-only)
  - librmm (from librmm and libraft-headers-only)
It looks to me like the root cause of the fmt
and spdlog clobbering warnings specifically is a combination of the following:
- rapids_cpm_find() forces download of any sources that have patches in https://github.com/rapidsai/rapids-cmake/blob/branch-24.06/rapids-cmake/cpm/versions.json
- fmt and spdlog both have patches there
- when rapids_cpm_find() ends up downloading those sources, it places them at a generic path like ${PREFIX}/include/fmt. That conflicts with the fmt conda-forge package.
And I think there are 2 classes of fixes we could pursue.
- upgrade fmt and spdlog across RAPIDS and remove the patches we're carrying around in rapids-cmake for them
- include/fmt and include/spdlog
I'm able to reproduce the errors about clobbering fmt and spdlog by running the following on the branch from https://github.com/rapidsai/rmm/pull/1508.
docker run \
--rm \
-v $(pwd):/opt/rmm \
-w /opt/rmm \
-it rapidsai/ci-conda:cuda12.2.2-ubuntu22.04-py3.9-amd64 \
bash
# do a 'librmm' conda build
ci/build_cpp.sh
I removed the fmt and spdlog patches in my fork of rapids-cmake, ran rmm's conda builds again, and found that this resolved the clobber errors. (link to successful builds).
(link to diff on my rapids-cmake fork)
We could immediately resolve these fmt and spdlog clobber issues specifically by upgrading to the latest versions of those libraries and removing the patches in rapids-cmake.
- fmt's most recent release was 10.2.1, so the patch we're carrying around for the 10.1.1 version could be dropped
  - rapids-cmake: (fmt/fix_10_1_1_version.diff)
  - conda-forge removing a similar patch: https://github.com/conda-forge/fmt-feedstock/pull/48
- the spdlog patch we've been carrying around has been upstreamed
  - rapids-cmake: (spdlog/nvcc_constexpr_fix.diff)
  - conda-forge recipe: https://github.com/conda-forge/spdlog-feedstock/pull/59
  - spdlog: https://github.com/gabime/spdlog/pull/2901
  - conda-forge yet, we'd need to revive this PR: https://github.com/conda-forge/spdlog-feedstock/pull/60
But separately, we should try to permanently fix this so the clobbering issues don't re-appear in similar situations in the future.
If we want to preserve the ability to substitute in sources with not-yet-upstreamed patches in our conda builds, I can think of 2 options:
- change rapids-cmake (or the way that projects invoke it) such that it writes those sources to project-specific paths like include/rmm/vendored/fmt/*
- create a rapids-patched-sources or something package on conda-forge which contains copies of the patched sources we want projects to build with, and which is included as a host: dependency for RAPIDS packages
Downloading to a project-specific path would be fine for headers and source files, but I'm not sure how to handle other types of files that we're getting clobbering warnings for like:
- pkg-config scripts (e.g. 'lib/pkgconfig/fmt.pc')
- <PackageNames>Config.cmake scripts used by find_package() (e.g. 'lib/cmake/spdlog/spdlogConfig.cmake')
(https://github.com/rapidsai/rmm/pull/1508#issuecomment-2067300280)
Note I'm saying "project-specific" here because every RAPIDS package using rapids-cmake and depending on something it decides to download is going to end up pulling these files in.
For example, look at the conda builds on https://github.com/rapidsai/cuml/pull/5821. It's getting 3 packages all trying to install the same fmt headers to the same places: fmt itself, librmm, and libraft-headers-only.
This transaction has incompatible packages due to a shared path.
packages: conda-forge/linux-64::fmt-10.2.1-h00ab1b0_0,
rapidsai-nightly/linux-64::librmm-24.04.00a38-cuda11_240326_ga98931b9_38,
rapidsai-nightly/linux-64::libraft-headers-only-24.04.00a93-cuda11_240326_g9637b3c2_93
path: 'include/fmt/args.h'
So if rapids_cpm_find() were to write to the same path regardless of project, e.g. include/rapids/, that'd fix conflicts with the fmt package but still leave a risk of conflicts between multiple RAPIDS packages.
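One way to check an overlap like this directly is to compare the file listings of two package archives. The sketch below assumes both packages have already been downloaded (as in the wget example earlier in this thread). `shared_paths` is a hypothetical helper name, and dropping `info/` entries assumes the usual conda package layout, where each package keeps its own metadata under that directory.

```shell
# Sketch: print paths shipped by BOTH of two conda package archives.
# 'shared_paths' is a hypothetical helper, not part of conda or RAPIDS tooling.
shared_paths() {
    list_a=$(mktemp)
    list_b=$(mktemp)
    # 'tar -tf' lists archive members; GNU tar and bsdtar detect the
    # compression format on their own when reading an archive.
    # Drop per-package metadata ('info/') and bare directory entries.
    tar -tf "$1" | grep -v -e '^info/' -e '/$' | sort > "$list_a"
    tar -tf "$2" | grep -v -e '^info/' -e '/$' | sort > "$list_b"
    # 'comm -12' prints only the lines present in both sorted lists.
    comm -12 "$list_a" "$list_b"
    rm -f "$list_a" "$list_b"
}

# Example usage (file names are placeholders):
#   shared_paths ./librmm.tar.bz2 ./fmt.tar.bz2
```

Any path this prints is one that conda would report as a shared path if both packages were installed into the same environment.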
There are other clobber warnings; I'm just starting with rmm because it's the furthest upstream across RAPIDS projects.
Linking these related things (thanks @vyasr for the pointer).
There is already a rapids_core_dependencies package that we're ~publishing~ building ~to~ which could be used to avoid every RAPIDS conda package re-vendoring the same things:
> Linking these related things (thanks @vyasr for the pointer).
> There is already a rapids_core_dependencies package that we're publishing to avoid every RAPIDS conda package re-vendoring the same things:
We don't publish rapids_core_dependencies currently; it was proposed but never done.
Ah ok. I missed the "as CI artifacts" part of the description in https://github.com/rapidsai/rapids-cmake/pull/414. Thank you, edited my original post.
I just put up a PR to build the latest version of spdlog on conda-forge: https://github.com/conda-forge/spdlog-feedstock/pull/61.
That'll be helpful whenever we decide to upgrade and drop the patch in rapids-cmake (link).
I've looked across all the open PRs that add this setting:
conda config --set path_conflict prevent
And see the following common issues.
fmt and spdlog headers packaged in librmm
This transaction has incompatible packages due to a shared path.
packages: conda-forge/linux-aarch64::fmt-10.2.1-h2a328a1_0, file:///tmp/conda-bld-output/linux-aarch64::librmm-24.06.00a16-cuda12_240419_g9dfd9070_16
path: 'include/fmt/chrono.h'
librmm headers packaged in libraft-headers-only
Example:
This transaction has incompatible packages due to a shared path.
packages: rapidsai-nightly/linux-64::librmm-24.04.00a38-cuda11_240326_ga98931b9_38, rapidsai-nightly/linux-64::libraft-headers-only-24.04.00a93-cuda11_240326_g9637b3c2_93
path: 'include/rmm/mr/device/arena_memory_resource.hpp'
references:
(cub, libcudacxx, thrust) packages in librmm and libraft-headers-only
This transaction has incompatible packages due to a shared path.
packages: rapidsai-nightly/linux-64::librmm-24.04.00a38-cuda11_240326_ga98931b9_38, rapidsai-nightly/linux-64::libraft-headers-only-24.04.00a93-cuda11_240326_g9637b3c2_93
path: 'include/rapids/cub/agent/agent_radix_sort_upsweep.cuh'
lib{component}.so from conda-forge/cudatoolkit and nvidia/{component}
This transaction has incompatible packages due to a shared path.
packages: nvidia/linux-64::cuda-nvtx-11.8.86-0, conda-forge/linux-64::cudatoolkit-11.8.0-h4ba93d1_13
path: 'lib/libnvToolsExt.so'
lib{component}.so from nvidia/{component} and nvidia/{component}-dev
This transaction has incompatible packages due to a shared path.
packages: nvidia/linux-64::libcusparse-11.7.5.86-0, nvidia/linux-64::libcusparse-dev-11.7.5.86-0, conda-forge/linux-64::cudatoolkit-11.8.0-h4ba93d1_13
path: 'lib/libcusparse.so.11'
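A rough way to get this kind of overview from a build log is to pull out the `path:` lines and group them by directory. The sketch below assumes the log contains error blocks shaped like the examples above (a `path: '...'` line per conflict); both function names are hypothetical helpers, not conda features.

```shell
# Sketch: summarize conda "shared path" errors from a build log on stdin.
# Both function names are hypothetical helpers, not part of conda.

# Print each distinct clobbered path once.
clobbered_paths() {
    sed -n "s/^[[:space:]]*path: '\(.*\)'[[:space:]]*$/\1/p" | sort -u
}

# Count clobbered paths per leading two path components (e.g. 'include/fmt').
clobber_summary() {
    clobbered_paths | awk -F/ '
        { key = (NF > 1) ? $1 "/" $2 : $1; count[key]++ }
        END { for (k in count) print count[k], k }
    ' | sort -k2
}

# Example usage (the log file name is a placeholder):
#   clobber_summary < conda-build.log
```

Grouping by the first two path components keeps header directories such as include/fmt together while still listing individual shared libraries under lib/ separately.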
I'm pausing on this for now to help out with some of the other initiatives (like #31 and #33).
I believe https://github.com/rapidsai/raft/pull/2284 fixed the issues with raft vendoring rmm headers. Think the right long-term solution for the CCCL + fmt + spdlog stuff is to distribute rapids-core-dependencies with those things in it, which I'd love to help with (cc @robertmaynard) but probably couldn't get done before we enter burndown for 24.06.
Thanks for the update James! I agree that those two are more immediate priorities, but the progress you've already made on here is great.
This work is paused right now. See https://github.com/rapidsai/build-planning/issues/56#issuecomment-2087365946 and the things linked to it.
Closing these open PRs:
And a few others in private repos.
> I believe rapidsai/raft#2284 fixed the issues with raft vendoring rmm headers. Think the right long-term solution for the CCCL + fmt + spdlog stuff is to distribute rapids-core-dependencies with those things in it, which I'd love to help with (cc @robertmaynard) but probably couldn't get done before we enter burndown for 24.06.
Created a subissue for this task: https://github.com/rapidsai/build-planning/issues/109
(Apr 23, 2024) moved from an internal tracking board. The description here is @ajschmidt8's original write-up of the issue.
Problem
Some of our conda builds suffer from ClobberWarnings (e.g. here and here).
Ignoring these clobber warnings can sometimes result in packaging issues. Therefore, we'd like to begin treating these clobber warnings as errors.
Ideally, the setting for this (path_conflict prevent) should go in the .condarc file of our CI images (i.e. here) so that it applies to every RAPIDS repository.
However, since many repositories already suffer from clobber warnings, this would break CI for a lot of people.
Therefore, we need to plan the rollout carefully.
Solution
To roll out this new configuration setting safely, we should do the following:
1. For each of the repositories in the section below:
   a. Open a PR and add the following line to any build scripts (e.g. ci/build_{cpp,python}.sh):
      conda config --set path_conflict prevent
   b. Resolve any build issues that result from this change
   c. If build issues occur, fix them and merge the PR (leaving in the new conda config line, which will be cleaned up later)
   d. If no build issues occur, simply close the PR
   e. Link the PR in this issue (do not link this issue in the PR, since this issue is in a private repository)
2. Once all of the repositories are successfully using the new conda config setting, add the setting to https://github.com/rapidsai/ci-imgs/blob/main/context/condarc.tmpl
3. Go back and clean up all of the extraneous conda config --set path_conflict prevent lines in each repository
Repositories
Look at the repos at this search query.
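For tracking the rollout and the later cleanup, something like the following could report which build scripts in a local checkout do not yet contain the new line. This is only a sketch: the function name is hypothetical, and it assumes build scripts live at ci/build_*.sh as in the example above.

```shell
# Sketch: list build scripts in a repo checkout that lack the new setting.
# 'scripts_missing_path_conflict' is a hypothetical helper name.
scripts_missing_path_conflict() {
    for script in "$1"/ci/build_*.sh; do
        # Skip the unexpanded pattern when no build scripts exist.
        [ -f "$script" ] || continue
        # -F matches the line as a fixed string; -q suppresses output.
        grep -qF 'conda config --set path_conflict prevent' "$script" \
            || echo "$script"
    done
}

# Example usage (the checkout path is a placeholder):
#   scripts_missing_path_conflict ~/repos/rmm
```

Once the setting has moved into the CI images' condarc, swapping the `||` for `&&` would list the scripts where the now-redundant line still needs to be removed.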