rapidsai / build-planning

Tracking for RAPIDS-wide build tasks
https://github.com/rapidsai
0 stars 3 forks source link

update `fmt` and `spdlog` across RAPIDS #56

Closed jameslamb closed 1 week ago

jameslamb commented 5 months ago

Description

RAPIDS projects that need fmt and spdlog use rapids-cmake to find it, via functions like rapids_cpm_fmt() (docs link).

rapids-cmake is currently carrying patches to both of those libraries, and as a result always downloads sources of them. (see https://github.com/rapidsai/rapids-cmake/pull/525 for details).

Those patches have since been upstreamed and made it into official releases of those projects (in their main source control and on conda-forge). This proposes upgrading to those new versions across RAPIDS and dropping the patches in rapids-cmake.

Benefits of this work

Acceptance Criteria

Approach

On a branch, modify rapids-cmake/cpm/versions.json (code link) such that:

Follow "Overriding RAPIDS CMake" (rapids-cmake docs) to point builds of at least rmm, cudf, and raft at your branch of rapids-cmake with these changes. Also modify those projects' dependencies.yaml and/or conda recipe meta.yaml to change their version constraints on fmt and spdlog.

NOTE: this is a great task for rapids-reviser (https://github.com/rapidsai/rapids-reviser/tree/main/examples/fmt-10-spdlog-1.12).

Confirm that CI succeeds and that the correct versions of fmt and spdlog are being pulled in.

For rmm, re-use this PR for that purpose: https://github.com/rapidsai/rmm/pull/1508. If this change also results in there being 0 remaining conda clobbering warnings in rmm, leave in the conda config --set path_conflict prevent in that PR's build scripts, so we'll be alerted by a CI failure if something around this changes in the future.

Notes

What makes you confident that these patches can be removed?

From https://github.com/rapidsai/build-planning/issues/54#issuecomment-2072893399

fmt's most recent release was 11.0.2, so that patch we're carrying around for the 10.1.1 version could be dropped.

The spdlog patch we've been carrying around has been upstreamed

What are "clobber warnings" and how does this help with them?

As described in #54 and https://github.com/rapidsai/rmm/pull/1508, conda builds of rmm and other RAPIDS packages downstream of it are emitting dozens of warnings like these:

This transaction has incompatible packages due to a shared path.
  packages: conda-forge/linux-aarch64::fmt-10.2.1-h2a328a1_0, file:///tmp/conda-bld-output/linux-aarch64::librmm-24.06.00a16-cuda12_240419_g9dfd9070_16
  path: 'include/fmt/chrono.h'

This transaction has incompatible packages due to a shared path.
   packages: conda-forge/linux-aarch64::spdlog-1.12.0-h6b8df57_2, file:///tmp/conda-bld-output/linux-aarch64::librmm-24.06.00a16-cuda12_240419_g9dfd9070_16
   path: 'include/spdlog/async.h'

These are saying that the librmm conda package contains files whose paths exactly conflict with those from the fmt and spdlog packages from conda-forge. When such packages are installed together, one will overwrite the other, which could lead to runtime issues.

Removing the patches in rapids-cmake makes it more likely that builds will find the files they need from the fmt and spdlog conda-forge packages already present in the build environment, and therefore not vendor them, and therefore not ship a copy that causes conflicts.

### Code changes (these PRs should be merged)
- [x] `rapids-cmake` (https://github.com/rapidsai/rapids-cmake/pull/689)
- [x] `rmm` (https://github.com/rapidsai/rmm/pull/1678)
- [x] `cudf` (https://github.com/rapidsai/cudf/pull/16806)
- [x] `ucxx` (https://github.com/rapidsai/ucxx/pull/278)
- [x] `raft` (https://github.com/rapidsai/raft/pull/2433)
- [x] `cuspatial` (https://github.com/rapidsai/cuspatial/pull/1441)
- [x] `cuml` (https://github.com/rapidsai/cuml/pull/6071)
- [x] `integration` (https://github.com/rapidsai/integration/pull/722)
### Testing dependencies (do not merge)
- [x] `cuvs` (https://github.com/rapidsai/cuvs/pull/335)
- [x] `cumlprims_mg` (https://github.com/rapidsai/cumlprims_mg/pull/211)
- [x] `cugraph-ops` (https://github.com/rapidsai/cugraph-ops/pull/692)
- [x] `cugraph` (https://github.com/rapidsai/cugraph/pull/4655)
### Post-migration
- [x] Revert pinnings on geospatial dependencies and specific RAPIDS alphas in `integration` (ref: https://github.com/rapidsai/integration/pull/719#discussion_r1761816933 ... as of now, part of https://github.com/rapidsai/integration/pull/722)
jameslamb commented 5 months ago

@bdice drew my attention to these global pins in conda-forge-pinning-feestock:

fmt:
  - '10'

spdlog:
  - '1.12'

(code link)

I think the fmt one shouldn't be a problem for this migration, but the spdlog one will. There's an ongoing conda-forge migration to spdlog==1.13.0, but I don't know how to estimate how long that'll take: https://conda-forge.org/status/migration/spdlog113.

I think we should NOT update any RAPIDS packages to depending on the spdlog==1.13.0 conda-forge package until that conda-forge migration to that version is done. Does that seem right @jakirkham @bdice ?

References:

bdice commented 5 months ago

Yes, we should try to keep RAPIDS in sync with conda-forge. For these, I would move them forward once the migrator passes some critical threshold (for spdlog, I might base it on when packages like mamba and micromamba are migrated).

For the purpose of fixing our conda clobbering issues, we may want to work on some kind of vendoring solution (centralized via rapids-core-dependencies or copied in each RAPIDS conda package) because we have to carry patches for fmt and spdlog in rapids-cmake anyway. If we do that, we can remove all host/run dependencies on the conda-forge packages of fmt and spdlog and instead get them from upstream via rapids-cmake / CPM.

Also, if we vendor these and remove the dependencies from the RAPIDS conda packages, it would also make it possible to decouple from the global conda-forge pinnings and manage version updates solely through rapids-cmake (the status quo requires changes in both rapids-cmake and conda recipes).

kkraus14 commented 1 month ago

FYI -- There's active migrations for spdlog 1.14 and fmt 11 going on in conda-forge right now:

I believe RAPIDS is now multiple versions behind on spdlog which will make solving environments painful.

bdice commented 1 month ago

Thanks for the heads-up @kkraus14! We'll discuss and try to land an update for fmt/spdlog in the next RAPIDS release.

jameslamb commented 4 weeks ago

Adding links to the migration statuses:

jameslamb commented 2 weeks ago

Alright, I think this set of changes is ready for review!

See "How I tested this" in https://github.com/rapidsai/rapids-cmake/pull/689 ... I'm able to build all of RAPIDS with these new versions of fmt and spdlog, and observed the things we want to see:

Putting a plan for how to roll this out in writing, will start an internal chat thread to coordinate.

Phase 1: reviews

  1. ~Get approvals on https://github.com/rapidsai/rapids-cmake/pull/689~
  2. ~Get approvals on https://github.com/rapidsai/rmm/pull/1678~
  3. ~Get approvals on https://github.com/rapidsai/cudf/pull/16806~
  4. ~Get approvals on https://github.com/rapidsai/ucxx/pull/278~
  5. ~Get approvals on https://github.com/rapidsai/raft/pull/2433~
  6. ~Get approvals on https://github.com/rapidsai/cuspatial/pull/1441~
  7. ~Get approvals on https://github.com/rapidsai/integration/pull/722~

Phase 2: run tests

  1. ~Successfully run ALL CI on PRs listed in "Code changes (these PRs should be merged)" and "Testing dependencies (do not merge)"~
    • ~this needs to be done in RAPIDS dependency order, because PRs are pulling in packages produced by upstream dependencies' CI (e.g. cuml cannot be started until rmm, cudf, ucxx, and raft packages are available)~
    • that's why it's important to get approvals first... changing and re-testing rmm would be really expensive in this setup
  2. ~Remove testing-specific details (like pointing at my fork of rapids-cmake and downloading PR artifacts) from all PRs in "Code changes (these PRs should be merged)"~
    • ~using commits with [skip ci], to avoid a re-run of CI~

Phase 3: merge changes

  1. ~Merge https://github.com/rapidsai/rapids-cmake/pull/689~
  2. ~Merge https://github.com/rapidsai/rmm/pull/1678~
  3. ~(wait for rmm branch build to produce packages)~
  4. ~Merge https://github.com/rapidsai/ucxx/pull/278~
  5. ~Merge https://github.com/rapidsai/cudf/pull/16806~
  6. ~(wait for ucxx branch build to produce packages)~
  7. ~Merge https://github.com/rapidsai/raft/pull/2433~
  8. ~(wait for cudf branch build to produce packages)~
  9. ~Merge https://github.com/rapidsai/cuspatial/pull/1441~
  10. ~Merge https://github.com/rapidsai/cuml/pull/6071~
  11. ~(wait a day for all projects to produce nightly packages)~
  12. ~Re-run all CI on https://github.com/rapidsai/integration/pull/722~
  13. ~Merge https://github.com/rapidsai/integration/pull/722~

[!NOTE]
These will all require admin merges, because of the use of [skip ci] commits to remove testing-specific details

Phase 4: clean up

  1. Close all the PRs listed in "Testing dependencies (do not merge)" above
jameslamb commented 1 week ago

This is complete! RAPIDS is now building successfully against fmt==11.0.2 and spdlog==1.14.1.

Put up one follow-up item here: https://github.com/rapidsai/rapids-cmake/issues/695