rapidsai / build-planning

Tracking for RAPIDS-wide build tasks
https://github.com/rapidsai
0 stars 4 forks source link

Investigate removing spdlog from all public APIs #104

Open vyasr opened 2 months ago

vyasr commented 2 months ago

Currently RAPIDS makes use of spdlog and fmt in a few places that I am aware of. rmm, cudf, and cuml (please add to this list if there are more) all have their own loggers that expose public APIs to the user for capturing relevant information. This makes spdlog (and transitively, fmt) a build-time requirement of the packages (cudf, rmm, cuml). This is expected and unavoidable. However, it turns out that spdlog and fmt are actually also runtime requirements of these packages. At first glance this may seem surprising since rmm constrains its consumers to use the header-only versions of these libraries, so at runtime spdlog/fmt libraries do not actually have to exist on the system. The sticking point is that in addition to exposing their own logging APIs, rmm and cudf also allow the user to get a reference to the global logger directly (cudf, rmm). That means that even if they use the header-only spdlog, they need to constrain the ABIs of any other libraries within the same environment to ensure that compatible versions of spdlog are being used so that spdlog objects can be safely passed across library boundaries. In practice, in conda this runtime requirement is enforced via spdlog's (and fmt's) run exports. Unfortunately, in a conda-forge world this constraint becomes quite problematic because spdlog (and fmt) is such a central dependency that many libraries require it, and as a result the preferred version is globally pinned and managed for all conda-forge package builds. Even though RAPIDS publishes packages to a different channel, we aim for compatibility with conda-forge since it is the source for most community packages that our users want to use with RAPIDS libraries. As a result, whenever conda-forge changes its global pinning RAPIDS needs to also upgrade in lockstep to ensure that conda environments will solve. This results in meaningful pain for our consumers, who are forced to wait for us to upgrade, and adds work every few months when RAPIDS decides to upgrade. Ideally, we would find a way to avoid this problem.

cuml's logger implementation is a good example of how we might avoid this problem. cuml hides spdlog behind forward declarations and does not directly expose any symbols involving spdlog. In theory, it would be safe for cuml to ignore the run exports from spdlog entirely. However, there a few caveats: 1) cuml is currently implicitly relying on rmm to bring in spdlog in its build system, and is therefore at the mercy of upstream changes to how spdlog is used; 2) despite the way the code is designed, the cuml conda recipe is not configured to ignore the run exports, so it still has a direct runtime dependence on spdlog; and 3) even if the run exports were set up correctly, cuml would still inherit a transitive runtime dependence on spdlog from rmm.

To resolve this problem I propose that:

  1. We deprecate and remove the functions from cudf and rmm that allow users to interact directly with spdlog objects.
  2. Ensure that all references to spdlog are removed from the public headers of rmm, cudf, and cuml
  3. We remove spdlog and fmt from the requirements of these packages and rely on CPM to fetch them instead. rapids-cmake will still manage a central version, FWIW.

This proposal is along the lines of two similar proposals I made within the past year around removing protobuf and arrow as runtime dependencies for similar reasons.

Critical note Note that since rmm is header-only, the logger implementation is still going to exist directly in rmm's public headers, and therefore it is technically still part of the public API. Therefore, removing the fmt/spdlog constraint from rmm shifts the responsibility to the consuming library not to expose spdlog symbols in their own public headers. One alternative would be to have the librmm conda package ignore the run exports from spdlog/fmt but then to have librmm's own run export list include spdlog/fmt. That would mean that while spdlog/fmt would not be runtime requirements of librmm, they would become runtime requirements of any consumer of librmm. The benefit of that approach is that it chooses the safest path by default while allowing consumers to ignore the run exports from librmm if they wished to. The downside is that consumers would also have to be careful to pin_compatible librmm itself when ignoring the run exports since presumably they wish to retain ABI-compatibility guarantees with librmm itself (since e.g. all RAPIDS repos expose rmm symbols as part of their public interface in order to support stream-ordering and memory resource management).

P.S. This issue is proposing removing a current feature of rmm and cudf, namely the direct access to the underlying global spdlog loggers managed by those libraries. Based on offline discussions, removing this feature is acceptable, and if there are specific bits of functionality that we need to expose we can always enrich the rmm/cudf logger APIs to provide that by wrapping spdlog calls.

bdice commented 2 months ago

From what I know, this sounds like a good plan. I support it.

bdice commented 2 months ago

xref: https://github.com/rapidsai/cudf/issues/16951

ava6969 commented 2 months ago

Please very needed, I use the cudf and cuml c++ library, and currently I have a forked repo for all 3 of them were i just comment out all rapid call to find fmt and spdlog. also

jameslamb commented 1 month ago

Unfortunately, in a conda-forge world this constraint becomes quite problematic because spdlog (and fmt) is such a central dependency that many libraries require it...

Brand new example of that: https://github.com/rapidsai/docker/issues/712#issuecomment-2384746639

harrism commented 1 month ago

RMM has two forms of logging, as described in the docs. The logging_resource_adaptor, and debug logging (the built-in logger).

The former is useful for capturing logs in failing cases to better understand their allocation behavior. It needs to be compiled into RMM Python so that it can be dynamically enabled.

The latter is only used when debugging logic problems inside of RMM. It is the RMM_LOG_INFO, RMM_LOG_TRACE etc. macros. I have not actively used it in a long time, since I was developing the RMM pool.

I don't know if it would help, but we could disabled the latter (debug logging) completely by default, so that using in any form would require a rebuild with a CMake flag turned on. Currently that rebuild is required to change the logging level to something other than the default anyway.

vyasr commented 1 month ago

I'm going to work on the rmm piece in https://github.com/rapidsai/rmm/issues/1709. Once rmm is working I'll generalize that for rollout to other repos.