rapidsai / rmm

RAPIDS Memory Manager
https://docs.rapids.ai/api/rmm/stable/
Apache License 2.0

[FEA] limit symbol visibility in DSOs in the Python package #1645

Closed jameslamb closed 1 month ago

jameslamb commented 1 month ago

Is your feature request related to a problem? Please describe.

As of this writing, the visibility of symbols in the dynamic shared objects (DSOs) built for the rmm Python package is not explicitly controlled. As a result, those DSOs contain symbols that are resolved at load time and that the linker is allowed to find in other binaries.

This increases library size, load times, and the risk of runtime conflicts. See, for example, this case from recent development in cudf: https://github.com/rapidsai/cudf/pull/15483#discussion_r1670892743

There, I observed runtime errors caused by an incompatible mix of spdlog:: symbols being found in rmm and nvcomp.

Example stack trace:

```text
#0 0x00007fb5e0c51365 in void spdlog::pattern_formatter::handle_flag_(char, spdlog::details::padding_info) () from /opt/conda/lib/python3.10/site-packages/libcudf/lib64/libnvcomp.so
#1 0x00007fb5d6a61f59 in spdlog::pattern_formatter::compile_pattern_(std::__cxx11::basic_string, std::allocator > const&) () from /opt/conda/lib/python3.10/site-packages/rmm/_lib/memory_resource.cpython-310-x86_64-linux-gnu.so
#2 0x00007fb5d6a62c39 in spdlog::logger::set_pattern(std::__cxx11::basic_string, std::allocator >, spdlog::pattern_time_type) () from /opt/conda/lib/python3.10/site-packages/rmm/_lib/memory_resource.cpython-310-x86_64-linux-gnu.so
#3 0x00007fb5d6a81483 in rmm::mr::pool_memory_resource::try_to_expand(unsigned long, unsigned long, rmm::cuda_stream_view) () from /opt/conda/lib/python3.10/site-packages/rmm/_lib/memory_resource.cpython-310-x86_64-linux-gnu.so
#4 0x00007fb5d6a483c8 in ?? () from /opt/conda/lib/python3.10/site-packages/rmm/_lib/memory_resource.cpython-310-x86_64-linux-gnu.so
#5 0x0000561c3c2e7999 in type_call (kwds={'initial_pool_size': 27004082688}, args=(,), type=0x7fb5d6b4c8e0) at /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:1123
```

Describe the solution you'd like

Similar to https://github.com/rapidsai/cudf/pull/15982, the visibility of symbols in the DSOs produced for the rmm Python package should be explicitly controlled.

Any symbols that are not considered part of the public interface of the DSOs should use hidden visibility, so that they're always resolved internally.
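To make the idea concrete, here is a minimal sketch of how per-symbol visibility can be expressed with the GCC/Clang `visibility` attribute. The macro names `DEMO_EXPORT` and `DEMO_HIDDEN`, the `demo` namespace, and both functions are hypothetical and chosen only for illustration; they are not what rmm uses or what the linked PRs implement.

```cpp
#include <cstddef>

// Hypothetical macros for illustration only (not rmm's actual names).
// The visibility attribute is a GCC/Clang extension; other compilers get no-ops.
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_EXPORT __attribute__((visibility("default")))
#define DEMO_HIDDEN __attribute__((visibility("hidden")))
#else
#define DEMO_EXPORT
#define DEMO_HIDDEN
#endif

namespace demo {
namespace detail {

// Internal helper: with hidden visibility it is left out of the DSO's
// dynamic symbol table, so calls from inside the DSO always bind to
// this definition and never to a same-named symbol in another library.
DEMO_HIDDEN std::size_t align_up(std::size_t bytes, std::size_t alignment)
{
  return (bytes + alignment - 1) / alignment * alignment;
}

}  // namespace detail

// Public entry point: default visibility keeps it in the dynamic symbol
// table, so other binaries can link against it.
DEMO_EXPORT std::size_t padded_size(std::size_t bytes)
{
  return detail::align_up(bytes, 256);
}

}  // namespace demo
```

In practice, it is common to flip the default instead of annotating every internal symbol: compiling with `-fvisibility=hidden` (and `-fvisibility-inlines-hidden` for inline functions) hides everything, and only the public interface is marked with the export macro. Running `nm -D --defined-only` on the resulting DSO is one way to check which symbols remain visible.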

Describe alternatives you've considered

N/A

Additional context

This is related to the work in https://github.com/rapidsai/build-planning/issues/33, and will become more important as RAPIDS moves more towards dynamic linking across RAPIDS projects.

Some useful references on this topic:

jameslamb commented 1 month ago

Note: work on this is in progress at https://github.com/rapidsai/rmm/pull/1644