Closed jameslamb closed 1 month ago
If you removed the RMM_EXPORT from a header, would the rmm Python builds or tests fail?
Good question!
I just tried removing the `RMM_EXPORT` from `include/rmm/cuda_stream.hpp` and repeated the testing from the PR description... building succeeded, `ci/run_pytests.sh` succeeded.

Then tried removing ALL of the `RMM_EXPORT` macros. Same thing... building succeeded, `ci/run_pytests.sh` succeeded.
Maybe this is because the Python package only calls into the DSOs through public symbols generated by Cython, and those are not affected by this CMake change?
I'm going to try pushing a commit with those removals here, let's see what other CI jobs say.
My hope is that we can test the accuracy of our exports within RMM's CI rather than relying on cuDF to tell RMM that it broke.
> My hope is that we can test the accuracy of our exports within RMM's CI rather than relying on cuDF to tell RMM that it broke.
Right, totally understand and I agree.
Unfortunately it looks like even removing those, everything passed: https://github.com/rapidsai/rmm/actions/runs/10326031645?pr=1644. So I just pushed https://github.com/rapidsai/rmm/pull/1644/commits/1c39756a6c26b451aa84579cc5ecaf9ccc6eae5c reverting this PR back to its original state.
Something about CI will have to change here if we want feedback on the exports working correctly within rmm's CI.
This is way at the edge of my understanding, so the only thing I can think of is to dump the symbol table for every DSO and pass it through some text processing (e.g. we shouldn't see any WEAK / GLOBAL `spdlog::` symbols remaining in any DSOs, and I can envision how to check that with `grep` + the output of `readelf --symbols`).
I'm certain someone else will have a better idea than that for how to express the behavior we want in CI here. cc @robertmaynard @KyleFromNVIDIA
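A minimal sketch of that kind of grep-based check, assuming the usual `readelf`/`c++filt` pipeline (the function name and the canned symbol lines in the comments are made up for illustration, not from this PR):

```shell
# Sketch: fail if any GLOBAL/WEAK spdlog:: or fmt:: symbols remain in a
# DSO's symbol table. Wrapped in a function so the grep pipeline can be
# fed either real readelf output or canned test lines.
check_exports() {
    # expects `readelf --symbols --wide <file> | c++filt` output on stdin
    if grep -E ' (GLOBAL|WEAK) ' | grep -E -q '(spdlog|fmt)::'; then
        echo "symbol visibility issues found"
        return 1
    fi
    echo "no symbol visibility issues found"
}

# usage against a real shared library (path illustrative):
#   readelf --symbols --wide some_module.so | c++filt | check_exports
```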
Can you please open an RMM issue for this? Thanks.
As for testing, could we add a separate test that links spdlog and RMM and makes sure it compiles without symbol conflicts?
- Rather than setting global variables in CMakeLists.txt, we should set properties on targets inside `rapids-cmake`. You want properties like this one. `rapids_cython_create_modules` knows whether a module is being compiled for C or C++, so the appropriate property can be set on a per-target basis.
I agree that we should use a per-target approach to setting the visibility. I would prefer if we use an explicit approach so that we don't affect other callers of `rapids_cython_create_modules`.
It isn't clear to me what kind of C++ types cross shared library boundaries via Python. If we have C++ objects being shunted via Python between rmm and cudf, or rmm and cuml, we will need the RMM symbols to be public (which they currently are without this patch) so that we have consistent types across multiple libraries.
> Can you please open an RMM issue for this? Thanks.
@harrism sure! I just opened https://github.com/rapidsai/rmm/issues/1645 and updated the description of this PR to link to it.
> Can you please open an RMM issue for this? Thanks.
Is there a reason https://github.com/rapidsai/build-planning/issues/33 isn't sufficient? The purpose of the build-planning repo is for precisely this sort of cross-repo task tracking that is easier to discuss in a centralized location.
> I agree that we should use a per-target approach to setting the visibility. I would prefer if we use an explicit approach so that we don't affect other callers of `rapids_cython_create_modules`.
That works, so instead of changing the default behavior, we'd add an option `DEFAULT_VISIBILITY` or so, and if it isn't set then don't specify any property on the target.
> It isn't clear to me what kind of C++ types cross shared library boundaries via Python. If we have C++ objects being shunted via Python between rmm and cudf, or rmm and cuml, we will need the RMM symbols to be public (which they currently are without this patch) so that we have consistent types across multiple libraries.
We definitely need to be able to hand around e.g. rmm device buffers across library boundaries. I'm not sure how that's reflected in terms of structs (classes) at the DSO level, though. Does visibility have any impact on whether they're safe to hand around? Structs don't actually have any corresponding symbols, so there's nothing to check there. As long as they are ABI-stable between the different libraries, isn't that sufficient? Basic library inspection tools won't even give us enough information to study the stability of the objects since we can't check on things like member ordering/alignment or vtable contents.
> That works, so instead of changing the default behavior, we'd add an option `DEFAULT_VISIBILITY` or so, and if it isn't set then don't specify any property on the target.
I have a slightly different proposal that'd be in the spirit of this without requiring visibility-specific arguments in the signature of `rapids_cython_create_modules()`. I had been waiting to push it until I'd tested locally, but that local testing looked ok, so I just pushed some commits with it.

`rapids_cython_create_modules()` populates a variable `RAPIDS_CYTHON_CREATED_TARGETS` with a list of the targets it created (code link).

So without any changes to `rapids-cmake`, it's possible to do this in a target-centered way specifically for the `rmm` Cython stuff.
Like this...

```cmake
foreach(_cython_target IN LISTS RAPIDS_CYTHON_CREATED_TARGETS)
  set_target_properties(
    ${_cython_target}
    PROPERTIES
      C_VISIBILITY_PRESET hidden
      CXX_VISIBILITY_PRESET hidden
  )
endforeach()
```

... after each call to `rapids_cython_create_modules()`.
Doing it that way here (and in other similar projects) would allow us to test this behavior without expanding `rapids-cmake`'s public API. So if it turns out that this is wrong or insufficient, it could be reverted without breaking changes to `rapids-cmake`.
> Can you please open an RMM issue for this? Thanks.

> Is there a reason https://github.com/rapidsai/build-planning/issues/33 isn't sufficient? The purpose of the build-planning repo is for precisely this sort of cross-repo task tracking that is easier to discuss in a centralized location.
Yes. It doesn't show up in the issues for an RMM release. And I don't get a notification showing me the work needed/being done on the library. I choose to follow @jarmak-nv 's suggested process of having an issue for all work done on RMM. It helps me track, plan, and report.
> We definitely need to be able to hand around e.g. rmm device buffers across library boundaries. I'm not sure how that's reflected in terms of structs (classes) at the DSO level, though. Does visibility have any impact on whether they're safe to hand around? Structs don't actually have any corresponding symbols, so there's nothing to check there. As long as they are ABI-stable between the different libraries, isn't that sufficient?
If they are structs and not objects with vtables, you are safe. The RTTI for hidden symbols can have the DSO embedded into it, and therefore isn't safe across boundaries, as you won't be able to `dynamic_cast` to the correct type. This primarily happens with `libc++`, as it has more aggressive conformance around RTTI casting, unlike `libstdc++`.
I just tested the `cudf` C++ wheels PR using artifacts produced from CI here and confirmed that this set of changes also resolves the symbol conflict issues.

ref: https://github.com/rapidsai/cudf/pull/15483#discussion_r1715502200
I've also proposed a new testing script implementing the idea from discussion above to try to add a CI-time check on symbol visibility. It shows up like this in wheel build logs:
```text
checking exported symbols in '/tmp/tmp.6uVuClfluU/rmm/_cuda/stream.cpython-39-x86_64-linux-gnu.so'
symbol counts by type
  * GLOBAL: 305
  * WEAK: 5
  * LOCAL: 1
checking for 'fmt::' symbols...
checking for 'spdlog::' symbols...
No symbol visibility issues found

checking exported symbols in '/tmp/tmp.6uVuClfluU/rmm/_lib/memory_resource.cpython-39-x86_64-linux-gnu.so'
symbol counts by type
  * GLOBAL: 420
  * WEAK: 28
  * LOCAL: 1
checking for 'fmt::' symbols...
checking for 'spdlog::' symbols...
No symbol visibility issues found

...
```
Tested that locally and found that it did successfully fail when I omitted the visibility attributes from some targets. I've hard-coded `fmt::` and `spdlog::` specifically into there... not sure if there's a more generic way to write that. But this is an improvement over the current state and at least offers some protection against the specific source of conflicts this PR sought to address.
I think this is ready for another review.
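For reference, the rough shape of that per-DSO report can be sketched like this (illustrative only; `symbol_report` is a made-up name and the actual script in this PR may differ):

```shell
# Expects demangled `readelf --symbols --wide` output on stdin, e.g.:
#   readelf --symbols --wide module.so | c++filt | symbol_report
symbol_report() {
    syms="$(cat)"
    echo "symbol counts by type"
    # count symbols by binding type
    for binding in GLOBAL WEAK LOCAL; do
        count="$(printf '%s\n' "${syms}" | grep -c -E " ${binding} ")" || count=0
        echo "  * ${binding}: ${count}"
    done
    # fail if known-problematic namespaces are still exported
    for ns in 'fmt::' 'spdlog::'; do
        echo "checking for '${ns}' symbols..."
        if printf '%s\n' "${syms}" | grep -E ' (GLOBAL|WEAK) ' | grep -q "${ns}"; then
            echo "found '${ns}' symbols"
            return 1
        fi
    done
    echo "No symbol visibility issues found"
}
```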
> I have a slightly different proposal that'd be in the spirit of this without requiring visibility-specific arguments in the signature of `rapids_cython_create_modules()`. [...]
@jameslamb Could you open a rapids-cmake issue to track this feature request? I'm happy with using the current `foreach` approach in rmm for now so we can get the libcudf wheel PR merged, and we can iterate on a full design of the API in parallel/afterwards.
> Could you open a rapids-cmake issue to track this feature request?
Sure! here you go: https://github.com/rapidsai/rapids-cmake/issues/672
> happy with using the current foreach approach in rmm for now so we can get the libcudf wheel PR merged and we can iterate on a full design of the API in parallel/after.
Thanks! I agree.
> Yes. It doesn't show up in the issues for an RMM release. And I don't get a notification showing me the work needed/being done on the library. I choose to follow @jarmak-nv 's suggested process of having an issue for all work done on RMM. It helps me track, plan, and report.
When we created build-planning we (including you and @jarmak-nv) specifically discussed this use case as one where the fragmentation associated with creating per-repo issues wasn't worth the cost. Without a centralized point of discussion (e.g. a build-planning issue) we end up with a comment on a cugraph issue pointing to a solution discussed in a combination of posts on a cuml issue and a raft PR (not an exaggeration, this kind of cross-repo discussion is fairly common without a centralized place for discussion). If an rmm issue is a must-have for you, could we compromise on keeping those issues as near-empty issues that just link to a build-planning issue? I realize that creating the issues on other repos isn't hard, especially with rapids-reviser, but that doesn't address the IMO much larger concern of fragmentation.
> `rapids_cython_create_modules()` populates a variable `RAPIDS_CYTHON_CREATED_TARGETS` with a list of the targets it created (code link).
I'm fine with doing it this way for now, but I think that in the long run we should upstream the functionality. The purpose of `RAPIDS_CYTHON_CREATED_TARGETS` is to allow projects to make modifications that are specific to that project, but the visibility changes we're proposing here are presumably something that is going to be generally desirable in the long run. It sounds like that's what @robertmaynard is proposing as well, so I'm good with moving this PR forward in parallel with the rapids-cmake changes.
> If they are structs and not objects with vtables, you are safe. The RTTI for hidden symbols can have the DSO embedded into it, and therefore isn't safe across boundaries, as you won't be able to `dynamic_cast` to the correct type. This primarily happens with `libc++`, as it has more aggressive conformance around RTTI casting, unlike `libstdc++`.
Hmm yeah I thought briefly about the vtables but didn't get far enough as to consider how they would be affected by crossing ABI boundaries. How is this made safe in general? Are vtables always embedded as unique symbols so that they can be deduplicated across DSOs? What about class methods? If rmm symbols corresponding to its classes are made public, and if one of those classes has some virtual function, could you end up with the same function defined in multiple DSOs (say libcuml and libcuvs) such that the two end up causing runtime lookup conflicts due to the symbols being identical?
@harrism could you please review one more time?
I significantly changed the approach after your first approval, don't want you to feel like I'm sneaking something in here.
Ok thanks for the reviews! I'm going to merge this so we can continue with https://github.com/rapidsai/cudf/pull/15483.
@ me any time if something related to this breaks.
/merge
Description
Fixes #1645
Contributes to https://github.com/rapidsai/build-planning/issues/33
Similar to https://github.com/rapidsai/cudf/pull/15982
Proposes more tightly controlling the visibility of symbols in the shared libraries produced for the `rmm` Python library, via the following:

- compiling with `-fvisibility=hidden` by default
- marking the public API of `rmm` (everything in the `rmm::` namespace) with `__attribute__((visibility("default")))`
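That mechanism can be sketched standalone like this (illustrative only, not RMM's actual build or macro; assumes `g++` and `nm` are available):

```shell
# Compile with -fvisibility=hidden so nothing is exported by default,
# then re-expose one namespace via the "default" visibility attribute,
# the same trick an RMM_EXPORT-style macro performs.
workdir="$(mktemp -d)"
cat > "${workdir}/vis_demo.cpp" <<'EOF'
#define DEMO_EXPORT __attribute__((visibility("default")))
namespace DEMO_EXPORT demo {            // re-exported
int exported_fn(int x) { return x + 1; }
}
namespace internal {                    // hidden by -fvisibility=hidden
int hidden_fn(int x) { return x + 2; }
}
EOF
g++ -shared -fPIC -fvisibility=hidden \
    -o "${workdir}/libvis_demo.so" "${workdir}/vis_demo.cpp"
# only demo::exported_fn should appear in the dynamic symbol table
nm -D -C "${workdir}/libvis_demo.so" | grep 'demo::exported_fn'
nm -D -C "${workdir}/libvis_demo.so" | grep 'internal::hidden_fn' \
    || echo "internal::hidden_fn is hidden, as expected"
```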
Benefits of this change

- Reduces the risk of symbol conflicts when `rmm` is used alongside other libraries. For example, see this case in `cudf` where the `spdlog::` symbols in `rmm` are conflicting with the `spdlog::` symbols in `nvcomp`: https://github.com/rapidsai/cudf/pull/15483#discussion_r1670892743
- Reduces library size by a bit (around 0.3 MB uncompressed), by reducing the size of symbol tables in DSOs.
Notes for Reviewers
This is at the very edge of my C++ knowledge, apologies in advance if I've missed something obvious 😬
Checklist
How I tested this
Re-used the wheels produced from this PR over in `cudf` and saw `cudf`'s tests succeed where previously they had failed because of symbol conflicts: https://github.com/rapidsai/cudf/pull/15483#discussion_r1711964794

Also tested this locally, read on...
On an x86_64 machine with CUDA 12.2, checked out the latest `24.10` of `rmm` and the branch of `cudf` from https://github.com/rapidsai/cudf/pull/15483 (which is up to date with the latest `24.10` of `cudf`).

Built `librmm`, `rmm`, `libcudf`, and `cudf` wheels from source, in the same CUDA 12, Python 3.11 image used for wheel-building CI.

how I did that (click me)
Created a script, `build-all.sh`, with the following.

```shell
#!/bin/bash

set -e -u -o pipefail

mkdir -p "${WHEEL_DIR}"
rm -r ${WHEEL_DIR}/*.whl || true
echo "" > "${PIP_CONSTRAINT}"

rm -rf /usr/local/src/rmm/python/*/{build,dist,final_dist}
rm -rf /usr/local/src/cudf/python/*/{build,dist,final_dist}

# move sources to /tmp, so file permissions don't get messed up
cp -R /usr/local/src/cudf /tmp/cudf
cp -R /usr/local/src/rmm /tmp/rmm

# install wheel
python -m pip install wheel

# build rmm C++ wheel
echo ""
echo "--- building rmm C++ wheel ---"
echo ""
cd /tmp/rmm/python/librmm
python -m pip wheel . -w ./dist -vvv --no-deps --disable-pip-version-check
python -m wheel tags --platform any ./dist/librmm*.whl --remove
mv ./dist/librmm*.whl ${WHEEL_DIR}/
echo ""
echo "--- done building rmm C++ wheel ---"
echo ""

# build rmm Python wheel
#
# ensure 'rmm' wheel builds always use the 'librmm' just built in the same CI run
#
# using env variable PIP_CONSTRAINT is necessary to ensure the constraints
# are used when creating the isolated build environment
echo "librmm-cu12 @ file://$(echo ${WHEEL_DIR}/librmm_cu12*.whl)" >> "${PIP_CONSTRAINT}"
echo ""
echo "--- building rmm Python wheel ---"
echo ""
cd /tmp/rmm/python/rmm
python -m pip wheel . -w ./dist -vvv --no-deps --disable-pip-version-check
mkdir -p ./final_dist
python -m auditwheel repair -w ./final_dist ./dist/rmm_*.whl
mv ./final_dist/rmm_*.whl ${WHEEL_DIR}/
echo "rmm-cu12 @ file://$(echo ${WHEEL_DIR}/rmm_cu12*.whl)" >> "${PIP_CONSTRAINT}"
echo ""
echo "--- done building rmm Python wheel ---"
echo ""

# build cudf C++ wheel
echo ""
echo "--- building cudf C++ wheel ---"
echo ""
cd /tmp/cudf/python/libcudf
mkdir -p ./final_dist
python -m pip wheel . -w ./dist -vvv --no-deps --disable-pip-version-check
python -m auditwheel repair \
    --exclude libarrow.so.1601 \
    -w ./final_dist \
    ./dist/libcudf_*.whl
mv ./final_dist/libcudf_*.whl ${WHEEL_DIR}/
echo "libcudf-cu12 @ file://$(echo ${WHEEL_DIR}/libcudf_cu12*.whl)" >> "${PIP_CONSTRAINT}"
echo ""
echo "--- done building cudf C++ wheel ---"
echo ""

# build cudf Python wheel
echo ""
echo "--- building cudf Python wheel ---"
echo ""
cd /tmp/cudf/python/cudf
export SKBUILD_CMAKE_ARGS="-DUSE_LIBARROW_FROM_PYARROW=ON"
mkdir -p ./final_dist
python -m pip wheel . -w ./dist -vvv --no-deps --disable-pip-version-check
python -m auditwheel repair \
    --exclude libcudf.so \
    --exclude libarrow.so.1601 \
    -w ./final_dist \
    ./dist/cudf_*.whl
mv ./final_dist/cudf_*.whl ${WHEEL_DIR}/
echo "cudf-cu12 @ file://$(echo ${WHEEL_DIR}/cudf_cu12*.whl)" >> "${PIP_CONSTRAINT}"
echo ""
echo "--- done building cudf Python wheel ---"
echo ""
```

Ran it like this:

```shell
docker run \
    --rm \
    -v $(pwd):/usr/local/src \
    -w /usr/local/src \
    --env PIP_CONSTRAINT=/usr/local/src/build-constraints.txt \
    --env WHEEL_DIR=/usr/local/src/dist \
    -it rapidsai/ci-wheel:cuda12.5.1-rockylinux8-py3.11 \
    bash

# build all the wheels
./build-all.sh
```

Installed the wheels produced by that script, using one of the testing images used across RAPIDS CI.
how I did that (click me)
```shell
docker run \
    --rm \
    --gpus 1 \
    --env WHEEL_DIR=/usr/local/src/dist \
    -v $(pwd):/usr/local/src \
    -w /usr/local/src \
    -it rapidsai/ci-conda:cuda12.2.2-ubuntu22.04-py3.11 \
    bash

# install everything
pip install ${WHEEL_DIR}/*.whl
```

Saw that fail with exactly the same symbol-conflict issue discussed in https://github.com/rapidsai/cudf/pull/15483#discussion_r1670892743.
Repeated that process with the changes you see in this PR, and observed:

- `rmm`'s Python tests passing
- `cudf` 😁

Also inspected the symbol tables with `readelf`. For example, here are the counts of symbols by visibility in `memory_resource.cpython-311-x86_64-linux-gnu.so`:

It also helped reduce the sizes a bit
how I did that (click me)
Unzipped the `rmm` wheel.

```shell
rm -rf ./delete-me
mkdir -p ./delete-me
unzip -d ./delete-me ./dist/rmm_cu12*.whl
```

Checked the symbol tables with `readelf`:

```shell
readelf --symbols --wide \
    ./delete-me/rmm/_lib/memory_resource.cpython-311-x86_64-linux-gnu.so \
    | c++filt \
    > syms.txt

echo "GLOBAL: $(grep --count -E ' GLOBAL ' < ./syms.txt)"
echo "WEAK: $(grep --count -E ' WEAK ' < ./syms.txt)"
echo "LOCAL: $(grep --count -E ' LOCAL ' < ./syms.txt)"
```

Checked the sizes with `pydistcheck`:

```shell
# and to get the sizes
pip install pydistcheck
pydistcheck --inspect ./dist/rmm_cu12*.whl
```

Where did `RMM_EXPORT` come from?

It's been in `rmm` for a few years:

https://github.com/rapidsai/rmm/blob/975c911ad59219e98b8aed1e9959120a2c877d06/include/rmm/detail/export.hpp#L19-L26
But only used in one place, from #833.
References
I found these helpful: