rapidsai / rmm

RAPIDS Memory Manager
https://docs.rapids.ai/api/rmm/stable/
Apache License 2.0

Hide visibility of non-public symbols #1644

Closed · jameslamb closed 1 month ago

jameslamb commented 1 month ago

Description

Fixes #1645

Contributes to https://github.com/rapidsai/build-planning/issues/33

Similar to https://github.com/rapidsai/cudf/pull/15982

Proposes more tightly controlling the visibility of symbols in the shared libraries produced for the rmm Python library, via the following:

Benefits of this change

Reduces the risk of symbol conflicts when rmm is used alongside other libraries. For example, see this case in cudf where the spdlog:: symbols in rmm are conflicting with the spdlog:: symbols in nvcomp: https://github.com/rapidsai/cudf/pull/15483#discussion_r1670892743

Reduces library size by a bit (around 0.3 MB uncompressed), by reducing the size of symbol tables in DSOs.
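For illustration, here's a minimal sketch of how such a conflict can be spotted (the library paths are hypothetical, not taken from this PR):

```shell
# list the demangled spdlog:: symbols each DSO exports; if two libraries
# loaded into the same process both export the same GLOBAL/WEAK symbols,
# the dynamic linker may bind all uses to a single definition
for so in ./rmm/_lib/*.so ./nvcomp/libnvcomp.so; do
    echo "== ${so}"
    nm -D --demangle "${so}" | grep 'spdlog::' | head
done
```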

Notes for Reviewers

This is at the very edge of my C++ knowledge, apologies in advance if I've missed something obvious 😬

Checklist

How I tested this

Re-used the wheels produced from this PR over in cudf and saw cudf's tests succeed where previously they had failed because of symbol conflicts: https://github.com/rapidsai/cudf/pull/15483#discussion_r1711964794

Also tested this locally, read on...

On an x86_64 machine with CUDA 12.2, checked out the latest 24.10 of rmm and the branch of cudf from https://github.com/rapidsai/cudf/pull/15483 (which is up to date with the latest 24.10 of cudf).

Built librmm, rmm, libcudf, and cudf wheels from source, in the same CUDA 12, Python 3.11 image used for wheel building CI.

how I did that (click me)

Created a script, `build-all.sh`, with the following.

```shell
#!/bin/bash

set -e -u -o pipefail

mkdir -p "${WHEEL_DIR}"
rm -r ${WHEEL_DIR}/*.whl || true
echo "" > "${PIP_CONSTRAINT}"

rm -rf /usr/local/src/rmm/python/*/{build,dist,final_dist}
rm -rf /usr/local/src/cudf/python/*/{build,dist,final_dist}

# move sources to /tmp, so file permissions don't get messed up
cp -R /usr/local/src/cudf /tmp/cudf
cp -R /usr/local/src/rmm /tmp/rmm

# install wheel
python -m pip install wheel

# build rmm C++ wheel
echo ""
echo "--- building rmm C++ wheel ---"
echo ""
cd /tmp/rmm/python/librmm
python -m pip wheel . -w ./dist -vvv --no-deps --disable-pip-version-check
python -m wheel tags --platform any ./dist/librmm*.whl --remove
mv ./dist/librmm*.whl ${WHEEL_DIR}/
echo ""
echo "--- done building rmm C++ wheel ---"
echo ""

# build rmm Python wheel
#
# ensure 'rmm' wheel builds always use the 'librmm' just built in the same CI run
#
# using env variable PIP_CONSTRAINT is necessary to ensure the constraints
# are used when creating the isolated build environment
echo "librmm-cu12 @ file://$(echo ${WHEEL_DIR}/librmm_cu12*.whl)" >> "${PIP_CONSTRAINT}"

echo ""
echo "--- building rmm Python wheel ---"
echo ""
cd /tmp/rmm/python/rmm
python -m pip wheel . -w ./dist -vvv --no-deps --disable-pip-version-check
mkdir -p ./final_dist
python -m auditwheel repair -w ./final_dist ./dist/rmm_*.whl
mv ./final_dist/rmm_*.whl ${WHEEL_DIR}/
echo "rmm-cu12 @ file://$(echo ${WHEEL_DIR}/rmm_cu12*.whl)" >> "${PIP_CONSTRAINT}"
echo ""
echo "--- done building rmm Python wheel ---"
echo ""

# build cudf C++ wheel
echo ""
echo "--- building cudf C++ wheel ---"
echo ""
cd /tmp/cudf/python/libcudf
mkdir -p ./final_dist
python -m pip wheel . -w ./dist -vvv --no-deps --disable-pip-version-check
python -m auditwheel repair \
    --exclude libarrow.so.1601 \
    -w ./final_dist \
    ./dist/libcudf_*.whl
mv ./final_dist/libcudf_*.whl ${WHEEL_DIR}/
echo "libcudf-cu12 @ file://$(echo ${WHEEL_DIR}/libcudf_cu12*.whl)" >> "${PIP_CONSTRAINT}"
echo ""
echo "--- done building cudf C++ wheel ---"
echo ""

# build cudf Python wheel
echo ""
echo "--- building cudf Python wheel ---"
echo ""
cd /tmp/cudf/python/cudf
export SKBUILD_CMAKE_ARGS="-DUSE_LIBARROW_FROM_PYARROW=ON"
mkdir -p ./final_dist
python -m pip wheel . -w ./dist -vvv --no-deps --disable-pip-version-check
python -m auditwheel repair \
    --exclude libcudf.so \
    --exclude libarrow.so.1601 \
    -w ./final_dist \
    ./dist/cudf_*.whl
mv ./final_dist/cudf_*.whl ${WHEEL_DIR}/
echo "cudf-cu12 @ file://$(echo ${WHEEL_DIR}/cudf_cu12*.whl)" >> "${PIP_CONSTRAINT}"
echo ""
echo "--- done building cudf Python wheel ---"
echo ""
```

Ran it like this:

```shell
docker run \
    --rm \
    -v $(pwd):/usr/local/src \
    -w /usr/local/src \
    --env PIP_CONSTRAINT=/usr/local/src/build-constraints.txt \
    --env WHEEL_DIR=/usr/local/src/dist \
    -it rapidsai/ci-wheel:cuda12.5.1-rockylinux8-py3.11 \
    bash

# build all the wheels
./build-all.sh
```

Installed the wheels produced by that script, using one of the testing images used across RAPIDS CI.

how I did that (click me)

```shell
docker run \
    --rm \
    --gpus 1 \
    --env WHEEL_DIR=/usr/local/src/dist \
    -v $(pwd):/usr/local/src \
    -w /usr/local/src \
    -it rapidsai/ci-conda:cuda12.2.2-ubuntu22.04-py3.11 \
    bash

# install everything
pip install ${WHEEL_DIR}/*.whl

# run rmm's Python tests
pip install pytest
cd /usr/local/src/rmm
ci/run_pytests.sh

# try to reproduce the bug
cat > ./test2.py <<EOF
import cudf.pandas
cudf.pandas.install()
EOF

CUDF_PANDAS_RMM_MODE=pool python ./test2.py
```

Saw that fail with exactly the same symbol-conflict issue discussed in https://github.com/rapidsai/cudf/pull/15483#discussion_r1670892743.

Repeated that process with the changes you see in this PR, and observed:

Also inspected the symbol tables with readelf. For example, here are the counts of symbols by visibility in memory_resource.cpython-311-x86_64-linux-gnu.so:

# branch-24.10
GLOBAL: 431
WEAK: 1220
LOCAL: 1

# this PR
GLOBAL: 425
WEAK: 75
LOCAL: 1

It also helped reduce the sizes a bit:

# (these are the uncompressed sizes)

# branch-24.10
memory_resource.*.so: 1.5 MB
rmm_cu12*.whl: 5.4 MB

# this PR
memory_resource.*.so: 1.3 MB
rmm_cu12*.whl: 5.1 MB
how I did that (click me)

Unzipped the `rmm` wheel.

```shell
rm -rf ./delete-me
mkdir -p ./delete-me
unzip -d ./delete-me ./dist/rmm_cu12*.whl
```

Checked the symbol tables with `readelf`.

```shell
readelf --symbols --wide \
    ./delete-me/rmm/_lib/memory_resource.cpython-311-x86_64-linux-gnu.so \
    | c++filt \
    > syms.txt

echo "GLOBAL: $(grep --count -E ' GLOBAL ' < ./syms.txt)"
echo "WEAK: $(grep --count -E ' WEAK ' < ./syms.txt)"
echo "LOCAL: $(grep --count -E ' LOCAL ' < ./syms.txt)"
```

Checked the sizes with `pydistcheck`.

```shell
# and to get the sizes
pip install pydistcheck
pydistcheck --inspect ./dist/rmm_cu12*.whl
```

Where did RMM_EXPORT come from?

It's been in rmm for a few years:

https://github.com/rapidsai/rmm/blob/975c911ad59219e98b8aed1e9959120a2c877d06/include/rmm/detail/export.hpp#L19-L26

But it's only used in one place, added in #833.

References

I found these helpful:

jameslamb commented 1 month ago

If you removed the RMM_EXPORT from a header, would the rmm Python builds or tests fail?

Good question!

I just tried removing the RMM_EXPORT from include/rmm/cuda_stream.hpp and repeated the testing from the PR description... building succeeded, ci/run_pytests.sh succeeded.

Then tried removing ALL of the RMM_EXPORT macros. Same thing... building succeeded, ci/run_pytests.sh succeeded.

Maybe this is because the Python package only calls into the DSOs through public symbols generated by Cython, and those are not affected by this CMake change?
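If that hypothesis is right, the module entry point should stay exported even with hidden-by-default visibility, since Cython declares it with PyMODINIT_FUNC (which carries default visibility). A quick way to check (a sketch, reusing the unzipped-wheel path from the steps above):

```shell
# the PyInit_* entry point should remain GLOBAL / DEFAULT even when the
# rest of the module is compiled with -fvisibility=hidden
readelf --dyn-syms --wide \
    ./delete-me/rmm/_lib/memory_resource.cpython-311-x86_64-linux-gnu.so \
    | grep 'PyInit_'
```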

I'm going to try pushing a commit with those removals here, let's see what other CI jobs say.

bdice commented 1 month ago

My hope is that we can test the accuracy of our exports within RMM's CI rather than relying on cuDF to tell RMM that it broke.

jameslamb commented 1 month ago

My hope is that we can test the accuracy of our exports within RMM's CI rather than relying on cuDF to tell RMM that it broke.

Right, totally understand and I agree.

Unfortunately, it looks like even after removing those, everything passed: https://github.com/rapidsai/rmm/actions/runs/10326031645?pr=1644. So I just pushed https://github.com/rapidsai/rmm/pull/1644/commits/1c39756a6c26b451aa84579cc5ecaf9ccc6eae5c reverting this PR back to its original state.

Something about CI will have to change here if we want feedback on the exports working correctly within rmm's CI.

This is way at the edge of my understanding, so the only thing I can think of is to dump the symbol table for every DSO and pass it through some text processing (e.g. we shouldn't see any WEAK / GLOBAL spdlog:: symbols remaining in any DSOs, and I can envision how to check that with grep + the output of readelf --symbols).
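Something like this is what I have in mind (a rough sketch only, with a hypothetical unpacked-wheel path):

```shell
# fail if any DSO still exports (GLOBAL or WEAK) fmt:: or spdlog:: symbols
for dso in $(find ./unpacked-wheel/rmm -name '*.so'); do
    echo "checking ${dso}"
    if readelf --symbols --wide "${dso}" \
            | c++filt \
            | grep -E ' (GLOBAL|WEAK) ' \
            | grep -E 'fmt::|spdlog::'; then
        echo "found exported fmt:: / spdlog:: symbols in ${dso}" >&2
        exit 1
    fi
done
```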

I'm certain someone else will have a better idea than that for how to express the behavior we want in CI here. cc @robertmaynard @KyleFromNVIDIA

harrism commented 1 month ago

Can you please open an RMM issue for this? Thanks.

harrism commented 1 month ago

As for testing, could add a separate test that links spdlog and RMM and make sure it compiles without symbol conflicts?

robertmaynard commented 1 month ago
  • Rather than setting global variables in CMakeLists.txt, we should set properties on targets inside rapids-cmake. You want properties like this one. rapids_cython_create_modules knows whether a module is being compiled for C or C++, so the appropriate property can be set on a per-target basis.

I agree that we should use a per-target approach to setting the visibility. I would prefer that we use an explicit approach so that we don't affect other callers of rapids_cython_create_modules.

It isn't clear to me what kinds of C++ types cross shared-library boundaries via Python. If we have C++ objects being shunted via Python between rmm and cudf, or rmm and cuml, we will need the RMM symbols to be public (which they currently are, without this patch) so that we have consistent types across multiple libraries.

jameslamb commented 1 month ago

Can you please open an RMM issue for this? Thanks

@harrism sure! I just opened https://github.com/rapidsai/rmm/issues/1645 and updated the description of this PR to link to it.

vyasr commented 1 month ago

Can you please open an RMM issue for this? Thanks.

Is there a reason https://github.com/rapidsai/build-planning/issues/33 isn't sufficient? The purpose of the build-planning repo is for precisely this sort of cross-repo task tracking that is easier to discuss in a centralized location.

vyasr commented 1 month ago

I agree that we should use a per-target approach to setting the visibility. I would prefer that we use an explicit approach so that we don't affect other callers of rapids_cython_create_modules.

That works, so instead of changing the default behavior, we'd add an option DEFAULT_VISIBILITY or so and if it isn't set then don't specify any property on the target.

It isn't clear to me what kinds of C++ types cross shared-library boundaries via Python. If we have C++ objects being shunted via Python between rmm and cudf, or rmm and cuml, we will need the RMM symbols to be public (which they currently are, without this patch) so that we have consistent types across multiple libraries.

We definitely need to be able to hand around e.g. rmm device buffers across library boundaries. I'm not sure how that's reflected in terms of structs (classes) at the DSO level, though. Does visibility have any impact on whether they're safe to hand around? Structs don't actually have any corresponding symbols, so there's nothing to check there. As long as they are ABI-stable between the different libraries, isn't that sufficient? Basic library inspection tools won't even give us enough information to study the stability of the objects since we can't check on things like member ordering/alignment or vtable contents.

jameslamb commented 1 month ago

That works, so instead of changing the default behavior, we'd add an option DEFAULT_VISIBILITY or so and if it isn't set then don't specify any property on the target.

I have a slightly different proposal that'd be in the spirit of this without requiring visibility-specific arguments in the signature of rapids_cython_create_modules(). Had been waiting to push it until I'd tested locally. But that local testing looked ok, so I just pushed some commits with it.

rapids_cython_create_modules() populates a variable RAPIDS_CYTHON_CREATED_TARGETS with a list of the targets it created (code link).

So without any changes to rapids-cmake, it's possible to do this in a target-centered way specifically for the rmm Cython stuff.

Like this...

```cmake
foreach(_cython_target IN LISTS RAPIDS_CYTHON_CREATED_TARGETS)
    set_target_properties(
        ${_cython_target}
        PROPERTIES
            C_VISIBILITY_PRESET hidden
            CXX_VISIBILITY_PRESET hidden
    )
endforeach()
```

...after each call to rapids_cython_create_modules().

Doing it that way here (and in other similar projects) would allow us to test this behavior without expanding rapids-cmake's public API. So if it turns out that this is wrong or insufficient, it could be reverted without breaking changes to rapids-cmake.

harrism commented 1 month ago

Can you please open an RMM issue for this? Thanks.

Is there a reason https://github.com/rapidsai/build-planning/issues/33 isn't sufficient? The purpose of the build-planning repo is for precisely this sort of cross-repo task tracking that is easier to discuss in a centralized location.

Yes. It doesn't show up in the issues for an RMM release. And I don't get a notification showing me the work needed/being done on the library. I choose to follow @jarmak-nv's suggested process of having an issue for all work done on RMM. It helps me track, plan, and report.

robertmaynard commented 1 month ago

We definitely need to be able to hand around e.g. rmm device buffers across library boundaries. I'm not sure how that's reflected in terms of structs (classes) at the DSO level, though. Does visibility have any impact on whether they're safe to hand around? Structs don't actually have any corresponding symbols, so there's nothing to check there. As long as they are ABI-stable between the different libraries, isn't that sufficient?

If they are structs and not objects with vtables, you are safe. The RTTI for hidden symbols can have the DSO embedded into it, and therefore isn't safe across boundaries, since you won't be able to dynamic_cast it to the correct type. This primarily happens with libc++, as it has more aggressive conformance around RTTI casting than libstdc++.
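(As a sketch of how to check which case a given library is in, using the same tools as earlier in this thread; the library name is a placeholder:)

```shell
# typeinfo / vtable symbols that stay GLOBAL or WEAK can be deduplicated
# across DSOs; if they've gone LOCAL (hidden), each DSO carries its own
# copy and cross-library dynamic_cast on those types can fail
readelf --symbols --wide ./some_library.so \
    | c++filt \
    | grep -E 'typeinfo for|vtable for'
```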

jameslamb commented 1 month ago

I just tested the cudf C++ wheels PR using artifacts produced from CI here and confirmed that this set of changes also resolves the symbol conflict issues.

ref: https://github.com/rapidsai/cudf/pull/15483#discussion_r1715502200

I've also proposed a new testing script implementing the idea from the discussion above, to try to add a CI-time check on symbol visibility. It shows up like this in the wheel build logs:

checking exported symbols in '/tmp/tmp.6uVuClfluU/rmm/_cuda/stream.cpython-39-x86_64-linux-gnu.so'
symbol counts by type
  * GLOBAL: 305
  * WEAK: 5
  * LOCAL: 1
checking for 'fmt::' symbols...
checking for 'spdlog::' symbols...
No symbol visibility issues found

checking exported symbols in '/tmp/tmp.6uVuClfluU/rmm/_lib/memory_resource.cpython-39-x86_64-linux-gnu.so'
symbol counts by type
  * GLOBAL: 420
  * WEAK: 28
  * LOCAL: 1
checking for 'fmt::' symbols...
checking for 'spdlog::' symbols...
No symbol visibility issues found
...

(build link)

Tested that locally and found that it did successfully fail when I omitted the visibility attributes from some targets. I've hard-coded fmt:: and spdlog:: in there specifically... not sure if there's a more generic way to write that. But this is an improvement over the current state, and it at least offers some protection against the specific source of conflicts this PR sought to address.

I think this is ready for another review.

robertmaynard commented 1 month ago

That works, so instead of changing the default behavior, we'd add an option DEFAULT_VISIBILITY or so and if it isn't set then don't specify any property on the target.

I have a slightly different proposal that'd be in the spirit of this without requiring visibility-specific arguments in the signature of rapids_cython_create_modules(). Had been waiting to push it until I'd tested locally. But that local testing looked ok, so I just pushed some commits with it.

rapids_cython_create_modules() populates a variable RAPIDS_CYTHON_CREATED_TARGETS with a list of the targets it created (code link).

So without any changes to rapids-cmake, it's possible to do this in a target-centered way specifically for the rmm Cython stuff.

Like this...

```cmake
foreach(_cython_target IN LISTS RAPIDS_CYTHON_CREATED_TARGETS)
    set_target_properties(
        ${_cython_target}
        PROPERTIES
            C_VISIBILITY_PRESET hidden
            CXX_VISIBILITY_PRESET hidden
    )
endforeach()
```

...after each call to rapids_cython_create_modules().

Doing it that way here (and in other similar projects) would allow us to test this behavior without expanding rapids-cmake's public API. So if it turns out that this is wrong or insufficient, it could be reverted without breaking changes to rapids-cmake.

@jameslamb Could you open a rapids-cmake issue to track this feature request? I'm happy with using the current foreach approach in rmm for now so we can get the libcudf wheel PR merged, and we can iterate on a full design of the API in parallel or afterwards.

jameslamb commented 1 month ago

Could you open a rapids-cmake issue to track this feature request?

Sure! Here you go: https://github.com/rapidsai/rapids-cmake/issues/672

I'm happy with using the current foreach approach in rmm for now so we can get the libcudf wheel PR merged, and we can iterate on a full design of the API in parallel or afterwards.

Thanks! I agree.

vyasr commented 1 month ago

Yes. It doesn't show up in the issues for an RMM release. And I don't get a notification showing me the work needed/being done on the library. I choose to follow @jarmak-nv's suggested process of having an issue for all work done on RMM. It helps me track, plan, and report.

When we created build-planning we (including you and @jarmak-nv) specifically discussed this use case as one where the fragmentation associated with creating per-repo issues wasn't worth the cost. Without a centralized point of discussion (e.g. a build-planning issue) we end up with a comment on a cugraph issue pointing to a solution discussed in a combination of posts on a cuml issue and a raft PR (not an exaggeration, this kind of cross-repo discussion is fairly common without a centralized place for discussion). If an rmm issue is a must-have for you, could we compromise on keeping those issues as near-empty issues that just link to a build-planning issue? I realize that creating the issues on other repos isn't hard, especially with rapids-reviser, but that doesn't address the IMO much larger concern of fragmentation.

vyasr commented 1 month ago

rapids_cython_create_modules() populates a variable RAPIDS_CYTHON_CREATED_TARGETS with a list of the targets it created (code link).

I'm fine with doing it this way for now, but I think that in the long run we should upstream the functionality. The purpose of RAPIDS_CYTHON_CREATED_TARGETS is to allow projects to make modifications that are specific to that project, but the visibility changes we're proposing here are presumably something that is going to be generally desirable in the long run. It sounds like that's what @robertmaynard is proposing as well, so I'm good with moving this PR forward in parallel with the rapids-cmake changes.

If they are structs and not objects with vtables you are safe. The RTTI for hidden symbols can have the DSO embedded into it and therefore isn't safe across boundaries as you won't be able to dynamic_cast it the correct type. This primarily happens with libc++ as it has more aggressive conformance around RTTI casting unlike libstdc++.

Hmm, yeah, I thought briefly about the vtables but didn't get far enough to consider how they would be affected by crossing ABI boundaries. How is this made safe in general? Are vtables always embedded as unique symbols so that they can be deduplicated across DSOs? What about class methods? If rmm symbols corresponding to its classes are made public, and one of those classes has some virtual function, could you end up with the same function defined in multiple DSOs (say libcuml and libcuvs) such that the two cause runtime lookup conflicts because the symbols are identical?

jameslamb commented 1 month ago

@harrism could you please review one more time?

I significantly changed the approach after your first approval, and I don't want you to feel like I'm sneaking something in here.

jameslamb commented 1 month ago

Ok thanks for the reviews! I'm going to merge this so we can continue with https://github.com/rapidsai/cudf/pull/15483.

@ me any time if something related to this breaks.

jameslamb commented 1 month ago

/merge