google / tensorstore

Library for reading and writing large multi-dimensional arrays.
https://google.github.io/tensorstore/
Other
1.34k stars 120 forks source link

Tensorstore C++ Integration: How to Avoid Driver Registry Conflict #191

Open sameeul opened 3 weeks ago

sameeul commented 3 weeks ago

We have two Python packages (bfiocpp and argolid) where we use the C++ API and follow the CMake integration instruction given in the documentation.

Both of the packages individually work without any issue. However, we noticed that when both the packages are present in the same environment, if we try to import both of them, we get the following error:

python: /home/samee/axle_dev/argolid/build/temp.linux-x86_64-cpython-311/_deps/absl-src/absl/container/internal
/raw_hash_set.h:3154: void absl::lts_20240116::container_internal::raw_hash_set<Policy, Hash, Eq, Alloc>::
emplace_at(size_t, Args&& ...) [with Args = {const tensorstore::serialization::Registry::Entry*}; Policy = 
absl::lts_20240116::container_internal::FlatHashSetPolicy<const tensorstore::serialization::Registry::Entry*>; Hash = 
tensorstore::internal::SupportsHeterogeneous<absl::lts_20240116::hash_internal::Hash<tensorstore::internal::KeyAdapter<const 
tensorstore::serialization::Registry::Entry*, std::basic_string_view<char>, &tensorstore::serialization::Registry::Entry::id> > >;
 Eq = tensorstore::internal::SupportsHeterogeneous<std::equal_to<tensorstore::internal::KeyAdapter<const 
tensorstore::serialization::Registry::Entry*, std::basic_string_view<char>, &tensorstore::serialization::Registry::Entry::id> > >; Alloc = 
std::allocator<const tensorstore::serialization::Registry::Entry*>; size_t = long unsigned int]: Assertion 
`PolicyTraits::apply(FindElement{*this}, *iterator_at(i)) == iterator_at(i) && "constructed value does not match the lookup key"' failed.

I am using v0.1.60. I would appreciate if you can shed some light on why such driver registry works on global level instead of per-package level.

laramiel commented 1 week ago

Both argolid and bfiocpp use very old versions of tensorstore; have you updated them?

Anyway, without looking at your package files, it seems likely that two versions of tensorstore are being built and linked into your project. There may be order problems with your CMake file.

It would be nice to provide a self-contained repro to understand better what's going on.

sameeul commented 1 week ago

I have also tried with the newer tensorstore release (v1.63). Finally, explicitly modifying the symbol visibility in the CMake files (at the project level) seem to resolve the issue.

 set(CMAKE_CXX_VISIBILITY_PRESET hidden)
 set(CMAKE_VISIBILITY_INLINES_HIDDEN 1)
laramiel commented 1 week ago

I may need to add something like that to our CMake generation.

jbms commented 1 week ago

It appears that at least some tensorstore symbols have somehow ended up with global visibility across the two extension modules. Symbol visibility for Python extension modules is tricky. Normally you want all symbols defined by the extension module to be private. Python in fact loads extension modules with RTLD_LOCAL by default, which means that symbols should be private.

Separately, you can build with -fvisibility=hidden for better performance and to ensure symbols are private even if loaded with RTLD_GLOBAL. The tensorstore python package that we provide is built this way.

You may want to still figure out why the global symbol visibility issue came up in the first place. Is tensorstore statically linked into your C extension module or is there an intermediate dynamic library by any chance?

laramiel commented 1 week ago

I think that this is during the C++ build. There could be two versions of tensorstore included in the build, and/or there a difference between the equivalent bazel build flags and the cmake build flags and/or bazel_to_cmake failed to generate the linker script here?

laramiel commented 1 week ago

Is there a github repository or some other reproducing example you can provide where this happens?