google / tensorstore

Library for reading and writing large multi-dimensional arrays.
https://google.github.io/tensorstore/

Support installation to enable reuse/packaging #190

Open nelsonihc opened 2 months ago

nelsonihc commented 2 months ago

Hi, looking into old issues regarding cmake support, I noticed that an installation target was originally supported as:

cmake .. -DTENSORSTORE_ENABLE_INSTALL:BOOL=true -DCMAKE_INSTALL_PREFIX=/usr/local

But this changed around version v0.1.23.

Supporting installation through cmake lets users in large facilities share one compiled build, avoiding long configure/compile times. Enabling installation would likely also enable packaging, which makes it possible to use the project with packagers like spack, vcpkg, conan and OS packaging systems. Packaging greatly reduces the friction of using the library for less specialized users.

Could you bring back this capability?

laramiel commented 2 months ago

We may need to look at that again. It was disabled when our auto-conversion between bazel and CMake became complicated by the inclusion of abseil, grpc, upb, protobuf, etc.

If you poke around https://github.com/google/tensorstore/tree/master/tools/cmake/bazel_to_cmake you'll see that we'd need workarounds for installing those libraries.

nelsonihc commented 2 months ago

By any chance, is there a way to peek into an expanded version of the "CMakeLists.txt" file, or a way to output it to a file for analysis?

laramiel commented 2 months ago

Oh, yes. If you run the CMake command, look at the build directory. For example, if I have:

git clone https://github.com/google/tensorstore.git
mkdir _build
cd _build
cmake -G Ninja ../tensorstore

Then it will have a bunch of subdirectories under _deps. The generated files are, modulo some special cases, structured like this:

When cmake -G Ninja ../tensorstore is called, the ../tensorstore/CMakeLists.txt runs bazel_to_cmake, which generates the top-level build_rules.cmake, then trampolines into running the generated rules.
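For anyone who wants to inspect the generated rules, a minimal sketch of a helper that lists them is below. It assumes the generated files keep the name build_rules.cmake, both at the top of the build directory and under _deps. The function name list_generated_rules is hypothetical and not part of tensorstore.

```shell
# List every generated build_rules.cmake under a CMake build directory.
# Assumption: generated rule files are named build_rules.cmake.
# Usage: list_generated_rules [build_dir]   (defaults to _build)
list_generated_rules() {
  build_dir=${1:-_build}
  # Sort so the output is stable between runs.
  find "$build_dir" -type f -name 'build_rules.cmake' | sort
}
```

Running list_generated_rules _build from the directory that contains _build prints one path per line; each file can then be opened or diffed like any other CMake script.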

The generated rules load all the third_party dependencies, potentially replacing their native CMakeLists.txt with a proxy file which invokes bazel_to_cmake. There are a bunch of per-package configuration options in the third_party/<package>/workspace.bzl files which control the behavior, which is a bit complicated. To use the native CMakeLists.txt files in the dependencies there usually needs to be a mapping between the bazel target names and the cmake target names.

bazel_to_cmake essentially parses and evaluates the bazel scripts (BUILD rules), and the build_rules.cmake files are generated directly from the bazel commands, with a few big caveats. The biggest: it doesn't handle bazel aspects, since it lacks a global view of all dependencies. This is problematic in the case of protobuf, grpc, etc. Also, there is some state pickling happening across runs to propagate dependencies downstream.

If you look at https://github.com/google/tensorstore/tree/master/tools/cmake/bazel_to_cmake there are some golden generated files that you should be able to run pytest/blaze test against.

jbms commented 1 month ago

Potentially local installation could be supported but there are a lot of challenges given that it would require also "installing" all of the dependencies, and furthermore would require fixing a single set of compilation flags. Tensorstore depends on abseil which currently does not guarantee ABI compatibility across C++ standard versions, I believe.

An alternative to consider would be sccache; we use that for our own CI builds with cmake and it works quite well.
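A minimal sketch of how sccache is commonly wired into a CMake configure step, using CMake's standard CMAKE_C_COMPILER_LAUNCHER and CMAKE_CXX_COMPILER_LAUNCHER variables. This is not necessarily the exact setup used in the tensorstore CI; the wrapper name configure_with_sccache is hypothetical, and it assumes sccache and Ninja are on PATH.

```shell
# Configure a CMake build so compiler invocations go through sccache.
# Assumption: sccache and ninja are installed and on PATH.
# Usage: configure_with_sccache <source_dir> [extra cmake args...]
configure_with_sccache() {
  src_dir=$1
  shift
  cmake -G Ninja "$src_dir" \
    -DCMAKE_C_COMPILER_LAUNCHER=sccache \
    -DCMAKE_CXX_COMPILER_LAUNCHER=sccache \
    "$@"
}
```

From inside _build, this would be called as configure_with_sccache ../tensorstore. A shared sccache storage backend is what would let several users reuse each other's compilation results.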

nelsonihc commented 1 month ago

I'm thinking about a simpler scope for this problem, taking into account the feedback given here:

1) In cases where the user manually installs external dependencies and uses the TENSORSTORE_USE_SYSTEM_* flags, would it be possible to create an installable version of Tensorstore?
2) Since I'm thinking of the static version of the library, another possibility would be creating a bundled .a with all libs merged. Apache Arrow/Parquet does this and it seems to work well.
3) I have no experience with bazel, but would it be easier to make an installable version with bazel (with no cmake port)? As long as this installation creates a compliant pkg-config (.pc) file with the rules, I would be able to consume it from cmake.
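To illustrate option 2, here is a rough sketch of merging static libraries with a GNU ar MRI script. It assumes GNU binutils ar, since the -M option is not available in every ar implementation. The function name mri_merge_script and the library names in the usage line are hypothetical.

```shell
# Emit a GNU ar MRI script that merges several static libraries into one.
# Assumption: the script is piped into GNU ar with "ar -M".
# Usage: mri_merge_script <output.a> <input.a>... | ar -M
mri_merge_script() {
  out=$1
  shift
  printf 'create %s\n' "$out"
  for lib in "$@"; do
    # addlib copies every member of an existing archive into the output.
    printf 'addlib %s\n' "$lib"
  done
  printf 'save\nend\n'
}
```

For example, mri_merge_script libbundle.a liba.a libb.a | ar -M would write libbundle.a containing the members of both inputs. The compilation-flag and ABI concerns raised above would still apply to the merged archive.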