google / tensorstore

Library for reading and writing large multi-dimensional arrays.
https://google.github.io/tensorstore/
Other
1.35k stars 120 forks source link

CMake support #29

Closed dzenanz closed 2 years ago

dzenanz commented 2 years ago

CMake support will be added in the future.

This was written more than 2 years ago. The future is here, where is CMake support? 😄

Seriously, is there some plan or a timeline?

jbms commented 2 years ago

No one specifically asked for it previously so it wasn't a priority.

But it sounds like there is some interest now so we will try to prioritize.

Probably we will need to create a system to auto-convert the Bazel build rules into CMake rules, which should be fairly straightforward. The hardest part will be dealing with getting all of the dependencies available through cmake as well.

laramiel commented 2 years ago

Can you tell me more about why cmake support is useful to you? It's a pretty large step to take given the current dependency set.

dzenanz commented 2 years ago

I want to add Zarr support to https://github.com/InsightSoftwareConsortium/ITK. ITK's build system is CMake-based, so a library needs to be CMake-ified in order for us to use it. I already got started, relying on xtensor-zarr which has CMake-based build system. But it has out-of-date documentation, and many other outstanding issues.

blasscoc commented 2 years ago

Thanks for contributing the tensorstore!

My use cases for cmake are one, I want to be able to just compile tensorstore and use it as a library with other projects that are already using cmake. And a far second, I haven't been able to get the intellisense and debuggers working with Bazel in VSCode. I haven't used Bazel before so it could be these things are easy to do.

You right though, supporting a whole new build system does seem like a pretty big lift. Is there anyway I could help out if you decided to do that?

2bndy5 commented 2 years ago

Maybe this repo could serve as a starting point.

laramiel commented 2 years ago

Sure, I'm aware of @haberman's repo there. I have actually started doing something based on very similar concepts, so a preliminary, incomplete CMake framework should be visible whenever we do the next export. The main issues that I've run into so far are:

So if you would like to help this happen, look at the dependencies under third_party and see which ones don't have CMake support; that's what has to happen next.

2bndy5 commented 2 years ago

CMake is really not a good language

This seems biased, so I'll disregard. I do sympathize with your frustration in porting to a different build system though. That said, I know CMake (syntax seems similar to Lua which is not ideal for OOP) far better than I understand how to use Bazel (which is not at all). I think an argument could be made about python's execution speed (used in Bazel) vs CMake's native execution speed, but this is getting off-topic...

tensorstore has a lot of dependencies that are not CMake visible

My experience with CMake is that their find_library() is quite versatile. It can be made to look in certain locations (for instance if the lib was installed in non-default locations - other than /usr/local/lib).

Some scenarios may be satisfied by CMake-provided "Find Modules" shipped with CMake (be mindful of the CMake version specified). Typically, if acquiring a dependency is not satisfied by these builtin modules, then it is pretty common for software devs to write their own "Find Module" (see the tutorial here). I understand how this convention is undesirable here since you prefer to only support the Bazel system, but if the find module(s) is written well, then it could be a set-and-forget tactic.


I don't really have a use case for this lib; I landed here out of curiosity via reference from \@jbms. I just thought I could pass along what I've learned from working with CMake/pybind11 in the past. Please don't take offense if I don't volunteer contributions. I'm staying subscribed to this issue so my comments won't be "drive-by" observations.

laramiel commented 2 years ago

Yes, my opinion that cmake is not a good language is totally subjective. But that's my experience so far. And a lot of other folks share it: https://news.ycombinator.com/item?id=25701041 But it has become a key integration point, however I view it.

I've been trying to use the newer functionality of FetchContent and friends, which also maintains a somewhat hermetic build when coupled with our existing bazel dependency downloading, but the sticking point right now is dependencies that are not exposed via cmake. Thanks for the recommendations, I may need to use them at some point.

2bndy5 commented 2 years ago

Sorry, I didn't realize that Bazel was downloading deps. That seems like an alternative to using git submodules, but it is better for pip installing a C-ext from sdist. Thanks for that FetchContent lead.

laramiel commented 2 years ago

I've added preliminary CMake generation; it's not ready yet, but anyone is free to play with it and maybe update it:

  cd /path/to/tensorstore
  python3 CMake/bazel_to_cmake.py
  mkdir build/
  cd build
  cmake ..

NOTE: This will not yet build tensorstore.

dzenanz commented 2 years ago

I am starting a vacation + conference trip tomorrow, so I will not be able to devote significant effort to this until July.

How important is requirement of C++17 standard? In ITK, we are just about to release a version after switching from C++11 to C++14.

Trying cmake configure yields:

Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.19044.
The CXX compiler identification is MSVC 19.29.30142.1
Detecting CXX compiler ABI info
Detecting CXX compiler ABI info - done
Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe - skipped
Detecting CXX compile features
Detecting CXX compile features - done
Looking for C++ include pthread.h
Looking for C++ include pthread.h - not found
Found Threads: TRUE  
Could NOT find GTest (missing: GTEST_LIBRARY GTEST_INCLUDE_DIR GTEST_MAIN_LIBRARY) 
The C compiler identification is MSVC 19.29.30142.1
Detecting C compiler ABI info
Detecting C compiler ABI info - done
Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe - skipped
Detecting C compile features
Detecting C compile features - done
Found Python: C:/Program Files/Python39/python.exe (found version "3.9.6") found components: Interpreter 
pybind11 v2.8.1 
Found PythonInterp: C:/Program Files/Python39/python.exe (found version "3.9.6") 
Found PythonLibs: C:/Program Files/Python39/libs/python39.lib
Performing Test HAS_MSVC_GL_LTCG
Performing Test HAS_MSVC_GL_LTCG - Success
Found Git: C:/Program Files/Git/cmd/git.exe (found version "2.35.2.windows.1") 
git version: v0.0.0 normalized to 0.0.0
Version: 1.6.0
Performing Test HAVE_CXX_FLAG_EHS_
Performing Test HAVE_CXX_FLAG_EHS_ - Success
Performing Test HAVE_CXX_FLAG_EHA_
Performing Test HAVE_CXX_FLAG_EHA_ - Success
Performing Test HAVE_STD_REGEX
Performing Test HAVE_STD_REGEX
Performing Test HAVE_STD_REGEX -- success
Performing Test HAVE_GNU_POSIX_REGEX
Performing Test HAVE_GNU_POSIX_REGEX
Performing Test HAVE_GNU_POSIX_REGEX -- failed to compile
Performing Test HAVE_POSIX_REGEX
Performing Test HAVE_POSIX_REGEX
Performing Test HAVE_POSIX_REGEX -- failed to compile
CMake Warning at C:/Misc/tensorstore-vs19/_deps/com_google_benchmark-src/CMakeLists.txt:287 (message):
  Using std::regex with exceptions disabled is not fully supported

Performing Test HAVE_STEADY_CLOCK
Performing Test HAVE_STEADY_CLOCK
Performing Test HAVE_STEADY_CLOCK -- success
The ASM_NASM compiler identification is unknown
Didn't find assembler
CMake Error at C:/Misc/tensorstore-vs19/_deps/com_google_boringssl-src/CMakeLists.txt:115 (enable_language):
  No CMAKE_ASM_NASM_COMPILER could be found.

Configuring incomplete, errors occurred!
See also "C:/Misc/tensorstore-vs19/CMakeFiles/CMakeOutput.log".
See also "C:/Misc/tensorstore-vs19/CMakeFiles/CMakeError.log".

Notably:

The ASM_NASM compiler identification is unknown
Didn't find assembler
CMake Error at C:/Misc/tensorstore-vs19/_deps/com_google_boringssl-src/CMakeLists.txt:115 (enable_language):
  No CMAKE_ASM_NASM_COMPILER could be found.

This means I can't build it on my main machine yet.

jbms commented 2 years ago

C++17 is definitely required.

I believe the cmake support is still a work in progress and not yet expected to actually work.

dzenanz commented 2 years ago

C++17 is definitely required.

One more reason for us to move to C++17 sooner, rather than later. As we plan for Zarr/NGFF IO implementation to live in a repository separate from the main one, we can implicitly require C++17 only if Zarr/NGFF is enabled.

dzenanz commented 2 years ago

I remember having trouble building BoringSSL a few years ago as part of GRPC. I eventually got it, but it wasn't quite so boring 😄

laramiel commented 2 years ago

Echoing jbms@, this does not yet build tensorstore, particularly since some of the dependencies don't yet work with CMake.

Also, the CMake build will require NASM, among other things, and it will not install NASM.

2bndy5 commented 2 years ago

Not sure if will help with fetching external deps, but I found a project (written in python) called conan which makes installing C++ libs easy. They even have a conan.cmake script for integration with cmake builds.

blasscoc commented 2 years ago

Using the python bazel to cmake command, I was seeing an error,

error: Missing mapping for @com_google_absl//absl/flags:marshalling in tensorstore/kvstore/gcs/BUILD

I made the following change to line 60 of com_google_absl/workspace.bzl

Seems to help.

laramiel commented 2 years ago

I can do that. cmake changes have stalled for other work, but if you make progress I can probably incorporate your changes quickly.

blasscoc commented 2 years ago

thanks, the converter does create CMakeLists.txt and the build progresses. Right now it gets through building abseil and some of the other dependencies and then falls over with riegeli library, which itself seems to be a library built using Bazel. That might be the source of the problem. Of the tensorstore third party libraries, riegeli and upd require Bazel to build.

The error in the "make" is here: In file included from tensorstore/tensorstore/internal/riegeli_json_output.cc:15: tensorstore/tensorstore/internal/riegeli_json_output.h:23:10:

fatal error: riegeli/bytes/writer.h: No such file or directory 23 | #include "riegeli/bytes/writer.h"

laramiel commented 2 years ago

Seems about right. I was going to try to apply my cmake generator to riegeli, but hadn't gotten that far yet.

sameeul commented 2 years ago

I am also working on a project that will be greatly benefited if I can get the build system in CMake working. I am a relatively new dev, but if you can point to places to work on, I have some spare time to contribute for the CMake based build system.

blasscoc commented 2 years ago

@sameeul I can share my experience. The python bazel_to_cmake.py generated the CMakeLists.txt for tensorstore. The issues I had were:

  1. re2 (the branch used by tensorstore is the abseil one) and riegeli are themselves build with Bazel so I added these codes into the tensorstore folder and used the bazel_to_cmake to build CMakeLists for them (not elegant but).
  2. I had problems with some of the dependencies, gmock, libpng, and lzma (https://tukaani.org/xz/), blosc so I just built these manually using the versions defined in third_party and installed them in /usr/local, and it links.
  3. I had some path issues with the protobuf headers that are generated by make, but they aren't discovered in the path of the files that depend on them. So I just copied them into the directory with the relevant code (not elegant)

With these fixes, excluding a minority of tests, the library builds. There's a linking error I'm following up on.

blasscoc commented 2 years ago

FWIW in third_party/com_google_googletest/workspace.bzl I changed the cmake mappings to point to gmock instead of gtest. The issue was that while linking, some of the tests couldn't find some symbols like UnorderedElementsAreMatcherImplBase which are in libgmock.a. Seems to resolve the issue.

Updated workspace.bzl reads: cmake_add_dep_mapping(target_mapping = { "@com_google_googletest//:gtest": "GTest::gmock", "@com_google_googletest//:gtest_main": "GTest::gmock_main" })

blasscoc commented 2 years ago

The driver tests were building, but they were failing at runtime due to errors related to registering the various drivers (zarr, blosc, etc). By manually editing the SRCS section in the CMakeLists.txt (for example in the zarr folder), the issue was resolved and the test passed.

Additional lines adding under sources, below zarr/driver_test.cc.

tensorstore_cc_test( NAME driver_zarr_driver_test SRCS zarr/driver_test.cc zarr/driver.cc ../internal/data_copy_concurrency_resource.cc ../internal/cache/cache_pool_resource.cc ../kvstore/mock_kvstore.cc zarr/blosc_compressor.cc n5/driver.cc COPTS ${TENSORSTORE_TEST_COPTS} DEPS GTest::gmock_main tensorstore::context tensorstore::driver_driver_testutil tensorstore::driver_zarr_driver tensorstore::driver_zarr_zarr tensorstore::driver_n5_n5 tensorstore::index_space_dim_expression tensorstore::index_space_index_transform tensorstore::internal_cache_cache tensorstore::internal_cache_cache_pool_resource tensorstore::internal_compression_blosc tensorstore::internal_data_copy_concurrency_resource tensorstore::internal_decoded_matches tensorstore::internal_global_initializer tensorstore::internal_json_binding_gtest tensorstore::internal_json_binding_json_binding tensorstore::internal_json_gtest tensorstore::internal_parse_json_matches tensorstore::kvstore_kvstore tensorstore::kvstore_memory_memory tensorstore::kvstore_mock_kvstore tensorstore::kvstore_test_util tensorstore::open tensorstore::util_assert_macros tensorstore::util_status tensorstore::util_status_testutil tensorstore::util_str_cat )

jbms commented 2 years ago

I believe the issue with the registration is that the cmake generation needs to take into account the bazel always_link attribute.

jbms commented 2 years ago

alwayslink seems like it may be tricky in CMake. I'm not sure what the best solution is --- see https://github.com/iree-org/iree/issues/190

jbms commented 2 years ago

Great progress though in getting the build to this point, thanks!

jbms commented 2 years ago

Actually, it appears CMake now has built-in support for this: https://discourse.cmake.org/t/automatically-wrapping-a-static-library-in-whole-archive-no-whole-archive-when-used-during-linking/5883

dzenanz commented 2 years ago

What is the current status? Is there any branch of code to try?

jbms commented 2 years ago

We still need to incorporate some of the suggestions made by @blasscoc to get it building and fix the always_link issue.

blasscoc commented 2 years ago

quick update;

using: cmake .. -DTENSORSTORE_ENABLE_INSTALL:BOOL=true -DCMAKE_INSTALL_PREFIX=/usr/local

Generates errors like this: CMake Error: install(EXPORT "tensorstoreTargets" ...) includes target "proto_index_transform" which requires target "absl_fixed_array" that is not in any export set.

absl, nlohmann_json (possible others involved).

Is there a way to fix this?

I've tried install abseil in the system path and using find_package(absl CONFIG REQUIRED)

like here: https://github.com/googleapis/google-cloud-cpp/blob/main/CMakeLists.txt

This does solve the EXPORT error but I run into other problems.

jbms commented 2 years ago

This is now complete.