nv-legate / legate.core

The Foundation for All Legate Libraries
https://docs.nvidia.com/legate/24.06/
Apache License 2.0

Build from source on Apple Silicon: march=native not supported #904

Closed CharlelieLrt closed 9 months ago

CharlelieLrt commented 9 months ago

I am trying to build legate locally from source on an M1 MacBook Pro (Apple Silicon), following the instructions given in BUILD.md.

OS: macOS 12.6.7

Dependencies are pulled with conda, using the following environment config (generated with ./generate-conda-envs.py --python 3.10 --os osx):

name: legate-test
channels:
  - conda-forge
  - nvidia
dependencies:

  - python=3.10,!=3.9.7  # avoid https://bugs.python.org/issue45121

  # build
  - cmake>=3.24,!=3.25.0
  - cython
  - git
  - make
  - ninja
  - numba
  - openssl
  - pkg-config
  - rust
  - scikit-build>=0.13.1
  - setuptools>=60
  - zlib

  # runtime
  - cffi
  - llvm-openmp
  - numpy>=1.22
  - libblas=*=*openblas*
  - openblas=*=*openmp*
  - openblas<=0.3.21
  - opt_einsum
  - scipy
  - typing_extensions

  # tests
  - clang-tools>=8
  - clang>=8
  - colorama
  - coverage
  - mock
  - mypy>=0.961
  - pre-commit
  - pytest-cov
  - pytest-lazy-fixture
  - pytest-mock
  - pytest
  - types-docutils
  - pynvml
  - tifffile

  # docs
  - pandoc
  - doxygen
  - ipython
  - jinja2
  - markdown<3.4.0
  - pydata-sphinx-theme>=0.13
  - myst-parser
  - nbsphinx
  - sphinx-copybutton
  - sphinx>=4.4.0

Legate is built with ./install.py --max-dim 5 --openmp --hdf5 --build-tests --build-examples, which fails with the following error:

 Not searching for unused variables given on the command line.
 -- The C compiler identification is AppleClang 14.0.0.14000029
 -- The CXX compiler identification is AppleClang 14.0.0.14000029
 -- Detecting C compiler ABI info
 -- Detecting C compiler ABI info - done
 -- Check for working C compiler: /Library/Developer/CommandLineTools/usr/bin/cc - skipped
 -- Detecting C compile features
 -- Detecting C compile features - done
 -- Detecting CXX compiler ABI info
 -- Detecting CXX compiler ABI info - done
 -- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ - skipped
 -- Detecting CXX compile features
 -- Detecting CXX compile features - done
 -- Found Git: /Users/charlelie/anaconda3/envs/legate/bin/git (found version "2.43.0")
 -- Found Python3: /Users/charlelie/anaconda3/envs/legate/bin/python3 (found version "3.10.13") found components: Interpreter Development Development.Module Development.Embed
 -- CPM: adding package Legion@24.1.0 (24.1.0)
 -- Performing Test COMPILER_SUPPORTS_MARCH
 -- Performing Test COMPILER_SUPPORTS_MARCH - Failed
 CMake Error at _skbuild/macosx-12.0-arm64-3.10/cmake-build/_deps/legion-src/CMakeLists.txt:145 (message):
   The flag -march=native is not supported by the compiler

manopapad commented 9 months ago

@CharlelieLrt can you test if the following works on your machine? It works on mine, but mine is an x86 Mac.

~/Desktop> cat a.cc
int main() {}
~/Desktop> /Library/Developer/CommandLineTools/usr/bin/c++ -march=native a.cc  # no output expected
~/Desktop> /Library/Developer/CommandLineTools/usr/bin/c++ --version
Apple clang version 15.0.0 (clang-1500.0.40.1)
Target: x86_64-apple-darwin23.1.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

@Jacobfaib could you please also check if this problem shows up for you on https://github.com/nv-legate/legate.core on your M2 Mac?

Jacobfaib commented 9 months ago

> could you please also check if this problem shows up for you on https://github.com/nv-legate/legate.core on your M2 Mac?

It doesn't, but that's because my version of clang is probably newer. @CharlelieLrt what is the output of the following for you:

$ clang --version

I believe Apple clang does not support -march=native on M1 (or M2, for that matter) until version 15.

CharlelieLrt commented 9 months ago

@manopapad

/Library/Developer/CommandLineTools/usr/bin/c++ -march=native a.cc gives me the same error:

clang: error: the clang compiler does not support '-march=native'

And here is the version output:

Apple clang version 14.0.0 (clang-1400.0.29.202)
Target: arm64-apple-darwin21.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

CharlelieLrt commented 9 months ago

Actually, I just realized that I have two different versions of clang installed. /Library/Developer/CommandLineTools/usr/bin/c++ --version gives me:

Apple clang version 14.0.0 (clang-1400.0.29.202)
Target: arm64-apple-darwin21.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

And clang --version points to the compiler in my anaconda environment:

clang version 17.0.6
Target: arm64-apple-darwin21.6.0
Thread model: posix
InstalledDir: /Users/charlelie/anaconda3/envs/legate/bin

Given the error that I got, I assume ./install.py is using the former.
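The mismatch described above (a clang on PATH that differs from the compiler the build actually invokes) can be checked directly. The following is a minimal sketch, assuming a POSIX shell; the `supports_flag` helper is hypothetical and not part of legate.core. It compiles an empty program with each candidate compiler and reports whether `-march=native` is accepted:

```shell
#!/bin/sh
# Hypothetical helper (not part of legate.core): succeeds if compiler $1
# can compile an empty C++ program when given flag $2.
supports_flag() {
    compiler=$1
    flag=$2
    workdir=$(mktemp -d) || return 1
    printf 'int main() {}\n' > "$workdir/a.cc"
    # Compile only (-c), so linker problems do not mask the flag check.
    "$compiler" "$flag" -c "$workdir/a.cc" -o "$workdir/a.o" >/dev/null 2>&1
    status=$?
    rm -rf "$workdir"
    return "$status"
}

# Candidates: the Command Line Tools compiler seen in the CMake log, and
# whatever clang++ resolves to on PATH (for example the conda one).
for cxx in /Library/Developer/CommandLineTools/usr/bin/c++ "$(command -v clang++)"; do
    [ -n "$cxx" ] || continue
    if supports_flag "$cxx" -march=native; then
        echo "$cxx: -march=native supported"
    else
        echo "$cxx: -march=native NOT supported"
    fi
done
```

The path that reports "NOT supported" should match the compiler named in the "Check for working CXX compiler" line of the CMake output if the assumption in this comment is right.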

manopapad commented 9 months ago

@CharlelieLrt can you try using this PR, and just leaving --march unspecified when invoking install.py? https://github.com/nv-legate/legate.core/pull/906

CharlelieLrt commented 9 months ago

This seems to have solved the march problem. However, it gives me another error during compilation:

FAILED: legate_legateio/CMakeFiles/legateio.dir/read_file.cc.o
      /Library/Developer/CommandLineTools/usr/bin/c++ -DLEGATE_USE_COLLECTIVE -DLEGATE_USE_OPENMP -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -DUSE_HDF -Dlegateio_EXPORTS -I/Users/charlelie/Documents/RESEARCH/CODES/LEGATE/legate.core/examples/io/src -I/Users/charlelie/Documents/RESEARCH/CODES/LEGATE/legate.core/src -I/Users/charlelie/Documents/RESEARCH/CODES/LEGATE/legate.core/_skbuild/macosx-12.0-arm64-3.10/cmake-build/_deps/legion-src/runtime -I/Users/charlelie/Documents/RESEARCH/CODES/LEGATE/legate.core/_skbuild/macosx-12.0-arm64-3.10/cmake-build/_deps/legion-src/runtime/mappers -I/Users/charlelie/Documents/RESEARCH/CODES/LEGATE/legate.core/_skbuild/macosx-12.0-arm64-3.10/cmake-build/_deps/legion-build/runtime -I/Users/charlelie/Documents/RESEARCH/CODES/LEGATE/legate.core/_skbuild/macosx-12.0-arm64-3.10/cmake-build/_deps/thrust-src -I/Users/charlelie/Documents/RESEARCH/CODES/LEGATE/legate.core/_skbuild/macosx-12.0-arm64-3.10/cmake-build/_deps/thrust-src/dependencies/cub -O2 -std=gnu++17 -arch arm64 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX13.1.sdk -mmacosx-version-min=12.0 -fPIC -MD -MT legate_legateio/CMakeFiles/legateio.dir/read_file.cc.o -MF legate_legateio/CMakeFiles/legateio.dir/read_file.cc.o.d -o legate_legateio/CMakeFiles/legateio.dir/read_file.cc.o -c /Users/charlelie/Documents/RESEARCH/CODES/LEGATE/legate.core/examples/io/src/read_file.cc
      /Users/charlelie/Documents/RESEARCH/CODES/LEGATE/legate.core/examples/io/src/read_file.cc:51:21: error: no matching function for call to 'min'
          int64_t my_hi = std::min((my_id + 1) * size / num_readers, size);
                          ^~~~~~~~
      /Library/Developer/CommandLineTools/SDKs/MacOSX13.1.sdk/usr/include/c++/v1/__algorithm/min.h:39:1: note: candidate template ignored: deduced conflicting types for parameter '_Tp' ('unsigned long long' vs. 'unsigned long')
      min(const _Tp& __a, const _Tp& __b)
      ^
      /Library/Developer/CommandLineTools/SDKs/MacOSX13.1.sdk/usr/include/c++/v1/__algorithm/min.h:50:1: note: candidate template ignored: could not match 'initializer_list<type-parameter-0-0>' against 'unsigned long long'
      min(initializer_list<_Tp> __t, _Compare __comp)
      ^
      /Library/Developer/CommandLineTools/SDKs/MacOSX13.1.sdk/usr/include/c++/v1/__algorithm/min.h:59:1: note: candidate function template not viable: requires single argument '__t', but 2 arguments were provided
      min(initializer_list<_Tp> __t)
      ^
      /Library/Developer/CommandLineTools/SDKs/MacOSX13.1.sdk/usr/include/c++/v1/__algorithm/min.h:30:1: note: candidate function template not viable: requires 3 arguments, but 2 were provided
      min(const _Tp& __a, const _Tp& __b, _Compare __comp)

manopapad commented 9 months ago

Can you try replacing the failing line with

int64_t my_hi = std::min((my_id + 1) * size / num_readers, static_cast<int64_t>(size));

or maybe

int64_t my_hi = std::min(static_cast<int64_t>((my_id + 1) * size / num_readers), static_cast<int64_t>(size));

CharlelieLrt commented 9 months ago

I tried both, and the second one seems to have fixed the problem.

CharlelieLrt commented 9 months ago

I built legate successfully with the previous options. I am now trying to build cunumeric simply by running the ./install.py of cunumeric, but this gives me an error when building tblis:

  checking whether the C compiler works... no
  configure: error: in `/Users/charlelie/Documents/RESEARCH/CODES/LEGATE/cunumeric/_skbuild/macosx-12.0-arm64-3.10/cmake-build/_deps/tblis-src':
  configure: error: C compiler cannot create executables

Looking at the config.log, I see:

configure:3944: /Library/Developer/CommandLineTools/usr/bin/cc -V >&5 
clang: error: argument to '-V' is missing (expected 1 value)
clang: error: no input files
configure:3955: $? = 1 
configure:3944: /Library/Developer/CommandLineTools/usr/bin/cc -qversion >&5 
clang: error: unknown argument '-qversion'; did you mean '--version'?
clang: error: no input files
configure:3955: $? = 1 
configure:3944: /Library/Developer/CommandLineTools/usr/bin/cc -version >&5 
clang: error: unknown argument '-version'; did you mean '--version'?
clang: error: no input files
configure:3955: $? = 1 
configure:3975: checking whether the C compiler works
configure:3997: /Library/Developer/CommandLineTools/usr/bin/cc -O3   conftest.c  >&5 
ld: library not found for -lSystem
clang: error: linker command failed with exit code 1 (use -v to see invocation)
configure:4001: $? = 1 
configure:4041: result: no
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "tblis"
| #define PACKAGE_TARNAME "tblis"
| #define PACKAGE_VERSION "1.2.0"
| #define PACKAGE_STRING "tblis 1.2.0"
| #define PACKAGE_BUGREPORT "damatthews@smu.edu"
| #define PACKAGE_URL "http://www.github.com/devinamatthews/tblis"
| #define PACKAGE "tblis"
| #define VERSION "1.2.0"
| /* end confdefs.h.  */  
| 
| int 
| main (void)
| {
| 
|   ;   
|   return 0;
| }
configure:4046: error: in `/Users/charlelie/Documents/RESEARCH/CODES/LEGATE/cunumeric/_skbuild/macosx-12.0-arm64-3.10/cmake-build/_deps/tblis-src':
configure:4048: error: C compiler cannot create executables
See `config.log' for more details

I am wondering if it's simply because cmake is trying to get the version of the C compiler with -version, but it should be --version instead?

Another (unrelated) problem: I built legate.core with max-dim=5 and hdf support. The cunumeric instructions to build from source suggest that

_Once Legate Core is installed, you can simply invoke ./install.py from the cuNumeric top-level directory. The build will automatically pick up the configuration used when building Legate Core_.

But when I run the ./install.py of cunumeric, I see hdf: False and maxdim: 4. So, it seems cunumeric is not picking up the options used to build legate.core. Should I pass all the options I gave to legate.core's install.py again to cunumeric's install.py?

manopapad commented 9 months ago

> I am wondering if it's simply because cmake is trying to get the version of the C compiler with -version, but it should be --version instead?

I think the real problem is actually the last error, the linker failure (ld: library not found for -lSystem). "configure" tries a lot of things, and not all failures are fatal; the failures having to do with "version" are probably just "configure" trying several flags to figure out which "version" flag your compiler accepts.

If you try to run /Library/Developer/CommandLineTools/usr/bin/cc -O3 conftest.c manually (where conftest.c contains what's on configure's output), does that also fail with a linker error (mentioning -lSystem)? If so, possibly /Library/Developer/CommandLineTools/usr/bin/cc is not supposed to be used directly, and instead we should pass CC=/usr/bin/cc and CXX=/usr/bin/c++, or gcc/g++ respectively.
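The manual check suggested above can be scripted. This is a minimal sketch, assuming a POSIX shell; the `can_link` helper is hypothetical. It mirrors configure's "whether the C compiler works" step by compiling and linking a trivial program with each candidate compiler:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if compiler $1 can compile AND link a
# trivial C program, like configure's "whether the C compiler works" check.
can_link() {
    compiler=$1
    workdir=$(mktemp -d) || return 1
    printf 'int main(void) { return 0; }\n' > "$workdir/conftest.c"
    "$compiler" -O3 "$workdir/conftest.c" -o "$workdir/conftest" >/dev/null 2>&1
    status=$?
    rm -rf "$workdir"
    return "$status"
}

# The two compiler paths mentioned in this thread.
for cc in /Library/Developer/CommandLineTools/usr/bin/cc /usr/bin/cc; do
    if can_link "$cc"; then
        echo "$cc: links a trivial program (candidate for CC)"
    else
        echo "$cc: fails to compile or link a trivial program"
    fi
done
```

If /usr/bin/cc passes while the Command Line Tools path fails, exporting CC=/usr/bin/cc and CXX=/usr/bin/c++ before running ./install.py is the workaround proposed above. Whether install.py forwards these variables to the tblis configure step is an assumption, not something confirmed in this thread.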

> So, it seems cunumeric is not picking up the options used to build legate.core

This is just the UI being confusing. cuNumeric's install.py is printing out what got passed to it (in this case you didn't pass anything, so you're just seeing the defaults). But these values are actually ignored, since legate.core has already made its choices.

This is a holdover from an attempt we had made to have cuNumeric build legate.core as a subproject (so you'd only have to do one build, instead of needing to build legate.core separately first), in which case you'd need a way to communicate these settings to the legate.core build. But this never panned out, so I'll just start a PR to remove these.

HDF is also not really used in legate.core at this point, so we may just remove that too (doesn't hurt that you added it, but also is not necessary).

CharlelieLrt commented 9 months ago

@manopapad thank you for the explanations, that's clear now. As a workaround, I tried to build tblis separately and then pass it to cunumeric's install.py via --with-tblis. I got the same error when trying to build the master branch of tblis, but according to this issue, only the develop branch of tblis supports Apple Silicon. I was able to build the develop branch with ./configure --prefix=$(pwd)/install --with-label-type=int32_t --with-length-type=int64_t --with-stride-type=int64_t.

Then, I tried to build cunumeric with ./install.py --with-tblis ../tblis/install (I had to make changes similar to PR #906 to cunumeric's install.py; otherwise I would again get the error about -march=native not being supported). However, this gives me another error, still related to tblis:

  FAILED: /Users/charlelie/Documents/RESEARCH/CODES/LEGATE/cunumeric/build/CMakeFiles/cunumeric.dir/src/cunumeric/matrix/contract.cc.o
  /Library/Developer/CommandLineTools/usr/bin/c++ -DLEGATE_USE_COLLECTIVE -DLEGATE_USE_OPENMP -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -DUSE_HDF -Dcunumeric_EXPORTS -I/Users/charlelie/Documents/RESEARCH/CODES/LEGATE/cunumeric/src -I/Users/charlelie/anaconda3/envs/legate/include/rapids -isystem /Users/charlelie/anaconda3/envs/legate/include/legate -isystem /Users/charlelie/anaconda3/envs/legate/include -isystem /Users/charlelie/anaconda3/envs/legate/include/mappers -isystem /Users/charlelie/Documents/RESEARCH/CODES/LEGATE/tblis/install/include -O2 -std=gnu++17 -arch arm64 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX13.1.sdk -mmacosx-version-min=12.0 -fPIC -mcpu=native -Wno-deprecated-declarations -UTHRUST_DEVICE_SYSTEM -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP -Xclang -fopenmp -MD -MT /Users/charlelie/Documents/RESEARCH/CODES/LEGATE/cunumeric/build/CMakeFiles/cunumeric.dir/src/cunumeric/matrix/contract.cc.o -MF /Users/charlelie/Documents/RESEARCH/CODES/LEGATE/cunumeric/build/CMakeFiles/cunumeric.dir/src/cunumeric/matrix/contract.cc.o.d -o /Users/charlelie/Documents/RESEARCH/CODES/LEGATE/cunumeric/build/CMakeFiles/cunumeric.dir/src/cunumeric/matrix/contract.cc.o -c /Users/charlelie/Documents/RESEARCH/CODES/LEGATE/cunumeric/src/cunumeric/matrix/contract.cc
  In file included from /Users/charlelie/Documents/RESEARCH/CODES/LEGATE/cunumeric/src/cunumeric/matrix/contract.cc:21:
  In file included from /Users/charlelie/Documents/RESEARCH/CODES/LEGATE/tblis/install/include/tblis/tblis.h:4:
  In file included from /Users/charlelie/Documents/RESEARCH/CODES/LEGATE/tblis/install/include/tblis/util/configs.h:4:
  In file included from /Users/charlelie/Documents/RESEARCH/CODES/LEGATE/tblis/install/include/tblis/util/basic_types.h:99:
  In file included from /Users/charlelie/Documents/RESEARCH/CODES/LEGATE/tblis/install/include/tblis/util/../external/marray/marray/marray.hpp:4:
  /Users/charlelie/Documents/RESEARCH/CODES/LEGATE/tblis/install/include/tblis/util/../external/marray/marray/marray_view.hpp:4:10: fatal error: 'detail/utility.hpp' file not found
  #include "detail/utility.hpp"
           ^~~~~~~~~~~~~~~~~~~~
  1 error generated.

Indeed, the header detail/utility.hpp is in ../tblis/src/..., but I have to pass the install directory of tblis to cunumeric (--with-tblis ../tblis/install). Maybe some include flags are missing?

manopapad commented 9 months ago

This looks like an issue in the tblis develop branch (a "detail" header is not getting copied to the installation directory). The master branch doesn't seem to have this header.

You could try copying the missing headers from tblis/src to tblis/install/include, and if that works we can ask the tblis maintainers to fix this properly.

Alternatively, I believe @ipdemes and @magnatelee have been able to build tblis on ARM (using the "master" branch) using this patch: arm.patch

CharlelieLrt commented 9 months ago

Ok, so I had to manually copy all the missing directories (detail, fwd, dpd, indexed, indexed_dpd) from tblis/src/external/marray/marray/ to the install directory. In the end that worked, and the compilation of cunumeric completed without any other problem.

Thanks for your help!
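The manual copy described above could be scripted roughly as follows. This is a sketch, assuming a POSIX shell; the `copy_marray_headers` helper is hypothetical. The destination under the install tree (include/tblis/external/marray/marray) is inferred from the include chain in the compiler error, not stated explicitly in the thread:

```shell
#!/bin/sh
# Hypothetical helper: copy the marray subdirectories that the tblis
# develop branch did not install.
#   $1: source dir, e.g. tblis/src/external/marray/marray
#   $2: destination dir (assumed from the error output), e.g.
#       tblis/install/include/tblis/external/marray/marray
copy_marray_headers() {
    src=$1
    dst=$2
    for sub in detail fwd dpd indexed indexed_dpd; do
        if [ -d "$src/$sub" ]; then
            mkdir -p "$dst"
            cp -R "$src/$sub" "$dst/"
        else
            echo "warning: $src/$sub not found, skipping" >&2
        fi
    done
}

# Run only when both paths are given on the command line.
if [ "$#" -eq 2 ]; then
    copy_marray_headers "$1" "$2"
fi
```

Saved under a name of your choice, the script would be run from the directory containing the tblis checkout, passing the two example paths from the comments as arguments.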

manopapad commented 9 months ago

Thanks for powering through this! I opened an issue on the tblis bug tracker, and will push some fixes for the other issues you discovered above. Closing this issue for now.