Open f0uriest opened 4 years ago
We don't support the PPC architecture ourselves and most likely don't have the engineer bandwidth to maintain such a build.
But we wouldn't object if the community wanted to support ppc64le. There are likely two pieces:
Contributions welcome!
As to your specific question, I'd make sure that MKLDNN is disabled in the build (I believe there is an option to build.py for this.) I doubt MKLDNN works on non-Intel architectures.
I am working on the same issue. I managed to reach this point
https://gist.github.com/feifzhou/152d5c6e15e3485befa78e69cd340c32#file-gistfile1-txt
But got
I tried both gcc 7.3.1 and 8.3.1 and got the same inclusion errors with each. gcc 4.9.3 gave me lots of syntax errors. MKLDNN was disabled.
@feifzhou I don't know how to solve your issue, but it looks to me like Bazel isn't understanding something about the location of the standard library headers on your system. Do you have the same problem if you try to build TensorFlow? We share a lot of build infrastructure with them, so I'm wondering if this is JAX specific or a more general Bazel/TF problem.
(Ultimately we don't have cycles to work on this, but we welcome contributions!)
@f0uriest I've built v0.1.55 successfully on an IBM POWER9, but more recent versions fail in the same way.
@mrorro Yeah I haven't been able to build any version since 0.1.55 either. It looks like at some point they switched some of the compiler flags to ones that are only defined for x86-64 architectures. Bazel supposedly lets you override these but I haven't gotten it to work yet.
@f0uriest If you can share the output of the build, we might be able to suggest things to change.
I'd speculate there are two or three things you'd need to do:
a) update build.py to pass the correct flags, if it isn't already doing so.
b) make sure XLA links in the Power LLVM backend if targeting Power. There are already cases for x86 and ARM; I don't recall if Power is included.
c) add a Power case to build_wheel.py.
@f0uriest @mrorro
Did you all happen to make progress with this issue? We are looking to build JAX on Summit and I happened upon this issue/discussion.
Nothing so far. Would love to see if you can solve it on Summit.
I also haven't made any progress but haven't had much time to work on it either. We've been using 0.1.55 for a while, though I'm hoping to upgrade later this summer
@hawkinsp on Summit, after fixing compiler flags, we are also getting the download error:
WARNING: Download from http://mirror.tensorflow.org/github.com/tensorflow/runtime/archive/d29d1ef0a65a8f9c23e1f88067ce4205d3085e87.tar.gz failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException GET returned 404 Not Found
@asedova That is a benign warning, you can ignore it.
I was able to cross-compile a ppc64le wheel on an x86-64 machine by following the instructions in #7365. I can't easily test the resulting wheel, though.
I would imagine that building natively on a ppc64le machine requires nothing other than following the standard instructions once the changes in #7365 are merged.
Thanks
Thanks @hawkinsp we are eagerly awaiting this merge
One thing I'd like to double check: what does:
import platform
print(platform.machine())
print on your PPC machine?
And is it little endian?
On my system I get
>>> import platform
>>> print(platform.machine())
ppc64le
It is little-endian
yes we are LE also
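For anyone else answering the same question, the machine name and byte order can be reported together. This is a minimal sketch using only the standard library; the helper name describe_host is made up for illustration.

```python
import platform
import sys


def describe_host():
    """Return (machine, byteorder) for the running interpreter."""
    # platform.machine() reports the architecture string, e.g. "ppc64le".
    # sys.byteorder is either "little" or "big".
    return platform.machine(), sys.byteorder


if __name__ == "__main__":
    machine, byteorder = describe_host()
    print(f"machine={machine} byteorder={byteorder}")
```

On a little-endian POWER system the byte order should come back as "little", matching the answers above.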
@f0uriest you guys are on Sierra?
On Lassen
@asedova Traverse at PPPL
I could also share the cross-compiled wheel I made for Python 3.9; but I have no idea if it actually works. So it's probably best if you make sure it builds for you.
@hawkinsp Here is what I see on initial attempt: summit_jaxlib.log
Just for record, I have tried various versions of GCC (6.4.0, 7.4.0, 8.1.1)
Notable errors:
gcc: error: unrecognized command line option '-std=c++14'
ERROR: /tmp/_bazel_rprout/b2ebe10a0ad0f6175e81a930563cb9d3/external/com_google_protobuf/BUILD:155:11: Compiling src/google/protobuf/util/internal/datapiece.cc [for host] failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command (cd /tmp/_baz
This second error will point at different source files for different runs it seems.
@asedova do you see anything different?
You need a C++14 compiler to build JAX.
Something seems surprising here, though. gcc 6.1 and newer apparently support C++14: https://gcc.gnu.org/projects/cxx-status.html#cxx14
Note the documentation is quite clear that -std=c++14 is a flag gcc accepts! So this seems like something you need to figure out about your gcc installation.
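One way to check what the gcc on your PATH actually accepts is to hand it an empty translation unit together with the flag in question. The sketch below is only an illustration: the helper name compiler_accepts_flag is hypothetical, and it assumes a gcc-compatible driver that understands -x c++, -fsyntax-only and "-" for standard input.

```python
import shutil
import subprocess


def compiler_accepts_flag(compiler, flag):
    """Return True/False if `compiler` accepts `flag`, or None if it is not on PATH."""
    path = shutil.which(compiler)
    if path is None:
        # Nothing by that name is on PATH (e.g. the module was never loaded).
        return None
    # Syntax-check an empty C++ source read from stdin with the flag applied.
    result = subprocess.run(
        [path, flag, "-x", "c++", "-fsyntax-only", "-"],
        input="",
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    print(compiler_accepts_flag("gcc", "-std=c++14"))
```

A result of None or False here would suggest the wrong compiler (or none) is being picked up, which is worth ruling out before debugging Bazel itself.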
@hawkinsp Apologies, I could have goofed that one actually. I thought I had GCC loaded...
Here is a run with GCC 7.4.0: summit_jaxlib-gcc7.4.0.log
@proutrc The issue here is that bazel hermeticity checking is upset that you appear to be reading header files outside what it considers to be the standard system paths.
I think your best fix here might be to write a small custom Bazel toolchain. As it happens, I show an example of how to do that in a comment in #7365. It's not that bad, you should be able to just modify my example. You would need to modify cxx_builtin_include_directories to include that header directory, and you'd need to change the other tool paths to point to the right places on your system.
In your case, you'd want to set host_crosstool_top to the same toolchain as crosstool_top.
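To find candidate directories for cxx_builtin_include_directories, one option is to ask the compiler which directories it searches. The sketch below assumes gcc prints its search list on stderr when run as `gcc -E -x c++ - -v < /dev/null`, in the usual "search starts here" / "End of search list." layout; confirm this on your own system. The function name is hypothetical.

```python
def parse_include_search_dirs(verbose_output):
    """Extract the <...> include search directories from gcc -v output."""
    dirs = []
    in_list = False
    for line in verbose_output.splitlines():
        if line.startswith("#include <...> search starts here:"):
            in_list = True
            continue
        if line.startswith("End of search list."):
            break
        # Directory entries are indented by a single space in the listing.
        if in_list and line.startswith(" "):
            dirs.append(line.strip())
    return dirs
```

Pass the captured stderr of the command above to the function and compare the result with the list in your toolchain config.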
@hawkinsp sorry for my ignorance, but is the mentioned toolchain directory at the top of the jax repo or in the build directory? My familiarity with bazel and its setup is limited, unfortunately. I am happy to work on this though, just want to make sure I am set up properly.
@proutrc In the example I gave, it's at the root of the JAX repository. (It doesn't matter a whole lot, so long as all the paths agree, and in my command line, etc. I refer to it as //toolchain, which is at the root of the repository.)
@hawkinsp
I seem to still run into similar issues. Is there anything else I am missing, besides an update to those paths for the tools and the cxx_builtin_include_directories? I am also putting the realpath in the cxx_builtin_include_directories list, but I see it has the non-realpath in the error output. Sometimes it does have the realpath though, oddly. I appreciate your help.
def _impl(ctx):
    return cc_common.create_cc_toolchain_config_info(
        ctx = ctx,
        features = features, # NEW
        cxx_builtin_include_directories = [
            "/autofs/nccs-svm1_sw/summit/gcc/7.4.0/lib/gcc/powerpc64le-none-linux-gnu/7.4.0/include/",
            "/autofs/nccs-svm1_sw/summit/gcc/7.4.0/include/",
            "/autofs/nccs-svm1_sw/summit/gcc/7.4.0/lib/gcc/powerpc64le-none-linux-gnu/7.4.0/include-fixed/",
            "/usr/include/",
        ],
Error (it does seem to get further sometimes):
[0 / 31] [Prepa] Creating source manifest for //build:build_wheel ... (5 actions, 0 running)
[68 / 529] Compiling src/google/protobuf/generated_enum_util.cc [for host]; 2s local ... (128 actions running)
[76 / 529] Compiling src/google/protobuf/generated_enum_util.cc [for host]; 5s local ... (128 actions running)
[83 / 529] Compiling src/google/protobuf/extension_set.cc [for host]; 9s local ... (128 actions running)
[89 / 529] Compiling src/google/protobuf/extension_set.cc [for host]; 13s local ... (128 actions running)
[98 / 529] Compiling src/google/protobuf/extension_set.cc [for host]; 18s local ... (128 actions, 127 running)
ERROR: /gpfs/alpine/stf007/scratch/rprout/jax/jaxlib/BUILD:352:17: Compiling jaxlib/cpu_feature_guard.c failed: undeclared inclusion(s) in rule '//jaxlib:cpu_feature_guard.so':
this rule is missing dependency declarations for the following files included by 'jaxlib/cpu_feature_guard.c':
'/sw/summit/gcc/7.4.0/lib/gcc/powerpc64le-none-linux-gnu/7.4.0/include-fixed/limits.h'
'/sw/summit/gcc/7.4.0/lib/gcc/powerpc64le-none-linux-gnu/7.4.0/include-fixed/syslimits.h'
'/sw/summit/gcc/7.4.0/lib/gcc/powerpc64le-none-linux-gnu/7.4.0/include/stddef.h'
'/sw/summit/gcc/7.4.0/lib/gcc/powerpc64le-none-linux-gnu/7.4.0/include/stdarg.h'
'/sw/summit/gcc/7.4.0/lib/gcc/powerpc64le-none-linux-gnu/7.4.0/include/stdint.h'
Target //build:build_wheel failed to build
INFO: Elapsed time: 40.420s, Critical Path: 22.55s
INFO: 333 processes: 241 internal, 92 local.
FAILED: Build did NOT complete successfully
ERROR: Build failed. Not running target
FAILED: Build did NOT complete successfully
b''
Traceback (most recent call last):
File "build/build.py", line 604, in <module>
main()
File "build/build.py", line 599, in main
shell(command)
File "build/build.py", line 52, in shell
output = subprocess.check_output(cmd)
File "/sw/summit/python/3.7/anaconda3/5.3.0/lib/python3.7/subprocess.py", line 376, in check_output
**kwargs).stdout
File "/sw/summit/python/3.7/anaconda3/5.3.0/lib/python3.7/subprocess.py", line 468, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/sw/.testing/belhorn/summit/bin/bazel', 'run', '--verbose_failures=true', '--host_crosstool_top=//toolchain:ppc', '--crosstool_top=//toolchain:ppc', '--config=short_logs', '--config=cuda', '--define=xla_python_enable_gpu=true', ':build_wheel', '--', '--output_path=/gpfs/alpine/stf007/scratch/rprout/jax/dist', '--cpu=ppc64le']' returned non-zero exit status 1.
@proutrc Try with --bazel_options=--cpu=ppc.
@hawkinsp
INFO: Found 1 target...
[0 / 68] [Prepa] Writing file jaxlib/lapack.so-2.params
ERROR: /tmp/_bazel_rprout/b2ebe10a0ad0f6175e81a930563cb9d3/external/com_google_absl/absl/base/BUILD.bazel:596:11: Compiling absl/base/internal/exponential_biased.cc failed: undeclared inclusion(s) in rule '@com_google_absl//absl/base:exponential_biased':
this rule is missing dependency declarations for the following files included by 'absl/base/internal/exponential_biased.cc':
'/sw/summit/gcc/7.4.0/lib/gcc/powerpc64le-none-linux-gnu/7.4.0/include/stdint.h'
'/sw/summit/gcc/7.4.0/lib/gcc/powerpc64le-none-linux-gnu/7.4.0/include-fixed/limits.h'
'/sw/summit/gcc/7.4.0/lib/gcc/powerpc64le-none-linux-gnu/7.4.0/include-fixed/syslimits.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/cstddef'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/powerpc64le-none-linux-gnu/bits/c++config.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/powerpc64le-none-linux-gnu/bits/os_defines.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/powerpc64le-none-linux-gnu/bits/cpu_defines.h'
'/sw/summit/gcc/7.4.0/lib/gcc/powerpc64le-none-linux-gnu/7.4.0/include/stddef.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/ciso646'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/cassert'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/algorithm'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/utility'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/stl_relops.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/stl_pair.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/move.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/concept_check.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/type_traits'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/initializer_list'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/stl_algobase.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/functexcept.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/exception_defines.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/cpp_type_traits.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/ext/type_traits.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/ext/numeric_traits.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/stl_iterator_base_types.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/stl_iterator_base_funcs.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/debug/assertions.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/stl_iterator.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/ptr_traits.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/debug/debug.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/predefined_ops.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/stl_algo.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/cstdlib'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/std_abs.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/algorithmfwd.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/stl_heap.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/stl_tempbuf.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/stl_construct.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/new'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/exception'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/exception.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/exception_ptr.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/cxxabi_init_exception.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/typeinfo'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/hash_bytes.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/nested_exception.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/ext/alloc_traits.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/alloc_traits.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/memoryfwd.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/uniform_int_dist.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/limits'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/atomic'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/atomic_base.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/atomic_lockfree_defines.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/cmath'
Target //build:build_wheel failed to build
INFO: Elapsed time: 19.438s, Critical Path: 1.74s
INFO: 245 processes: 215 internal, 30 local.
FAILED: Build did NOT complete successfully
ERROR: Build failed. Not running target
FAILED: Build did NOT complete successfully
b''
Traceback (most recent call last):
File "build/build.py", line 604, in <module>
main()
File "build/build.py", line 599, in main
shell(command)
File "build/build.py", line 52, in shell
output = subprocess.check_output(cmd)
File "/sw/summit/python/3.7/anaconda3/5.3.0/lib/python3.7/subprocess.py", line 376, in check_output
**kwargs).stdout
File "/sw/summit/python/3.7/anaconda3/5.3.0/lib/python3.7/subprocess.py", line 468, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/sw/.testing/belhorn/summit/bin/bazel', 'run', '--verbose_failures=true', '--host_crosstool_top=//toolchain:ppc', '--crosstool_top=//toolchain:ppc', '--cpu=ppc', '--config=short_logs', '--config=cuda', '--define=xla_python_enable_gpu=true', ':build_wheel', '--', '--output_path=/gpfs/alpine/stf007/scratch/rprout/jax/dist', '--cpu=ppc64le']' returned non-zero exit status 1.
Try putting the /sw/... paths in the cxx_builtin_include_directories. Or try putting both.
Unfortunately, I have tried all of that as well:
Error:
[0 / 35] [Prepa] Creating source manifest for //build:build_wheel
[64 / 542] Compiling src/google/protobuf/any_lite.cc [for host]; 3s local ... (127 actions, 126 running)
[80 / 542] Compiling src/google/protobuf/extension_set.cc [for host]; 6s local ... (127 actions running)
[89 / 542] Compiling src/google/protobuf/extension_set.cc [for host]; 9s local ... (128 actions running)
[100 / 544] Compiling src/google/protobuf/extension_set.cc [for host]; 13s local ... (127 actions running)
[157 / 614] Compiling src/google/protobuf/extension_set.cc [for host]; 17s local ... (128 actions running)
[197 / 716] Compiling src/google/protobuf/extension_set.cc [for host]; 22s local ... (128 actions running)
[239 / 716] Compiling src/google/protobuf/compiler/cpp/cpp_helpers.cc [for host]; 27s local ... (128 actions running)
ERROR: /tmp/_bazel_rprout/b2ebe10a0ad0f6175e81a930563cb9d3/external/com_google_absl/absl/time/internal/cctz/BUILD.bazel:53:11: Compiling absl/time/internal/cctz/src/time_zone_posix.cc failed: undeclared inclusion(s) in rule '@com_google_absl//absl/time/internal/cctz:time_zone':
this rule is missing dependency declarations for the following files included by 'absl/time/internal/cctz/src/time_zone_posix.cc':
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/cstdint'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/powerpc64le-none-linux-gnu/bits/c++config.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/powerpc64le-none-linux-gnu/bits/os_defines.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/powerpc64le-none-linux-gnu/bits/cpu_defines.h'
'/sw/summit/gcc/7.4.0/lib/gcc/powerpc64le-none-linux-gnu/7.4.0/include/stdint.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/string'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/stringfwd.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/memoryfwd.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/char_traits.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/stl_algobase.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/functexcept.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/exception_defines.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/cpp_type_traits.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/ext/type_traits.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/ext/numeric_traits.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/stl_pair.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/move.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/concept_check.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/type_traits'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/stl_iterator_base_types.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/stl_iterator_base_funcs.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/debug/assertions.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/stl_iterator.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/ptr_traits.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/debug/debug.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/predefined_ops.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/postypes.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/cwchar'
'/sw/summit/gcc/7.4.0/lib/gcc/powerpc64le-none-linux-gnu/7.4.0/include/stdarg.h'
'/sw/summit/gcc/7.4.0/lib/gcc/powerpc64le-none-linux-gnu/7.4.0/include/stddef.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/allocator.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/powerpc64le-none-linux-gnu/bits/c++allocator.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/ext/new_allocator.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/new'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/exception'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/exception.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/exception_ptr.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/cxxabi_init_exception.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/typeinfo'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/hash_bytes.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/nested_exception.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/localefwd.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/powerpc64le-none-linux-gnu/bits/c++locale.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/clocale'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/iosfwd'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/cctype'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/ostream_insert.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/cxxabi_forced.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/stl_function.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/backward/binders.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/range_access.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/initializer_list'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/basic_string.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/ext/atomicity.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/powerpc64le-none-linux-gnu/bits/gthr.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/powerpc64le-none-linux-gnu/bits/gthr-default.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/powerpc64le-none-linux-gnu/bits/atomic_word.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/ext/alloc_traits.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/alloc_traits.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/ext/string_conversions.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/cstdlib'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/std_abs.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/cstdio'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/cerrno'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/functional_hash.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/bits/basic_string.tcc'
'/sw/summit/gcc/7.4.0/lib/gcc/powerpc64le-none-linux-gnu/7.4.0/include-fixed/limits.h'
'/sw/summit/gcc/7.4.0/lib/gcc/powerpc64le-none-linux-gnu/7.4.0/include-fixed/syslimits.h'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/cstddef'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/ciso646'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/cstring'
'/sw/summit/gcc/7.4.0/include/c++/7.4.0/limits'
Target //build:build_wheel failed to build
INFO: Elapsed time: 48.498s, Critical Path: 33.11s
INFO: 376 processes: 221 internal, 155 local.
FAILED: Build did NOT complete successfully
ERROR: Build failed. Not running target
FAILED: Build did NOT complete successfully
b''
Traceback (most recent call last):
File "build/build.py", line 604, in <module>
main()
File "build/build.py", line 599, in main
shell(command)
File "build/build.py", line 52, in shell
output = subprocess.check_output(cmd)
File "/sw/summit/python/3.7/anaconda3/5.3.0/lib/python3.7/subprocess.py", line 376, in check_output
**kwargs).stdout
File "/sw/summit/python/3.7/anaconda3/5.3.0/lib/python3.7/subprocess.py", line 468, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/sw/.testing/belhorn/summit/bin/bazel', 'run', '--verbose_failures=true', '--host_crosstool_top=//toolchain:ppc', '--crosstool_top=//toolchain:ppc', '--cpu=ppc', '--config=short_logs', '--config=cuda', '--define=xla_python_enable_gpu=true', ':build_wheel', '--', '--output_path=/gpfs/alpine/stf007/scratch/rprout/jax/dist', '--cpu=ppc64le']' returned non-zero exit status 1.
cxx_builtin_include_directories list:
def _impl(ctx):
    return cc_common.create_cc_toolchain_config_info(
        ctx = ctx,
        features = features, # NEW
        cxx_builtin_include_directories = [
            "/autofs/nccs-svm1_sw/summit/gcc/7.4.0/lib/gcc/powerpc64le-none-linux-gnu/7.4.0/include",
            "/sw/summit/gcc/7.4.0/lib/gcc/powerpc64le-none-linux-gnu/7.4.0/include",
            "/autofs/nccs-svm1_sw/summit/gcc/7.4.0/include",
            "/sw/summit/gcc/7.4.0/include",
            "/autofs/nccs-svm1_sw/summit/gcc/7.4.0/lib/gcc/powerpc64le-none-linux-gnu/7.4.0/include-fixed",
            "/sw/summit/gcc/7.4.0/lib/gcc/powerpc64le-none-linux-gnu/7.4.0/include-fixed",
            "/usr/include",
        ],
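A quick way to tell whether a given include list even covers the headers named in an "undeclared inclusion(s)" error is to compare them by path prefix. The sketch below assumes a plain string-prefix comparison, which may not be exactly what Bazel does internally, and both function names are hypothetical.

```python
import re


def quoted_absolute_paths(log_text):
    """Return paths that appear as '/abs/path' alone on a line of the log."""
    return re.findall(r"^\s*'(/[^']+)'\s*$", log_text, flags=re.MULTILINE)


def uncovered_headers(log_text, include_dirs):
    """List reported headers that sit under none of the given include dirs."""
    # Normalise to a trailing slash so "/a/include" does not match "/a/include-fixed".
    prefixes = [d.rstrip("/") + "/" for d in include_dirs]
    return [
        path
        for path in quoted_absolute_paths(log_text)
        if not any(path.startswith(prefix) for prefix in prefixes)
    ]
```

If this returns an empty list for the configured directories and the build still fails, the include list itself is probably not the cause and something else (caching, the filesystem, or which toolchain is actually in use) is worth investigating.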
@hawkinsp perhaps relative to here?
@f0uriest you said you got a previous version to build?
Not me..
@hawkinsp it looks like I was able to get around the undeclared inclusion(s) by adding this to a CROSSTOOL file, in the toolchain/ directory. This is in addition to the cc_toolchain_config.bzl and BUILD file we altered from your example.
[rprout@login1.summit jax]$ ls toolchain/
BUILD CROSSTOOL cc_toolchain_config.bzl
[rprout@login1.summit jax]$ cat toolchain/CROSSTOOL
compiler_flag: "-isystem"
compiler_flag: "/sw/summit/gcc/7.4.0/lib/gcc/powerpc64le-none-linux-gnu/7.4.0/include-fixed"
compiler_flag: "-isystem"
compiler_flag: "/sw/summit/gcc/7.4.0/include/c++/7.4.0"
compiler_flag: "-isystem"
compiler_flag: "/sw/summit/gcc/7.4.0/lib/gcc/powerpc64le-none-linux-gnu/7.4.0/include"
Here we are now: summit-jaxlib.log
It actually looks like this CROSSTOOL file addition is not the key. The real key seems to be throttling Bazel with this addition to my ~/.bazelrc file:
build --jobs 2 --local_ram_resources=HOST_RAM*0.04
test --jobs 2
It looks like you want to throttle Bazel if you use NFS at all. Summit's provided software tree (/sw) is on NFS. I had forgotten I added the above throttling in all this. As soon as I removed it from my ~/.bazelrc file, though, the undeclared inclusion(s) came back.
@feifzhou You should try the throttling method above, by adding that to your ~/.bazelrc. Perhaps you will then also get past the undeclared inclusion(s). Maybe you have something on NFS?
In the end, we now seem to be in a similar boat as @f0uriest. Our log now similarly points at Eigen. I will try a different GCC next (maybe some additional flags?).
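On the NFS question: on Linux, the filesystem type behind a path can be looked up from the mount table. The sketch below assumes text in the /proc/mounts layout (device, mount point, type, ...) and picks the longest mount point containing the path. Both function names are hypothetical, and resolving symlinks first (for example with os.path.realpath) is left to the caller.

```python
import os


def filesystem_type(path, mounts_text):
    """Return the fstype of the longest mount point containing `path`, or None."""
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Later entries win ties, since later mounts shadow earlier ones.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fstype
    return best_type


def is_nfs(path, mounts_text):
    """True if the mount holding `path` has a type starting with "nfs"."""
    fstype = filesystem_type(path, mounts_text)
    return fstype is not None and fstype.startswith("nfs")
```

For example, `is_nfs(os.path.realpath("/sw"), open("/proc/mounts").read())` would check the software tree mentioned above on a Linux host.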
@proutrc I'm wondering if that's related to building with gcc 7. See this similar looking issue for PyTorch: https://github.com/pytorch/pytorch/pull/50640 Try 8.1, since you have it?
You might be able to work around the problem by sending a pull request to Eigen that adds similar fallback definitions of the missing vector intrinsics.
@hawkinsp I did indeed go that route, using 8.1.1 over the weekend and this morning.
Oddly, the undeclared inclusion(s) comes back! Seemingly further though:
https://gist.github.com/proutrc/d4bc637555d3624d8aa4ccf6a65f348f
@proutrc Can you share the toolchain .bzl and BUILD files you are using in another gist?
@hawkinsp BUILD: https://gist.github.com/proutrc/ba20b7f5d2c4b6ae7e26cfe4afdea44e cc_toolchain: https://gist.github.com/proutrc/4cb2bd0a3a4f804243e8134295079906
I'd include both the /autofs and /sw paths just to make sure it doesn't help. (Including more paths should only help, not hurt.)
Beyond that I might try adding the compiler flags in the Bazel issue you link above.
Another suggestion is you might try clearing any bazel caches (https://stackoverflow.com/questions/43921911/how-to-resolve-bazel-undeclared-inclusions-error/48513577#48513577). Deleting ~/.cache/bazel would probably work.
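For the suggestion of listing both the /autofs and /sw forms of each path, the expansion can be generated rather than typed by hand. This is a minimal sketch; the helper name with_realpaths is hypothetical, and it simply pairs each directory with its symlink-resolved form.

```python
import os


def with_realpaths(include_dirs):
    """Return each directory followed by its resolved real path, without duplicates."""
    seen = set()
    expanded = []
    for directory in include_dirs:
        for candidate in (directory, os.path.realpath(directory)):
            # Drop trailing slashes so "/a/" and "/a" count as the same entry.
            normalized = candidate.rstrip("/") or "/"
            if normalized not in seen:
                seen.add(normalized)
                expanded.append(normalized)
    return expanded
```

The output can be pasted into cxx_builtin_include_directories, so both spellings of each path are present regardless of which one the compiler reports.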
Thanks @hawkinsp, I will play with this more. I have added both /sw and /autofs paths before, but am going to try that again and will report back.
I have been setting these for cache, etc.. then clearing them every run (off NFS):
startup --output_user_root=/gpfs/alpine/stf007/scratch/rprout/bazel-build-cache/user-root
build --disk_cache=/gpfs/alpine/stf007/scratch/rprout/bazel-cache/
export TEST_TMPDIR=/gpfs/alpine/stf007/scratch/rprout/bazel-tmp/
In addition to running /gpfs/alpine/stf007/scratch/rprout/bazel-4.1.0/output/bazel clean --expunge
Maybe I haven't found the right combo of things yet, not sure. But, Bazel definitely seems finicky about NFS.
Is it possible to use non-NFS temporary and cache directories? I don't know if it will help, but it might.
I think that is what I am doing with these settings:
startup --output_user_root=/gpfs/alpine/stf007/scratch/rprout/bazel-build-cache/user-root
build --disk_cache=/gpfs/alpine/stf007/scratch/rprout/bazel-cache/
export TEST_TMPDIR=/gpfs/alpine/stf007/scratch/rprout/bazel-tmp/
@hawkinsp I can confirm I don't cache anything on NFS.
I also added the /sw and /autofs paths:
def _impl(ctx):
    return cc_common.create_cc_toolchain_config_info(
        ctx = ctx,
        features = features, # NEW
        cxx_builtin_include_directories = [
            #"/autofs/nccs-svm1_sw/summit/gcc/7.4.0/include",
            #"/autofs/nccs-svm1_sw/summit/gcc/7.4.0/lib/gcc/powerpc64le-none-linux-gnu/7.4.0/include",
            #"/autofs/nccs-svm1_sw/summit/gcc/7.4.0/lib/gcc/powerpc64le-none-linux-gnu/7.4.0/include-fixed",
            "/sw/summit/gcc/8.1.1/include",
            "/autofs/nccs-svm1_sw/summit/gcc/8.1.1/include",
            "/sw/summit/gcc/8.1.1/include/c++/8.1.1",
            "/autofs/nccs-svm1_sw/summit/gcc/8.1.1/lib/gcc/powerpc64le-unknown-linux-gnu/8.1.1/include",
            "/autofs/nccs-svm1_sw/summit/gcc/8.1.1/lib/gcc/powerpc64le-unknown-linux-gnu/8.1.1/include-fixed",
            "/sw/summit/gcc/8.1.1/lib/gcc/powerpc64le-unknown-linux-gnu/8.1.1/include",
            "/sw/summit/gcc/8.1.1/lib/gcc/powerpc64le-unknown-linux-gnu/8.1.1/include-fixed/",
            "/autofs/nccs-svm1_sw/summit/gcc/8.1.1/include/c++/8.1.1",
            "/usr/include",
        ]
It is strange too, because it is obviously able to compile things leading up to the inclusion failure:
......
INFO: Found 1 target...
[0 / 4] [Prepa] BazelWorkspaceStatusAction stable-status.txt
[79 / 535] Compiling src/google/protobuf/struct.pb.cc [for host]; 0s local, remote-cache ... (3 actions, 2 running)
[124 / 535] Compiling src/google/protobuf/compiler/java/java_message_field.cc [for host]; 0s local, remote-cache ... (3 actions, 2 running)
[178 / 535] Compiling src/google/protobuf/compiler/csharp/csharp_field_base.cc [for host]; 0s local, remote-cache ... (3 actions, 2 running)
[1,702 / 2,103] Compiling internal/wait.c [for host]; 0s local, remote-cache ... (3 actions, 2 running)
[1,814 / 2,174] Compiling external/org_tensorflow/tensorflow/core/framework/cost_graph.pb.cc [for host]; 1s local, remote-cache ... (3 actions, 2 running)
ERROR: /gpfs/alpine/stf007/scratch/rprout/bazel-build-cache/user-root/b2ebe10a0ad0f6175e81a930563cb9d3/external/com_github_grpc_grpc/BUILD:1883:16: Compiling src/core/ext/transport/chttp2/server/insecure/server_chttp2.cc failed: undeclared inclusion(s) in rule '@com_github_grpc_grpc//:grpc_transport_chttp2_server_insecure':
this rule is missing dependency declarations for the following files included by 'src/core/ext/transport/chttp2/server/insecure/server_chttp2.cc':
'/sw/summit/gcc/8.1.1/lib/gcc/powerpc64le-unknown-linux-gnu/8.1.1/include/stdint.h'
'/sw/summit/gcc/8.1.1/lib/gcc/powerpc64le-unknown-linux-gnu/8.1.1/include/stddef.h'
'/sw/summit/gcc/8.1.1/lib/gcc/powerpc64le-unknown-linux-gnu/8.1.1/include/stdarg.h'
'/sw/summit/gcc/8.1.1/include/c++/8.1.1/stdlib.h'
'/sw/summit/gcc/8.1.1/include/c++/8.1.1/cstdlib'
.....
@hawkinsp is our toolchain not being used everywhere by chance?
[599 / 2,232] Executing genrule @local_config_cuda//cuda:cuda-include; 3s local, remote-cache ... (4 actions running)
ERROR: /gpfs/alpine/stf007/scratch/rprout/bazel-build-cache/user-root/b2ebe10a0ad0f6175e81a930563cb9d3/external/org_tensorflow/tensorflow/core/platform/BUILD:453:11: Compiling tensorflow/core/platform/path.cc failed: undeclared inclusion(s) in rule '@org_tensorflow//tensorflow/core/platform:path':
Are there different "rules" or something? undeclared inclusion(s) in rule '@org_tensorflow//tensorflow/core/platform:path':
@hawkinsp I did some digging around in my build-cache, here: /gpfs/alpine/stf007/scratch/rprout/bazel-build-cache/user-root/b2ebe10a0ad0f6175e81a930563cb9d3/execroot/__main__/external/
It looks like these external packages set up their own .bazelrc and possibly don't get our toolchain config. Is there a guarantee that what we set as the toolchain config in JAX makes it to these external packages?
I was able to build from main on Traverse without having to do any toolchain modifications, though I did have to manually specify the cuda/cudnn paths.
python build/build.py --enable_cuda --cuda_path /usr/local/cuda-11.3 --cuda_version=11.3 --cudnn_version=8.2.0 --cudnn_path /usr/local/cudnn/cuda-11.3/8.2.0 --noenable_mkl_dnn --cuda_compute_capabilities 7.0 --bazel_path /usr/bin/bazel --target_cpu=ppc
Thanks so much for your help with this! Hope the other ppc users can also get it working
I'm trying to build jax on a cluster that uses IBM power9 processors (it's a sister cluster to Summit at ORNL). It seems to be failing when trying to build XLA, which is strange because I've been able to install tensorflow just fine. The full output log is here: https://gist.github.com/f0uriest/5f04e2ed9916bb750a9ea679633ac80c
Any ideas? Is there any plan to offer pre-built wheels for the ppc64le architecture?