jax-ml / jax

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
http://jax.readthedocs.io/
Apache License 2.0

Linking errors when building on windows #14466

Closed: cloudhan closed this issue 1 year ago

cloudhan commented 1 year ago

Description

@hawkinsp After following the fix in https://github.com/google/jax/issues/14369#issuecomment-1429839230,

the linking error described in https://github.com/google/jax/issues/14369#issuecomment-1428121867 still occurs.

Once all of those are fixed, you get another link error:

LINK : warning LNK4044: unrecognized option '/lm'; ignored
ffi.lib(ffi.obj) : error LNK2005: "struct XLA_FFI_Stream * __cdecl xla::runtime::ffi::GetXlaFfiStream(class xla::runtime::PtrMapByType<class xla::runtime::CustomCall,16> const *,class xla::runtime::DiagnosticEngine const *)" (?GetXlaFfiStream@ffi@runtime@xla@@YAPEAUXLA_FFI_Stream@@PEBV?$PtrMapByType@VCustomCall@runtime@xla@@$0BA@@23@PEBVDiagnosticEngine@23@@Z) already defined in executable.lib(executable.obj)
   Creating library bazel-out/x64_windows-opt/bin/external/org_tensorflow/tensorflow/compiler/xla/python/xla_extension.so.if.lib and object bazel-out/x64_windows-opt/bin/external/org_tensorflow/tensorflow/compiler/xla/python/xla_extension.so.if.exp
LINK : warning LNK4286: symbol '?g_trace_level@internal@profiler@tsl@@3U?$atomic@H@std@@A (struct std::atomic<int> tsl::profiler::internal::g_trace_level)' defined in 'traceme_recorder_impl.lo.lib(traceme_recorder.obj)' is imported by 'gpu_executable.lib(gpu_executable.obj)'
LINK : warning LNK4286: symbol '?g_trace_level@internal@profiler@tsl@@3U?$atomic@H@std@@A (struct std::atomic<int> tsl::profiler::internal::g_trace_level)' defined in 'traceme_recorder_impl.lo.lib(traceme_recorder.obj)' is imported by 'send_recv.lib(send_recv.obj)'
LINK : warning LNK4286: symbol '?g_trace_level@internal@profiler@tsl@@3U?$atomic@H@std@@A (struct std::atomic<int> tsl::profiler::internal::g_trace_level)' defined in 'traceme_recorder_impl.lo.lib(traceme_recorder.obj)' is imported by 'llvm_gpu_backend.lib(gpu_backend_lib.obj)'
LINK : warning LNK4286: symbol '?g_trace_level@internal@profiler@tsl@@3U?$atomic@H@std@@A (struct std::atomic<int> tsl::profiler::internal::g_trace_level)' defined in 'traceme_recorder_impl.lo.lib(traceme_recorder.obj)' is imported by 'transpose.lib(transpose.obj)'
LINK : warning LNK4286: symbol '?g_trace_level@internal@profiler@tsl@@3U?$atomic@H@std@@A (struct std::atomic<int> tsl::profiler::internal::g_trace_level)' defined in 'traceme_recorder_impl.lo.lib(traceme_recorder.obj)' is imported by 'bfc_allocator.lib(bfc_allocator.obj)'
LINK : warning LNK4286: symbol '?g_trace_level@internal@profiler@tsl@@3U?$atomic@H@std@@A (struct std::atomic<int> tsl::profiler::internal::g_trace_level)' defined in 'traceme_recorder_impl.lo.lib(traceme_recorder.obj)' is imported by 'nvptx_compiler_impl.lib(nvptx_compiler.obj)'
LINK : warning LNK4286: symbol '?g_trace_level@internal@profiler@tsl@@3U?$atomic@H@std@@A (struct std::atomic<int> tsl::profiler::internal::g_trace_level)' defined in 'traceme_recorder_impl.lo.lib(traceme_recorder.obj)' is imported by 'gpu_compiler.lib(gpu_compiler.obj)'
LINK : warning LNK4286: symbol '?g_trace_level@internal@profiler@tsl@@3U?$atomic@H@std@@A (struct std::atomic<int> tsl::profiler::internal::g_trace_level)' defined in 'traceme_recorder_impl.lo.lib(traceme_recorder.obj)' is imported by 'cpu_runtime.lib(cpu_runtime.obj)'
LINK : warning LNK4286: symbol '?g_trace_level@internal@profiler@tsl@@3U?$atomic@H@std@@A (struct std::atomic<int> tsl::profiler::internal::g_trace_level)' defined in 'traceme_recorder_impl.lo.lib(traceme_recorder.obj)' is imported by 'gpu_helpers.lib(gpu_helpers.obj)'
LINK : warning LNK4286: symbol '?g_trace_level@internal@profiler@tsl@@3U?$atomic@H@std@@A (struct std::atomic<int> tsl::profiler::internal::g_trace_level)' defined in 'traceme_recorder_impl.lo.lib(traceme_recorder.obj)' is imported by 'pjrt_stream_executor_client.lib(pjrt_stream_executor_client.obj)'
LINK : warning LNK4286: symbol '?g_trace_level@internal@profiler@tsl@@3U?$atomic@H@std@@A (struct std::atomic<int> tsl::profiler::internal::g_trace_level)' defined in 'traceme_recorder_impl.lo.lib(traceme_recorder.obj)' is imported by 'local_device_state.lib(local_device_state.obj)'
LINK : warning LNK4286: symbol '?g_trace_level@internal@profiler@tsl@@3U?$atomic@H@std@@A (struct std::atomic<int> tsl::profiler::internal::g_trace_level)' defined in 'traceme_recorder_impl.lo.lib(traceme_recorder.obj)' is imported by 'profiler.lib(profiler.obj)'
LINK : warning LNK4286: symbol '?g_trace_level@internal@profiler@tsl@@3U?$atomic@H@std@@A (struct std::atomic<int> tsl::profiler::internal::g_trace_level)' defined in 'traceme_recorder_impl.lo.lib(traceme_recorder.obj)' is imported by 'outfeed_receiver.lib(outfeed_receiver.obj)'
LINK : warning LNK4286: symbol '?g_trace_level@internal@profiler@tsl@@3U?$atomic@H@std@@A (struct std::atomic<int> tsl::profiler::internal::g_trace_level)' defined in 'traceme_recorder_impl.lo.lib(traceme_recorder.obj)' is imported by 'py_client.lib(py_values.obj)'
LINK : warning LNK4286: symbol '?g_trace_level@internal@profiler@tsl@@3U?$atomic@H@std@@A (struct std::atomic<int> tsl::profiler::internal::g_trace_level)' defined in 'traceme_recorder_impl.lo.lib(traceme_recorder.obj)' is imported by 'tfrt_cpu_pjrt_client.lib(tfrt_cpu_pjrt_client.obj)'
LINK : warning LNK4217: symbol '?g_trace_level@internal@profiler@tsl@@3U?$atomic@H@std@@A (struct std::atomic<int> tsl::profiler::internal::g_trace_level)' defined in 'traceme_recorder_impl.lo.lib(traceme_recorder.obj)' is imported by 'allocator_registry_impl.lo.lib(cpu_allocator_impl.obj)' in function '"public: static void __cdecl tsl::profiler::TraceMe::InstantActivity<class <lambda_29f743e77e718fe99c3f5b22e598e942>,1>(class <lambda_29f743e77e718fe99c3f5b22e598e942> &&,int)" (??$InstantActivity@V<lambda_29f743e77e718fe99c3f5b22e598e942>@@$00@TraceMe@profiler@tsl@@SAX$$QEAV<lambda_29f743e77e718fe99c3f5b22e598e942>@@H@Z)'
LINK : warning LNK4286: symbol '?g_trace_level@internal@profiler@tsl@@3U?$atomic@H@std@@A (struct std::atomic<int> tsl::profiler::internal::g_trace_level)' defined in 'traceme_recorder_impl.lo.lib(traceme_recorder.obj)' is imported by 'pmap_lib.lib(pmap_lib.obj)'
LINK : warning LNK4286: symbol '?g_trace_level@internal@profiler@tsl@@3U?$atomic@H@std@@A (struct std::atomic<int> tsl::profiler::internal::g_trace_level)' defined in 'traceme_recorder_impl.lo.lib(traceme_recorder.obj)' is imported by 'pjit.lib(pjit.obj)'
LINK : warning LNK4286: symbol '?g_trace_level@internal@profiler@tsl@@3U?$atomic@H@std@@A (struct std::atomic<int> tsl::profiler::internal::g_trace_level)' defined in 'traceme_recorder_impl.lo.lib(traceme_recorder.obj)' is imported by 'jax_jit.lib(jax_jit.obj)'
LINK : warning LNK4217: symbol '?g_annotation_enabled@internal@profiler@tsl@@3U?$atomic@H@std@@A (struct std::atomic<int> tsl::profiler::internal::g_annotation_enabled)' defined in 'annotation_stack_impl.lo.lib(annotation_stack.obj)' is imported by 'gpu_executable.lib(gpu_executable.obj)' in function '"public: __cdecl tsl::profiler::ScopedAnnotationT<0>::ScopedAnnotationT<0><class <lambda_aeb2b8c334a04b454d1eb165a0a6ffbd> >(class <lambda_aeb2b8c334a04b454d1eb165a0a6ffbd>)" (??$?0V<lambda_aeb2b8c334a04b454d1eb165a0a6ffbd>@@@?$ScopedAnnotationT@$0A@@profiler@tsl@@QEAA@V<lambda_aeb2b8c334a04b454d1eb165a0a6ffbd>@@@Z)'
LINK : warning LNK4286: symbol '?g_annotation_enabled@internal@profiler@tsl@@3U?$atomic@H@std@@A (struct std::atomic<int> tsl::profiler::internal::g_annotation_enabled)' defined in 'annotation_stack_impl.lo.lib(annotation_stack.obj)' is imported by 'gpu_executable.lib(sequential_thunk.obj)'
LINK : warning LNK4286: symbol '?g_annotation_enabled@internal@profiler@tsl@@3U?$atomic@H@std@@A (struct std::atomic<int> tsl::profiler::internal::g_annotation_enabled)' defined in 'annotation_stack_impl.lo.lib(annotation_stack.obj)' is imported by 'tracing.lib(tracing.obj)'
bazel-out\x64_windows-opt\bin\external\org_tensorflow\tensorflow\compiler\xla\python\xla_extension.so : fatal error LNK1169: one or more multiply defined symbols found

Cause of the LNK4217/LNK4286 warnings for tsl::profiler::internal::g_trace_level and g_annotation_enabled

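TF_COMPILE_LIBRARY is not properly defined (see https://github.com/tensorflow/tensorflow/blob/959d1b144dc03fbda586f8a60dd4c117025e6c18/tensorflow/tsl/platform/macros.h#L61-L69). Without it, TF_EXPORT expands to __declspec(dllimport) on Windows, so objects that reference these globals expect to import them from a DLL even though the definitions are linked into the same binary. This in turn is caused by the is_external parameter of the get_win_copts macro not being propagated properly in https://github.com/tensorflow/tensorflow/blob/959d1b144dc03fbda586f8a60dd4c117025e6c18/tensorflow/tensorflow.bzl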

Fortunately, we can pass --copt=/DTF_COMPILE_LIBRARY to Bazel to work around this problem.
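The dllexport/dllimport switch referred to above can be sketched as follows. This is a minimal sketch assuming the usual Windows export-macro pattern; MYLIB_COMPILE_LIBRARY, MYLIB_EXPORT, mylib::g_trace_level and mylib::TraceLevel are hypothetical stand-ins, not the actual TSL definitions. The /D command-line define (the analogue of --copt=/DTF_COMPILE_LIBRARY) is emulated with a #define so the file compiles on its own.

```cpp
// Hypothetical sketch of the dllexport/dllimport switch used by headers such as
// tsl/platform/macros.h. MYLIB_* names are illustrative, not TSL's.
// In a real build the define below comes from the compiler command line
// (for example /DMYLIB_COMPILE_LIBRARY), not from source.
#define MYLIB_COMPILE_LIBRARY 1  // pretend this file is part of the library

#include <atomic>

#if defined(_WIN32)
#if defined(MYLIB_COMPILE_LIBRARY)
#define MYLIB_EXPORT __declspec(dllexport)  // the library's own objects export
#else
#define MYLIB_EXPORT __declspec(dllimport)  // consumers expect to import
#endif
#else
#define MYLIB_EXPORT __attribute__((visibility("default")))
#endif

namespace mylib {

// Declaration as it would appear in the public header.
MYLIB_EXPORT extern std::atomic<int> g_trace_level;

// Definition as it would appear in the library's .cc file.
// On Windows, objects compiled WITHOUT MYLIB_COMPILE_LIBRARY see the dllimport
// form and reference the __imp_ thunk; if this definition is then linked
// statically into the same image, MSVC reports LNK4217/LNK4286.
std::atomic<int> g_trace_level{-1};

// Hypothetical exported accessor that reads the global.
MYLIB_EXPORT int TraceLevel() {
  return g_trace_level.load(std::memory_order_acquire);
}

}  // namespace mylib
```
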

Cause of already defined xla::runtime::ffi::GetXlaFfiStream

xla::runtime::ffi::GetXlaFfiStream is first defined as a weak symbol in ffi.cc and then defined again as a normal (strong) symbol in executable.cc. MSVC does not support weak symbols, so both become ordinary definitions and the linker reports LNK2005.

As a dirty fix, replacing the definition in ffi.cc with a declaration lets the link succeed. After this, we can build a usable jaxlib wheel.
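A single-file sketch of that dirty fix, assuming hypothetical names (dirty_fix::Stream, Context, GetStream, HasStream) in place of XLA_FFI_Stream and GetXlaFfiStream: the "ffi.cc" part keeps only a declaration, and the "executable.cc" part provides the one remaining definition.

```cpp
// Single-file sketch of the "dirty fix" (hypothetical names):
// Stream ~ XLA_FFI_Stream, Context ~ the custom-call user data,
// GetStream ~ xla::runtime::ffi::GetXlaFfiStream.

namespace dirty_fix {

struct Stream { int device; };
struct Context { Stream* stream; };

// ---- "ffi.cc" after the fix: declaration only ----
// Before the fix this file carried a weak default definition, e.g. with
// GCC/Clang: __attribute__((weak)) Stream* GetStream(const Context*) { return nullptr; }
// MSVC does not support that attribute, so the default and the definition
// below become two ordinary definitions and the link fails with LNK2005.
Stream* GetStream(const Context* ctx);

// Code in "ffi.cc" still calls the function through the declaration.
bool HasStream(const Context* ctx) { return GetStream(ctx) != nullptr; }

// ---- "executable.cc": the single definition that remains ----
Stream* GetStream(const Context* ctx) {
  return ctx == nullptr ? nullptr : ctx->stream;
}

}  // namespace dirty_fix
```

The trade-off is that ffi.cc loses its fallback: a binary that links ffi.cc without executable.cc would now fail with an unresolved external symbol instead of silently using the weak default.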

What jax/jaxlib version are you using?

jaxlib v0.4.3

Which accelerator(s) are you using?

No response

Additional system info

No response

NVIDIA GPU info

No response

adam-hartshorne commented 1 year ago

It seems Triton has changed such that the advice above is no longer valid (the build still fails for similar reasons).

hawkinsp commented 1 year ago

Are both of these issues still current, or just the GetXlaFfiStream issue?

hawkinsp commented 1 year ago

@ezhulenev do you have suggestions on how we might avoid the weak symbol here? MSVC apparently does not support weak symbols.
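One portable alternative to a weak default, sketched here only as an illustration and not necessarily what the later openxla/xla fix does: callers go through a function pointer that starts at a default implementation and is overwritten by the component that owns the real one. All names (hook_sketch::GetStreamHook, RegisterGetStream, and so on) are hypothetical.

```cpp
// Hypothetical sketch: replace a weak default with a registration hook.
// Not necessarily what the upstream openxla/xla fix does.

namespace hook_sketch {

struct Stream { int device; };
struct Context { Stream* stream; };

using GetStreamFn = Stream* (*)(const Context*);

// Fallback that plays the role the weak definition used to play.
inline Stream* DefaultGetStream(const Context*) { return nullptr; }

// The hook has exactly one ordinary definition, so no weak linkage is needed
// on any toolchain. Assumes registration happens once at startup, before
// concurrent calls; the pointer itself is not synchronized.
inline GetStreamFn& GetStreamHook() {
  static GetStreamFn hook = &DefaultGetStream;
  return hook;
}

// Called by the component that owns the real implementation.
inline void RegisterGetStream(GetStreamFn fn) { GetStreamHook() = fn; }

// Callers use this instead of referencing a weak symbol directly.
inline Stream* GetStream(const Context* ctx) { return GetStreamHook()(ctx); }

}  // namespace hook_sketch
```

The cost compared with a weak symbol is one indirect call and the need to make sure registration runs before the first call.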

cloudhan commented 1 year ago

Both are still current. But the first one is mainly a configuration issue, so only the second one needs to be addressed at the moment.

cloudhan commented 1 year ago

The weak symbol problem has been resolved on openxla/xla latest main.

hawkinsp commented 1 year ago

Is this issue still a problem?

(I know it isn't a problem on CPU, because the new Windows CPU CI build is mostly passing: https://github.com/google/jax/actions/workflows/windows_ci.yml )

cloudhan commented 1 year ago

As mentioned above, the weak symbol problem has been resolved on the latest openxla/xla main; that is, the second issue is fixed in https://github.com/openxla/xla/commit/e634d4ab1067445c0f89f4b2bbe0bafaf0400051

The first one is still there, but since it can be worked around from outside (by defining TF_COMPILE_LIBRARY), feel free to close this issue.