tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone
https://tensorflow.org
Apache License 2.0

The Windows MSVC to clang migration linker issue #60136

Open shangerxin opened 1 year ago

shangerxin commented 1 year ago
### Issue Type

Bug

### Have you reproduced the bug with TF nightly?

Yes

### Source

source

### Tensorflow Version

tf 2.12

### Custom Code

No

### OS Platform and Distribution

Microsoft Windows Server 2019 Datacenter

### Mobile device

_No response_

### Python version

Python 3.10

### Bazel version

5.3.0

### GCC/Compiler version

_No response_

### CUDA/cuDNN version

_No response_

### GPU model and memory

_No response_

### Current Behaviour?

#### Background

We have switched the TensorFlow compilation from MSVC to clang-cl and resolved two compile errors. Compilation now completes, but a link error occurs before TensorFlow can be packaged into a wheel.

#### How to reproduce

1. Fix the const expression error

a. Add the file `third_party\tf_runtime_clangcl.patch` with the following content:

```shell
diff --git a/include/tfrt/support/std_mutex.h b/include/tfrt/support/std_mutex.h
index 6238d097..9fb24279 100644
--- a/include/tfrt/support/std_mutex.h
+++ b/include/tfrt/support/std_mutex.h
@@ -50,7 +50,7 @@ class TFRT_CAPABILITY("mutex") mutex {
  private:
   friend class mutex_lock;
-  std::mutex mu_;
+  std::mutex mu_{};
 };
 // Wrap std::unique_lock with support for thread annotations.
```

b. Update the workspace file `third_party\tf_runtime\workspace.bzl` at line 19:

```shell
patch_file = ["//third_party:tf_runtime_clangcl.patch"],
```

2. Bypass the Google ABSL compilation error

a. Create the file `third_party\absl\comd_google_absl_remove_static_assert.patch`:

```shell
diff --git a/absl/meta/type_traits.h b/absl/meta/type_traits.h
index d886cb30..819f87b4 100644
--- a/absl/meta/type_traits.h
+++ b/absl/meta/type_traits.h
@@ -495,9 +495,7 @@ struct is_trivially_copy_assignable
 absl::is_copy_assignable::value> {
 #ifdef ABSL_HAVE_STD_IS_TRIVIALLY_ASSIGNABLE
  private:
-  static constexpr bool compliant =
-      std::is_trivially_copy_assignable::value ==
-      is_trivially_copy_assignable::value;
+  static constexpr bool compliant = true;
   static_assert(compliant || std::is_trivially_copy_assignable::value,
                 "Not compliant with std::is_trivially_copy_assignable; "
                 "Standard: false, Implementation: true");
```

b. Modify the file `third_party\absl\workspace.bzl` at line 46:

```shell
patch_file = ["//third_party/absl:com_google_absl_fix_mac_and_nvcc_build.patch", "//third_party/absl:comd_google_absl_remove_static_assert.patch"],
```

### Standalone code to reproduce the issue

#### Build with command

```shell
/> bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package --verbose_failures --compiler=clang-cl
```

### Relevant log output

```shell
\tensorflow>bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package --verbose_failures --compiler=clang-cl --copt=/clang:-Weverything --config=windows
WARNING: Ignoring JAVA_HOME, because it must point to a JDK, not a JRE.
WARNING: The following configs were expanded more than once: [monolithic]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=189
INFO: Reading rc options for 'build' from d:\...\msvc_to_clang\tensorflow\.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Options provided by the client:
  'build' options: --python_path=D:/.../msvc_to_clang/venv310/Scripts/python.exe
INFO: Reading rc options for 'build' from d:\...\msvc_to_clang\tensorflow\.bazelrc:
  'build' options: --define framework_shared_object=true --define tsl_protobuf_header_only=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --experimental_link_static_libraries_once=false --incompatible_enforce_config_setting_visibility
INFO: Reading rc options for 'build' from d:\...\msvc_to_clang\tensorflow\.tf_configure.bazelrc:
  'build' options: --action_env PYTHON_BIN_PATH=D:/.../msvc_to_clang/venv310/Scripts/python.exe --action_env PYTHON_LIB_PATH=D:/.../msvc_to_clang/venv310/lib/site-packages --python_path=D:/.../msvc_to_clang/venv310/Scripts/python.exe --copt=/d2ReducedOptimizeHugeFunctions --host_copt=/d2ReducedOptimizeHugeFunctions --define=override_eigen_strong_inline=true --define=tf_use_clang_cl_instead_of_msvc=true
INFO: Reading rc options for 'build' from d:\...\msvc_to_clang\tensorflow\.bazelrc:
  'build' options: --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/ir,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_jitrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/tfrt/eager,tensorflow/core/tfrt/eager/backends/cpu,tensorflow/core/tfrt/eager/backends/gpu,tensorflow/core/tfrt/eager/core_runtime,tensorflow/core/tfrt/eager/cpp_tests/core_runtime,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/graph_executor,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils
INFO: Found applicable config definition build:short_logs in file d:\...\msvc_to_clang\tensorflow\.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file d:\...\msvc_to_clang\tensorflow\.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:opt in file d:\...\msvc_to_clang\tensorflow\.tf_configure.bazelrc: --copt=/arch:AVX --host_copt=/arch:AVX
INFO: Found applicable config definition build:windows in file d:\...\msvc_to_clang\tensorflow\.bazelrc: --copt=/W0 --host_copt=/W0 --copt=/Zc:__cplusplus --host_copt=/Zc:__cplusplus --copt=/D_USE_MATH_DEFINES --host_copt=/D_USE_MATH_DEFINES --features=compiler_param_file --copt=/d2ReducedOptimizeHugeFunctions --host_copt=/d2ReducedOptimizeHugeFunctions --cxxopt=/std:c++17 --host_cxxopt=/std:c++17 --config=monolithic --copt=-DWIN32_LEAN_AND_MEAN --host_copt=-DWIN32_LEAN_AND_MEAN --copt=-DNOGDI --host_copt=-DNOGDI --copt=/Zc:preprocessor --host_copt=/Zc:preprocessor --linkopt=/DEBUG --host_linkopt=/DEBUG --linkopt=/OPT:REF --host_linkopt=/OPT:REF --linkopt=/OPT:ICF --host_linkopt=/OPT:ICF --verbose_failures --features=compiler_param_file
INFO: Found applicable config definition build:monolithic in file d:\...\msvc_to_clang\tensorflow\.bazelrc: --define framework_shared_object=false --define tsl_protobuf_header_only=false --experimental_link_static_libraries_once=false
INFO: Found applicable config definition build:windows in file d:\...\msvc_to_clang\tensorflow\.bazelrc: --copt=/W0 --host_copt=/W0 --copt=/Zc:__cplusplus --host_copt=/Zc:__cplusplus --copt=/D_USE_MATH_DEFINES --host_copt=/D_USE_MATH_DEFINES --features=compiler_param_file --copt=/d2ReducedOptimizeHugeFunctions --host_copt=/d2ReducedOptimizeHugeFunctions --cxxopt=/std:c++17 --host_cxxopt=/std:c++17 --config=monolithic --copt=-DWIN32_LEAN_AND_MEAN --host_copt=-DWIN32_LEAN_AND_MEAN --copt=-DNOGDI --host_copt=-DNOGDI --copt=/Zc:preprocessor --host_copt=/Zc:preprocessor --linkopt=/DEBUG --host_linkopt=/DEBUG --linkopt=/OPT:REF --host_linkopt=/OPT:REF --linkopt=/OPT:ICF --host_linkopt=/OPT:ICF --verbose_failures --features=compiler_param_file
INFO: Found applicable config definition build:monolithic in file d:\...\msvc_to_clang\tensorflow\.bazelrc: --define framework_shared_object=false --define tsl_protobuf_header_only=false --experimental_link_static_libraries_once=false
INFO: Analyzed target //tensorflow/tools/pip_package:build_pip_package (594 packages loaded, 33130 targets configured).
INFO: Found 1 target...
ERROR: D:/.../msvc_to_clang/tensorflow/tensorflow/distribute/experimental/rpc/kernels/BUILD:60:21: Linking tensorflow/distribute/experimental/rpc/kernels/gen_gen_rpc_ops_py_wrappers_cc.exe failed: (Exit 1): lld-link.exe failed: error executing command
  cd /d C:/users/...sha/_bazel_...sha/mlvocmwh/execroot/org_tensorflow
  SET LIB=C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\lib\x64;C:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\lib\um\x64;C:\Program Files (x86)\Windows Kits\10\lib\10.0.19041.0\ucrt\x64;C:\Program Files (x86)\Windows Kits\10\lib\10.0.19041.0\um\x64;d:\...\msvc_to_clang\LLVM\lib\clang\15.0.6\lib\windows
  SET PATH=C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64;C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\Common7\IDE\VC\VCPackages;C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\Common7\IDE\CommonExtensions\Microsoft\TestWindow;C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\Common7\IDE\CommonExtensions\Microsoft\TeamFoundation\Team Explorer;C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Current\bin\Roslyn;C:\Program Files (x86)\Microsoft SDKs\Windows\v10.0A\bin\NETFX 4.8 Tools\x64\;C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\Common7\Tools\devinit;C:\Program Files (x86)\Windows Kits\10\bin\10.0.19041.0\x64;C:\Program Files (x86)\Windows Kits\10\bin\x64;C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\\MSBuild\Current\Bin;C:\Windows\Microsoft.NET\Framework64\v4.0.30319;C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\Common7\IDE\;C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\Common7\Tools\;;C:\Windows\system32;C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\Common7\IDE\CommonExtensions\Microsoft\CMake\CMake\bin;C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\Common7\IDE\CommonExtensions\Microsoft\CMake\Ninja
  SET PWD=/proc/self/cwd
  SET PYTHON_BIN_PATH=D:/.../msvc_to_clang/venv310/Scripts/python.exe
  SET PYTHON_LIB_PATH=D:/.../msvc_to_clang/venv310/lib/site-packages
  SET RUNFILES_MANIFEST_ONLY=1
  SET TEMP=C:\Users\...sha\AppData\Local\Temp\2
  SET TF2_BEHAVIOR=1
  SET TMP=C:\Users\...sha\AppData\Local\Temp\2
  d:\...\msvc_to_clang\LLVM\bin\lld-link.exe @bazel-out/x64_windows-opt/bin/tensorflow/distribute/experimental/rpc/kernels/gen_gen_rpc_ops_py_wrappers_cc.exe-2.params
# Configuration: cfe8788e4ffcaa7fd26e4e99620edbfd250b250962129096b14aa1fc721dc89b
# Execution platform: @local_execution_config_platform//:platform
lld-link: warning: ignoring unknown argument '-lm'
lld-link: warning: ignoring unknown argument '-lpthread'
lld-link: warning: ignoring unknown argument '-lm'
lld-link: warning: ignoring unknown argument '-lpthread'
lld-link: warning: ignoring unknown argument '-lm'
lld-link: warning: allocator_registry_impl.lo.lib(cpu_allocator_impl.obj): locally defined symbol imported: struct std::atomic tsl::profiler::internal::g_trace_level (defined in traceme_recorder_impl.lo.lib(traceme_recorder.obj)) [LNK4217]
lld-link: warning: utils.lib(utils.obj): locally defined symbol imported: char const *const tensorflow::DEVICE_CPU (defined in tensor.lo.lib(types.obj)) [LNK4217]
lld-link: warning: utils.lib(utils.obj): locally defined symbol imported: char const *const tensorflow::DEVICE_GPU (defined in tensor.lo.lib(types.obj)) [LNK4217]
lld-link: warning: memory_optimizer.lib(memory_optimizer.obj): locally defined symbol imported: char const *const tensorflow::DEVICE_CPU (defined in tensor.lo.lib(types.obj)) [LNK4217]
lld-link: warning: memory_optimizer.lib(memory_optimizer.obj): locally defined symbol imported: char const *const tensorflow::DEVICE_GPU (defined in tensor.lo.lib(types.obj)) [LNK4217]
lld-link: warning: arithmetic_optimizer.lib(arithmetic_optimizer.obj): locally defined symbol imported: char const *const tensorflow::DEVICE_CPU (defined in tensor.lo.lib(types.obj)) [LNK4217]
lld-link: warning: arithmetic_optimizer.lib(arithmetic_optimizer.obj): locally defined symbol imported: char const *const tensorflow::DEVICE_GPU (defined in tensor.lo.lib(types.obj)) [LNK4217]
lld-link: warning: pin_to_host_optimizer.lib(pin_to_host_optimizer.obj): locally defined symbol imported: char const *const tensorflow::DEVICE_CPU (defined in tensor.lo.lib(types.obj)) [LNK4217]
lld-link: warning: pin_to_host_optimizer.lib(pin_to_host_optimizer.obj): locally defined symbol imported: char const *const tensorflow::DEVICE_GPU (defined in tensor.lo.lib(types.obj)) [LNK4217]
lld-link: warning: gpu_id_impl.lib(gpu_id_manager.obj): locally defined symbol imported: char const *const tensorflow::DEVICE_GPU (defined in tensor.lo.lib(types.obj)) [LNK4217]
lld-link: warning: bfc_allocator.lib(bfc_allocator.obj): locally defined symbol imported: struct std::atomic tsl::profiler::internal::g_trace_level (defined in traceme_recorder_impl.lo.lib(traceme_recorder.obj)) [LNK4217]
lld-link: warning: Pass.lib(pass.obj): locally defined symbol imported: char const *const tensorflow::DEVICE_CPU (defined in tensor.lo.lib(types.obj)) [LNK4217]
lld-link: warning: Pass.lib(pass.obj): locally defined symbol imported: char const *const tensorflow::DEVICE_GPU (defined in tensor.lo.lib(types.obj)) [LNK4217]
lld-link: warning: pdll_utils.lib(utils.obj): locally defined symbol imported: char const *const tensorflow::DEVICE_CPU (defined in tensor.lo.lib(types.obj)) [LNK4217]
lld-link: error: undefined symbol: struct mlir::LogicalResult __cdecl mlir::tfg::InferReturnTypeComponentsForTFOp(class std::optional, class mlir::Operation *, class mlir::ValueRange, __int64, class llvm::function_ref, class llvm::function_ref, class llvm::function_ref, class llvm::function_ref, class std::allocator>, class tensorflow::AttrValue> *)>, class llvm::SmallVectorImpl &)
>>> referenced by Pass.lib(pass.obj):(public: __cdecl `public: virtual void __cdecl mlir::tfg::ShapeInference::runOnOperation(void)'::`1'::::operator()(class mlir::Operation *) const)
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 1397.717s, Critical Path: 378.14s
INFO: 10556 processes: 821 internal, 9735 local.
FAILED: Build did NOT complete successfully
```
vam-google commented 1 year ago

This specific issue is caused by the `OpRegistrationData` type being defined as `struct OpRegistrationData` but declared in the failing translation unit as `class OpRegistrationData`.

Clang apparently treats the two differently here (unlike MSVC): it looks for the symbol with the `class` key, while the symbol is defined only with `struct`.

Changing `class OpRegistrationData` to `struct OpRegistrationData` fixes this specific issue and the build gets further, but it then fails while linking _pywrap_internal .dll because of duplicate bfloat16.so symbols. Trying to fix that now.
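
For anyone hitting the same kind of error, here is a minimal sketch of the pattern being described. It is not the actual TensorFlow code: the field `op_name` and the function `DescribeOp` are hypothetical, and the type is only a stand-in for the real `OpRegistrationData`. The relevant background is that the Microsoft C++ ABI encodes the class-key in mangled names (`U` for `struct`, `V` for `class`), so a declaration with the other key can end up referring to a different symbol at link time.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the real type; only the class-key
// (struct vs. class) matters for this illustration.
struct OpRegistrationData {
  std::string op_name;  // illustrative field, not the real layout
};

// Redeclaration as another header might write it. Using `struct` matches the
// definition above. Writing `class OpRegistrationData;` here would still be
// valid C++, but under the Microsoft C++ ABI the class-key is part of the
// mangled name, so the two spellings can produce different symbol names.
struct OpRegistrationData;

// Hypothetical function whose signature mentions the type, so the type's
// class-key ends up in this function's mangled name on Windows.
std::string DescribeOp(const OpRegistrationData* data) {
  assert(data != nullptr);
  return "op: " + data->op_name;
}
```

Clang's `-Wmismatched-tags` warning reports declarations whose class-key differs from the definition, which may help locate other occurrences before they surface as linker errors.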

shangerxin commented 1 year ago

OK. I will pull the latest code and verify it on my end too.