Closed Expert73 closed 4 years ago
Build cmd: bazel --output_base=c:/bazel/output_dir/ build --config=mkl --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
@Expert73
Can you please provide us the error log. Thanks!
Tensorflow 2.1 latest (nightly)

C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.24.28314\include\xtr1common(163): note: see reference to class template instantiation 'std::integral_constant<bool,false>' being compiled
C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.24.28314\include\xtr1common(163): note: see reference to class template instantiation 'std::disjunction<_Traits...>' being compiled
ERROR: C:/tensorflow/tensorflow/core/kernels/BUILD:8128:1: C++ compilation of rule '//tensorflow/core/kernels:mkl_aggregate_ops' failed (Exit 2)
.\tensorflow/core/util/mkl_util.h(1253): error C2131: expression did not evaluate to a constant
.\tensorflow/core/util/mkl_util.h(1252): note: failure was caused by a read of a variable outside its lifetime
.\tensorflow/core/util/mkl_util.h(1252): note: see usage of 'dim'
.\tensorflow/core/util/mkl_util.h(1254): error C2131: expression did not evaluate to a constant
.\tensorflow/core/util/mkl_util.h(1252): note: failure was caused by a read of a variable outside its lifetime
.\tensorflow/core/util/mkl_util.h(1252): note: see usage of 'dim'
.\tensorflow/core/util/mkl_util.h(1256): error C3863: array type 'dnnl_dim_t [kNumDims]' is not assignable
.\tensorflow/core/util/mkl_util.h(1257): error C3863: array type 'dnnl_dim_t [kNumDims]' is not assignable
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 2448.446s, Critical Path: 179.48s
INFO: 4686 processes: 4686 local.
FAILED: Build did NOT complete successfully
I think the problem is now here:

DCHECK_EQ(dim.size(), strides.size());
const int kNumDims = dim.size();
mkldnn_dim_t input_dims[kNumDims];
mkldnn_dim_t input_strides[kNumDims];
for (int i = 0; i < kNumDims; ++i) {
  input_dims[i] = dim[i];
  input_strides[i] = strides[i];
}
In the old (working) code it was:

DCHECK_EQ(dim.size(), strides.size());
mkldnn_dim_t input_dims[dim.size()];
mkldnn_dim_t input_strides[dim.size()];
for (size_t i = 0; i < dim.size(); ++i) {
  input_dims[i] = dim[i];
  input_strides[i] = strides[i];
}
What do you think? How can the problem be fixed?
I have the same problem with Tensorflow 2.2.0 and Visual Studio 2019. I solved it with a small patch; I don't know whether a memory leak is possible this way or not.
#ifdef ENABLE_MKLDNN_V1
  const int kNumDims = dim.size();
  mkldnn_dim_t input_dims[kNumDims];
  mkldnn_dim_t input_strides[kNumDims];
  for (int i = 0; i < kNumDims; ++i) {
    input_dims[i] = dim[i];
    input_strides[i] = strides[i];
  }
  try {
    mkldnn_memory_desc_init_by_strides(blocked_md, kNumDims, input_dims,
                                       memory::convert_to_c(dtype),
                                       input_strides);
  }
  ......
becomes
#ifdef ENABLE_MKLDNN_V1
  const int kNumDims = dim.size();
  mkldnn_dim_t* input_dims = new mkldnn_dim_t[kNumDims];
  mkldnn_dim_t* input_strides = new mkldnn_dim_t[kNumDims];
  for (int i = 0; i < kNumDims; ++i) {
    input_dims[i] = dim[i];
    input_strides[i] = strides[i];
  }
  try {
    mkldnn_memory_desc_init_by_strides(blocked_md, kNumDims, input_dims,
                                       memory::convert_to_c(dtype),
                                       input_strides);
    delete[] input_dims;
    delete[] input_strides;
  } catch (mkldnn::error& e) {
    delete[] input_dims;
    delete[] input_strides;
    return Status(error::Code::INTERNAL,
                  tensorflow::strings::StrCat(
                      "Failed to create blocked memory descriptor.",
                      "Status: ", e.status, ", message: ", e.message));
  }
Now my compilation stops in /Eigen/src/Core/util/ReenableStupidWarnings.h. I don't know if there is a relation between my patch and this problem.
edit 1:
No relation between the patch and my error; it's a compilation error where
it can't cast from vector<long int> to vector<int64_t>.
I think this is a new problem
ERROR: C:/tensorflow/tensorflow/core/kernels/BUILD:7897:1: C++ compilation of rule '//tensorflow/core/kernels:mkl_conv_op' failed (Exit 2)
.\tensorflow/core/kernels/mkl_conv_ops.h(157): error C2679: binary '=': no operator found which takes a right-hand operand of type 'std::vector<long,std::allocator
I solved it too by changing the type from long int to memory::dim, which seems cleaner to me. In tensorflow/core/kernels/mkl_conv_ops.h:
-#define MKLDNN_SIZE_DTYPE long int
+#define MKLDNN_SIZE_DTYPE memory::dim
Now the compilation takes many hours and doesn't want to end :(
Compiling tensorflow/core/kernels/mkl_cwise_ops_common.cc; 9752s local
I have an i7-9700K. For example, for TF 2.1, mkl_cwise_ops_common.cc compiled in 16000s, and the complete code build lasted 7-8 hours.
Without MKL, the complete code build lasted 1-1.5 hours.
Thanks, I thought it was a bug and stopped the compilation after 11000s :( I have an i7-9750H laptop, so I think it will take 12-15 hours. I compile TensorFlow for CUDA 10.2 and just added MKL for fun. Is it worth it or not?
I also created a pull request for the change https://github.com/tensorflow/tensorflow/pull/37785
A stable gain of 3-5% when training models on my CPU.
Thanks a lot!
New error at the next step.
ERROR: C:/tensorflow/tensorflow/lite/python/optimize/BUILD:50:1: Linking of rule '//tensorflow/lite/python/optimize:_tensorflow_lite_wrap_calibration_wrapper.so' failed (Exit 1120)
LINK : warning LNK4044: unrecognized option '/ldl'; ignored
LINK : warning LNK4044: unrecognized option '/lm'; ignored
LINK : warning LNK4044: unrecognized option '/lpthread'; ignored
mklml.lib(mklml.dll) : warning LNK4006: NULL_IMPORT_DESCRIPTOR already defined in libiomp5md.lib(libiomp5md.dll); second definition ignored
Creating library bazel-out/x64_windows-opt/bin/tensorflow/lite/python/optimize/lib_tensorflow_lite_wrap_calibration_wrapper.so.ifso and object bazel-out/x64_windows-opt/bin/tensorflow/lite/python/optimize/lib_tensorflow_lite_wrap_calibration_wrapper.so.exp
LINK : warning LNK4217: symbol '?DEVICE_CPU@tensorflow@@3QEBDEB (char const * const tensorflow::DEVICE_CPU)' defined in 'libtensor.lo(types.o)' is imported by 'libarithmetic_optimizer.a(arithmetic_optimizer.o)' in function '"bool __cdecl tensorflow::grappler::`anonymous namespace'::NodeIsOnCpu(class tensorflow::NodeDef const &)" (?NodeIsOnCpu@?A0x53e44b13@grappler@tensorflow@@YA_NAEBVNodeDef@3@@Z)'
LINK : warning LNK4286: symbol '?DEVICE_CPU@tensorflow@@3QEBDEB (char const * const tensorflow::DEVICE_CPU)' defined in 'libtensor.lo(types.o)' is imported by 'libmemory_optimizer.a(memory_optimizer.o)'
LINK : warning LNK4286: symbol '?DEVICE_CPU@tensorflow@@3QEBDEB (char const * const tensorflow::DEVICE_CPU)' defined in 'libtensor.lo(types.o)' is imported by 'libpin_to_host_optimizer.a(pin_to_host_optimizer.o)'
LINK : warning LNK4286: symbol '?DEVICE_CPU@tensorflow@@3QEBDEB (char const * const tensorflow::DEVICE_CPU)' defined in 'libtensor.lo(types.o)' is imported by 'libutils.a(utils.o)'
LINK : warning LNK4286: symbol '?DEVICE_GPU@tensorflow@@3QEBDEB (char const * const tensorflow::DEVICE_GPU)' defined in 'libtensor.lo(types.o)' is imported by 'libutils.a(utils.o)'
LINK : warning LNK4217: symbol '?DEVICE_GPU@tensorflow@@3QEBDEB (char const * const tensorflow::DEVICE_GPU)' defined in 'libtensor.lo(types.o)' is imported by 'libarithmetic_optimizer.a(arithmetic_optimizer.o)' in function '"private: bool __cdecl tensorflow::grappler::`anonymous namespace'::ReorderCastLikeAndValuePreserving::NodeIsOnCpuOrGpu(class tensorflow::NodeDef const *)const " (?NodeIsOnCpuOrGpu@ReorderCastLikeAndValuePreserving@?A0x53e44b13@grappler@tensorflow@@AEBA_NPEBVNodeDef@4@@Z)'
LINK : warning LNK4286: symbol '?DEVICE_GPU@tensorflow@@3QEBDEB (char const * const tensorflow::DEVICE_GPU)' defined in 'libtensor.lo(types.o)' is imported by 'libauto_mixed_precision.a(auto_mixed_precision.o)'
LINK : warning LNK4286: symbol '?DEVICE_GPU@tensorflow@@3QEBDEB (char const * const tensorflow::DEVICE_GPU)' defined in 'libtensor.lo(types.o)' is imported by 'libmemory_optimizer.a(memory_optimizer.o)'
LINK : warning LNK4286: symbol '?DEVICE_GPU@tensorflow@@3QEBDEB (char const * const tensorflow::DEVICE_GPU)' defined in 'libtensor.lo(types.o)' is imported by 'libpin_to_host_optimizer.a(pin_to_host_optimizer.o)'
LINK : warning LNK4217: symbol '?g_trace_level@internal@profiler@tensorflow@@3U?$atomic@H@std@@A (struct std::atomic
What might the error be related to?
I have a fix, but it's not 100% clean. In mkl-dnn/blob/master/src/cpu/jit_utils/, I copied the contents of the subfolder jitprofiling. Then I replaced
#ifndef DNNL_ENABLE_JIT_PROFILING
#define DNNL_ENABLE_JIT_PROFILING 1
#endif
with
#define DNNL_ENABLE_JIT_PROFILING 1
and
#include "jitprofiling/jitprofiling.h"
by
#include "jitprofiling.h"
I think there is no need to copy the folder's contents and edit the include. Just editing the define
#define DNNL_ENABLE_JIT_PROFILING 1
will make it work.
mkl-dnn/blob/master/src/cpu/jit_utils/
Where is it?
My bad, I gave you the GitHub folder path. I deleted my installation, but I think it's in the Bazel temp folder, in a folder named mkl_dnn_v1 or something like that. Search for jit_utils in the tensorflow folder and Windows will find it.
edit 1:
tensorflow\bazel-tensorflow\external\mkl_dnn_v1\src\cpu\jit_utils
khaled-besrour, thanks! It all works!
@Expert73
Please close this thread if it solves your question. Thanks!
System information
Describe the problem
At the initial stage of the build, an error appears.