tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone
https://tensorflow.org
Apache License 2.0

Cannot build TF 2.2 rc2 or rc3 on Windows #38712

Closed. fcunilim closed this issue 3 years ago.

fcunilim commented 4 years ago


System information

Describe the problem

Provide the exact sequence of commands / steps that you executed before running into the problem

Plenty of disk space is available (> 400 GB). The system has 6 GB of RAM.

The error I get is:

ERROR: C:/users/....../tensorflow/tensorflow/core/kernels/BUILD:1321:1: C++ compilation of rule '//tensorflow/core/kernels:tile_ops_gpu' failed (Exit 2)

Any other info / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

fcunilim commented 4 years ago

More log information below:

C:\users\fred_bazel_fred\yhpkrwbv\execroot\org_tensorflow\external\eigen_archive\Eigen\src/Core/DenseBase.h(541): error C2993: 'Derived': illegal type for non-type template parameter 'formal'
C:\users\fred_bazel_fred\yhpkrwbv\execroot\org_tensorflow\external\eigen_archive\Eigen\src/Core/DenseBase.h(657): note: see reference to class template instantiation 'Eigen::DenseBase' being compiled
C:\users\fred_bazel_fred\yhpkrwbv\execroot\org_tensorflow\external\eigen_archive\Eigen\src/Core/DenseBase.h(541): error C2993: 'Derived': illegal type for non-type template parameter 'formal'
C:\users\fred_bazel_fred\yhpkrwbv\execroot\org_tensorflow\external\eigen_archive\Eigen\src/Core/DenseBase.h(541): error C2993: 'Derived': illegal type for non-type template parameter '__formal'
C:\users\fred_bazel_fred\yhpkrwbv\execroot\org_tensorflow\external\eigen_archive\Eigen\src/Core/Select.h(155): error C2244: 'Eigen::DenseBase::select': unable to match function definition to an existing declaration
C:\users\fred_bazel_fred\yhpkrwbv\execroot\org_tensorflow\external\eigen_archive\Eigen\src/Core/Select.h(153): note: see declaration of 'Eigen::DenseBase::select'
C:\users\fred_bazel_fred\yhpkrwbv\execroot\org_tensorflow\external\eigen_archive\Eigen\src/Core/Select.h(155): note: definition
C:\users\fred_bazel_fred\yhpkrwbv\execroot\org_tensorflow\external\eigen_archive\Eigen\src/Core/Select.h(155): note: 'const Eigen::Select<Derived,ElseDerived::ConstantReturnType,ElseDerived> Eigen::DenseBase::select(const ElseDerived::Scalar &,const Eigen::DenseBase &) const'
C:\users\fred_bazel_fred\yhpkrwbv\execroot\org_tensorflow\external\eigen_archive\Eigen\src/Core/Select.h(155): note: existing declarations
C:\users\fred_bazel_fred\yhpkrwbv\execroot\org_tensorflow\external\eigen_archive\Eigen\src/Core/Select.h(155): note: 'const Eigen::Select<Derived,std::_Select::_Apply<ElseDerived,ElseDerived>::ConstantReturnType,ElseDerived> Eigen::DenseBase::select(const std::_Select::_Apply<ElseDerived,ElseDerived>::Scalar &,const Eigen::DenseBase &) const'
C:\users\fred_bazel_fred\yhpkrwbv\execroot\org_tensorflow\external\eigen_archive\Eigen\src/Core/Select.h(155): note: 'const Eigen::Select<Derived,ThenDerived,ThenDerived::ConstantReturnType> Eigen::DenseBase::select(const Eigen::DenseBase &,const ThenDerived::Scalar &) const'
C:\users\fred_bazel_fred\yhpkrwbv\execroot\org_tensorflow\external\eigen_archive\Eigen\src/Core/Select.h(155): note: 'const Eigen::Select<Derived,ThenDerived,ElseDerived> Eigen::DenseBase::select(const Eigen::DenseBase &,const Eigen::DenseBase &) const'

fcunilim commented 4 years ago

It looks a little like this problem (not sure, though):

https://github.com/pytorch/pytorch/issues/25393

ahtik commented 4 years ago

@fcunilim Could the issue be with locating the proper CUDA or VS build tools? Both rc builds worked on my Windows setup. I'm doing something like the following; I know it's very likely excessive, but at least it works. Ignore that CUDA is 10.2 here; it also works with 10.1. There may be a better flag than TF_VC_VERSION, but for now it keeps the build time reasonable. By the way, to speed up the build you might want to tweak the .bazelrc (see https://github.com/tensorflow/tensorflow/issues/38174#issuecomment-613722488).

SET TF_CUDA_COMPUTE_CAPABILITIES=7.5
SET TF_NEED_CUDA=1
SET BAZEL_VC=C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC
SET BAZEL_VS=C:\Program Files (x86)\Microsoft Visual Studio\2019\Community
SET CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
SET CUDA_TOOLKIT_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
SET PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\libnvvp;%PATH%

REM IMPORTANT TO KEEP BUILD TIME REASONABLE
SET TF_VC_VERSION=16.4

bazel build --config=opt --define=no_tensorflow_py_deps=true //tensorflow/tools/pip_package:build_pip_package
bazel-bin\tensorflow\tools\pip_package\build_pip_package C:/tmp/tensorflow_cuda_10_2

EDIT: Oh, I just noticed your Python version is 3.6.8. I've only built with 3.7 and 3.8.

fcunilim commented 4 years ago

I wonder whether disabling the Eigen compilation speedup flag would help. There is an option that is enabled by default during configure (something about inlining).

ahtik commented 4 years ago

"eigen compilation speedup" flag is enabled/default for me and still builds fine. Would it be an option to migrate to python 3.7 or 3.8 to have less differences with my env? My MSVC is currently 14.24.28314 but can try with an upgraded version. How long it takes until you get to the failure?

fcunilim commented 4 years ago

It takes about 3 hours on a 4-core CPU from 2009. I see people speculating that this bug could be non-deterministic. It is odd that I didn't get the issue before: I compiled TF 2.1 three times last week with no problem.

ahtik commented 4 years ago

I upgraded my MSVC to the latest version, the same as yours, 14.25.28610 (part of VS 16.5.4), and will see if it builds.

fcunilim commented 4 years ago

I ran the build a second time, and got the same error.

ahtik commented 4 years ago

Is your nvcc --version output the following for CUDA 10.2?

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:32:27_Pacific_Daylight_Time_2019
Cuda compilation tools, release 10.2, V10.2.89

For CUDA 10.1 it would be:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:12:52_Pacific_Daylight_Time_2019
Cuda compilation tools, release 10.1, V10.1.243
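To compare these outputs without eyeballing them, a small Python sketch can pull the release and full version out of nvcc --version text. The helper name parse_nvcc_release and its regex are illustrative assumptions, not part of TF or CUDA tooling; it only relies on the "Cuda compilation tools, release X.Y, VX.Y.Z" line format shown above.

```python
import re

# Matches the last line of `nvcc --version`, e.g.
# "Cuda compilation tools, release 10.2, V10.2.89"
_RELEASE_RE = re.compile(r"release (\d+\.\d+), V(\d+(?:\.\d+)+)")


def parse_nvcc_release(output: str):
    """Return (release, full_version) parsed from nvcc --version text, or None."""
    match = _RELEASE_RE.search(output)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    sample = (
        "nvcc: NVIDIA (R) Cuda compiler driver\n"
        "Copyright (c) 2005-2019 NVIDIA Corporation\n"
        "Built on Wed_Oct_23_19:32:27_Pacific_Daylight_Time_2019\n"
        "Cuda compilation tools, release 10.2, V10.2.89\n"
    )
    print(parse_nvcc_release(sample))
```

In practice the text would come from subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout, assuming nvcc is on PATH.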

Any chance you could try with Python 3.8 or 3.7? Not sure it matters much, but for the sake of "fun":


\python38\python -m venv \tmp\venv-tf2build
\tmp\venv-tf2build\Scripts\activate
python -m pip install --upgrade pip
pip install six numpy wheel keras_applications keras_preprocessing --no-deps
etc.

fcunilim commented 4 years ago

nvcc version is 10.2.89.

There is something else I have noticed twice lately that is suspicious: sometimes the build process gets stuck, waiting for "something".

I launched a build again a few minutes ago (just reissued the bazel build command shown above). I then switched to web browsing for a few minutes, and then noticed the build process was stuck:

[260 / 2,084] 4 actions running
    Compiling tensorflow/core/kernels/training_ops.cc; 128s local
    Compiling tensorflow/core/kernels/scatter_nd_op_gpu.cu.cc; 96s local
    Compiling tensorflow/core/kernels/inplace_ops_functor_gpu.cu.cc; 43s local
    Compiling tensorflow/core/kernels/random_op_gpu.cu.cc; 16s local

By stuck I mean not making any progress at all. Yesterday the build got stuck at some point too.

In both instances, pressing the Enter key triggered something: lots of output scrolled through the console window, and the build resumed as normal.

This may not be related at all; it might have something to do with my internet connection... but I wonder whether there is something fishy with nvcc.

ahtik commented 4 years ago

"Unfortunately" my Windows build passed now with cuda 10.2.89, msvc 14.25.28610, pyton 3.8, so I don't know what else to check. My build machine does have a GPU but don't think it matters for the build process as long as the cuda and cudnn libs are available.

INFO: Elapsed time: 5546.311s, Critical Path: 379.29s
INFO: 8671 processes: 8671 local.
INFO: Build completed successfully, 12541 total actions

Re: "stuck". Yours is possibly a different case, since pressing Enter does something, but the "stuck" situation often happens when TF_VC_VERSION=16.5 is not set and Eigen inlining is enabled (the default). It can get "stuck" on different files each run; the order is not deterministic. It also helps a lot if .bazelrc is tuned for the build hardware (see the .bazelrc link above).

A somewhat random thought: make sure your Python is 64-bit (python -VV).
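python -VV prints the interpreter's build string (for example "[MSC v.1916 64 bit (AMD64)]"). As an alternative, a minimal Python check can query the pointer size of the running interpreter. This is only a sketch; the helper name python_is_64bit is made up, and it is not what configure.py itself does.

```python
import platform
import struct
import sys


def python_is_64bit() -> bool:
    """True when the running interpreter uses 64-bit pointers."""
    # struct.calcsize("P") is the size of a C pointer, in bytes.
    return struct.calcsize("P") * 8 == 64


if __name__ == "__main__":
    print(sys.version)
    print("architecture:", platform.architecture()[0])
    print("64-bit:", python_is_64bit())
```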

fcunilim commented 4 years ago

This could be because of the various flags and options. Forgive my ignorance, but all I do right now, from a Windows shell, is:

What exactly should I do to apply your options? I am a bit mixed up between environment variables, .bazelrc, explicit flags, etc. I fear I will do something wrong. For your example above, do I have to type that into the shell? Do I still need to run configure.py? I would prefer to use configure.py.

If none of this works, I will install Python 3.7 (I can't install 3.8) and try again.

EDIT: my Python is indeed 64-bit.

ahtik commented 4 years ago

All the SET-prefixed commands adjust the environment of the active cmd.exe process. So you'd run cmd.exe to start the Windows shell, then adjust the environment, which stays in effect only within that cmd.exe process (good for testing etc.; I prefer this method over using the Windows global settings to override environment variables):

SET TF_CUDA_COMPUTE_CAPABILITIES=7.5
SET TF_NEED_CUDA=1
SET BAZEL_VC=C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC
SET BAZEL_VS=C:\Program Files (x86)\Microsoft Visual Studio\2019\Community
SET CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
SET CUDA_TOOLKIT_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
SET PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\libnvvp;%PATH%

REM IMPORTANT TO KEEP BUILD TIME REASONABLE
SET TF_VC_VERSION=16.5
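If you drive the build from a script rather than an interactive cmd.exe, the same "scoped to one process" idea can be sketched in Python: copy the current environment, apply the SET values, and pass the result only to the child process. The helper name build_env and its parameters are hypothetical; the variable names and paths are copied from the SET lines above.

```python
import ntpath
import os

CUDA_ROOT = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2"
VS_ROOT = r"C:\Program Files (x86)\Microsoft Visual Studio\2019\Community"


def build_env(base, cuda_root=CUDA_ROOT, vs_root=VS_ROOT, vc_version="16.5"):
    """Return a copy of `base` with the build variables applied.

    Like SET in a single cmd.exe window, the returned dict only affects the
    processes it is passed to; `base` itself is left unmodified.
    """
    env = dict(base)
    env.update({
        "TF_CUDA_COMPUTE_CAPABILITIES": "7.5",
        "TF_NEED_CUDA": "1",
        "BAZEL_VC": ntpath.join(vs_root, "VC"),
        "BAZEL_VS": vs_root,
        "CUDA_PATH": cuda_root,
        "CUDA_TOOLKIT_PATH": cuda_root,
        "TF_VC_VERSION": vc_version,
    })
    # Equivalent of: SET PATH=<cuda>\bin;<cuda>\libnvvp;%PATH%
    extra = [ntpath.join(cuda_root, "bin"), ntpath.join(cuda_root, "libnvvp")]
    old_path = [base["PATH"]] if base.get("PATH") else []
    env["PATH"] = ";".join(extra + old_path)
    return env


if __name__ == "__main__":
    child_env = build_env(os.environ)
    print(child_env["BAZEL_VC"])
```

The dict would then be handed to a child process, e.g. subprocess.run(["python", "configure.py"], env=child_env, check=True) from the TF source directory.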

After the environment is set, all that is left is to activate your Python environment, making sure it includes the required six, numpy, wheel, keras_applications and keras_preprocessing packages. See the last snippet in https://github.com/tensorflow/tensorflow/issues/38712#issuecomment-616665651 for how I manage Python environments; that snippet also creates a new one, so you can easily experiment with various Python versions.
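Before running configure.py, a quick way to confirm the active environment actually provides those packages is a sketch like this one. It uses only the standard library's importlib.util.find_spec; the helper name missing_modules is made up for illustration.

```python
import importlib.util

# Import (module) names of the packages listed above.
REQUIRED = ["six", "numpy", "wheel", "keras_applications", "keras_preprocessing"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("missing:", missing or "none")
```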

Once the proper Python environment is active, cd to your TensorFlow 2.2rc3 directory, verify that you have a 64-bit python.exe (python -VV), and run the usual python configure.py command, followed by the build and the pip wheel build. I'm using these:

bazel build --config=opt --define=no_tensorflow_py_deps=true //tensorflow/tools/pip_package:build_pip_package
bazel-bin\tensorflow\tools\pip_package\build_pip_package C:/tmp/tensorflow_cuda_10_2

The .bazelrc tweak is just a nice-to-have if you want to speed up the build by allowing more cores and a higher CPU priority.

Is it clearer now? It's just the environment variables and the Python environment to set.

ahtik commented 4 years ago

I've started the build with the following setup; we'll see how that goes.

Bazel version 2.0.0. Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)]

\python36\python.exe -m venv \tmp\tf2build-env-py36
\tmp\tf2build-env-py36\Scripts\activate
python -m pip install --upgrade pip
pip install six numpy wheel keras_applications keras_preprocessing --no-deps
SET TF_CUDA_COMPUTE_CAPABILITIES=6.1
SET TF_NEED_CUDA=1
SET BAZEL_VC=C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC
SET BAZEL_VS=C:\Program Files (x86)\Microsoft Visual Studio\2019\Community
SET CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
SET CUDA_TOOLKIT_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
SET PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\libnvvp;%PATH%
SET TF_VC_VERSION=16.5
git checkout v2.2.0-rc3
bazel clean --expunge
python ./configure.py
bazel build --config=opt --define=no_tensorflow_py_deps=true //tensorflow/tools/pip_package:build_pip_package

fcunilim commented 4 years ago

Does not compile for me. Still using Python 3.6.8, but with your build method.

Now I get the following error:

ERROR: C:/users/.../tensorflow/tensorflow/core/BUILD:2176:1: ProtoCompile tensorflow/core/protobuf/error_codes.pb.h failed (Exit -1073741795)

ahtik commented 4 years ago

How long into the build? Any chance you can provide the full output? Any idea which of my tweaks caused the error to change?

It might be worth moving the TF checkout closer to the C:\ root directory, to make sure the c:/users/... location is not triggering Windows path length issues. Just a thought. I started the py3.6 build this morning and it is still running; I will update when it completes.
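To test the path-length theory before moving the checkout, one could scan a build tree (such as the Bazel output base) for paths that exceed the classic Win32 MAX_PATH of 260 characters, which includes the terminating NUL. This is a rough sketch; the helper name long_paths and the choice of directory to scan are assumptions.

```python
import os

MAX_PATH = 260  # classic Win32 limit, including the terminating NUL


def long_paths(root, limit=MAX_PATH - 1):
    """Yield absolute paths under `root` that are longer than `limit` characters."""
    root = os.path.abspath(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if len(full) > limit:
                yield full
```

For example, list(long_paths(r"C:\users\...\_bazel_...")) with your actual output base path; an empty result would suggest path length is not the culprit, at least for files that were created.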

ahtik commented 4 years ago

The py36 build for CUDA 10.2 and TF 2.2rc3 was fine:

INFO: Elapsed time: 2317.204s, Critical Path: 337.36s
INFO: 3890 processes: 3890 local.
INFO: Build completed successfully, 4394 total actions

The built wheel also works fine. The only warning during the wheel install is: tensorboard 2.2.1 has requirement setuptools>=41.0.0, but you'll have setuptools 40.6.2 which is incompatible.

I uploaded the wheel to https://drive.google.com/uc?id=1dpTFQcBl0AWMeo5zcCWv47tpeOegEeoB&export=download (I'll keep it there for 24 hours at most); use at your own risk.

Unfortunately, I have no idea what else to debug. If you have specific questions, feel free to ask and I can check with my setup.

fcunilim commented 4 years ago

The ProtoCompile error is also referenced here (when building TF 2.1 on Windows), though it concerns another file:

https://github.com/tensorflow/tensorflow/issues/38413

fcunilim commented 4 years ago

I tried with the MAX_PATH limit override enabled (it was not active in my setup), and I get a slightly different error:

ERROR: C:/users/fred/documents/repos/tf22rc3/tensorflow/tensorflow/stream_executor/BUILD:425:1: ProtoCompile tensorflow/stream_executor/dnn.pb.h failed (Exit -1073741795)

I am completely puzzled, completely.

fcunilim commented 4 years ago

One difference between your configuration and mine is that I am using VS Build Tools 2019, not VS Community 2019; I can see that from your environment variables. I did change them to match my setup. Could this be an issue?

I am running a TF 2.1 build with Bazel 0.27.1 to see whether it still builds fine. That kind of build succeeded multiple times two weeks ago, so this should rule out "my setup changed in a bad way" scenarios.

On a separate computer, I will shortly try building TF 2.2 rc3 with:

ahtik commented 4 years ago

@fcunilim To add some color to your experiments, I just ran an rc3 build with the following, in the directory C:\asers\fred\documents\repos\tf22rc3, using Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)], CUDA 10.2 and Build Tools:

SET BAZEL_VC=C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC
SET BAZEL_VS=C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools
SET TF_VC_VERSION=16.5
SET TF_CUDA_COMPUTE_CAPABILITIES=6.1
SET TF_NEED_CUDA=1
SET CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
SET CUDA_TOOLKIT_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
SET PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\libnvvp;%PATH%
bazel clean --expunge
python configure.py

bazel build --config=opt --define=no_tensorflow_py_deps=true //tensorflow/tools/pip_package:build_pip_package

The build completed successfully:

INFO: Elapsed time: 5108.775s, Critical Path: 350.36s
INFO: 8671 processes: 8671 local.
INFO: Build completed successfully, 12541 total actions

It might not be worth saying, but also make sure your Python environment is as clean as possible. I could also try this on a laptop without a GPU, but I'm not sure I can procrastinate on my other tasks to that extent :) If you find a solution, please keep this issue updated; it would be good to know.

pip freeze:

Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
numpy==1.18.3
six==1.14.0

fcunilim commented 4 years ago

On my build computer, I decided to uninstall Python 3.6.8, install version 3.7.7 and try a new build.

This time, the error I am getting is:

ERROR: C:/users/fred/documents/repos/tf22rc3/tensorflow/tensorflow/core/BUILD:1710:1: ProtoCompile tensorflow/core/protobuf/conv_autotuning.pb.h failed (Exit -1073741795)

Things clearly look fishy and non-deterministic. I will have a look at the second machine in a little while.

ahtik commented 4 years ago

Being non-deterministic is normal; it's because multiple threads run the compilation in parallel.

Any chance you could share a longer log of that error?

To confirm, I presume you're running the build from a cmd.exe shell, not from msys or PowerShell.

By the way, installing several Python versions side by side is perfectly fine on Windows. Just install the other version for "all users" without adjusting any of the paths, so it gets installed somewhere like c:\python\3.7. After that, creating a new venv based on that Python version is as simple as running \python\3.7\python -m venv c:\whenevermynewenvis, followed by c:\whenevermynewenvis\Scripts\activate to use it in an active cmd.exe.
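The two steps above can also be sketched from Python. The key point is that the venv's Python version is decided by whichever interpreter runs -m venv, so the sketch takes the interpreter path as a parameter. The helper name create_venv and any paths you pass to it are hypothetical.

```python
import os
import subprocess
import sys


def create_venv(python_exe, env_dir, with_pip=True):
    """Create env_dir using the given interpreter, like `<python> -m venv <dir>`.

    Returns the path of the activation script: activate.bat under Scripts on
    Windows (for cmd.exe), or activate under bin elsewhere.
    """
    cmd = [python_exe, "-m", "venv", env_dir]
    if not with_pip:
        cmd.append("--without-pip")
    subprocess.run(cmd, check=True)
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "activate.bat")
    return os.path.join(env_dir, "bin", "activate")
```

For example, create_venv(r"c:\python\3.7\python.exe", r"c:\venvs\tf-py37") would mirror the commands above, assuming Python 3.7 was installed to that (hypothetical) location; create_venv(sys.executable, ...) reuses the current interpreter.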

fcunilim commented 4 years ago

I ran the build again, from scratch, on the same machine:

/usr/bin/bash: line 1: 1233 Illegal instruction bazel-out/x64_windows-opt/bin/external/nasm/nasm -fwin64 -DWIN64 -D__x86_64__ -I $(dirname external/libjpeg_turbo/simd/x86_64/jccolext-sse2.asm)/ -I $(dirname external/libjpeg_turbo/simd/nasm/jdct.inc)/ -I $(dirname external/libjpeg_turbo/simd/nasm/jdct.inc)/../../win/ -o $out $(dirname external/libjpeg_turbo/simd/x86_64/jccolext-sse2.asm)/$(basename ${out%.obj}.asm)
Target //tensorflow/tools/pip_package:build_pip_package failed to build
ERROR: C:/users/fred/_bazel_fred/2h2pzguu/external/libjpeg_turbo/BUILD.bazel:347:1 Executing genrule @libjpeg_turbo//:simd_win_x86_64_assemble failed (Illegal instruction): bash.exe failed: error executing command

mihaimaruseac commented 4 years ago

Does your CPU support SSE2?

Edit: this probably should not matter, as you were able to build 2.1.

ahtik commented 4 years ago

@fcunilim what's your output from gcc -march=native -Q --help=target | grep enabled?

fcunilim commented 4 years ago

My CPU is a Core i5 750, with 6 GB of RAM. Disk space is plentiful. Before running bazel clean --expunge, I had even deleted the _bazel_fred folder to be on the safe side.

How can I install gcc on msys2? pacman -S gcc?

fcunilim commented 4 years ago

My CPU does not support AVX, and it seems the default TF 2.2 build option is /arch:AVX. Does this mean the computer doing the build needs to support AVX, or just that the built wheel will use AVX on the computer it runs on?

fcunilim commented 4 years ago

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is /arch:AVX]:

It says "Default" here, so I believe /arch:AVX is used. Can that explain my error?

ahtik commented 4 years ago

Do you have at least SSE2? Then that might help; see https://docs.microsoft.com/en-us/cpp/build/reference/arch-x86?view=vs-2019 for the options.

fcunilim commented 4 years ago

Restarting a build with /arch:SSE2 as the optimization flag.

ahtik commented 4 years ago

@fcunilim my attempt with "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.25.28610\bin\Hostx64\x64\cl.exe" /arch:SSE2 gives a warning that SSE2 is unknown (AVX and AVX2 are accepted). I hope that's not the case for you; at least now it's clear where to dig.

fcunilim commented 4 years ago

@ahtik then how could I successfully compile TF 2.1? The default build option for TF 2.1 is also /arch:AVX.

fcunilim commented 4 years ago

The build on the other computer (with a CPU indeed supporting AVX) has made good progress. I will report back as soon as it either completes (which I hope) or fails.

fcunilim commented 4 years ago

On the i5 750 machine the build is still not successful:

ERROR: C:/users/fred/documents/repos/tf22rc3/tensorflow/tensorflow/core/kernels/BUILD:1321:1: C++ compilation of rule '//tensorflow/core/kernels:tile_ops_gpu' failed (Exit 2)
cl : Command line warning D9002 : ignoring unknown option '/arch:SSE2'
Note: including file: bazel-out/x64_windows-opt/bin/external/local_config_cuda/cuda/cuda/include\cuda_runtime.h
[...]

Note the 'tile_ops_gpu' error again. (/arch:SSE2 is ignored, as you said @ahtik; it does not look like it causes the error.)

On the other machine, the build is still going on. I will have to see the outcome in the morning.

fcunilim commented 4 years ago

Here is the outcome:

In both cases:

The real-time protection of Windows Defender was disabled before starting the builds.

fcunilim commented 4 years ago

I can confirm the TF 2.2 rc3 build fails again on the i5 750 computer after about an hour (a ProtoCompile error on yet another file, duration_pb2.py). I wanted to try one more time to be sure Windows Defender was unrelated to the issue.

mihaimaruseac commented 4 years ago

So there is a regression. Can you try building rc0 too? We're trying to identify where the regression might have been introduced.

fcunilim commented 4 years ago

Very quickly, I got a build issue with rc0.

ERROR: C:/users/fred/documents/repos/tf22rc3/tensorflow/tensorflow/lite/toco/logging/BUILD:23:1: ProtoCompile tensorflow/lite/toco/logging/toco_conversion_log_pb2.py failed (Exit -1073741795)
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 504.213s, Critical Path: 19.76s
INFO: 173 processes: 173 local.
FAILED: Build did NOT complete successfully

EDIT: disregard the folder name 'tf22rc3'; it is really the tensorflow repo. What I did:

git checkout v2.2.0-rc0, then applied @ahtik's build procedure (including the bazel clean --expunge).

ahtik commented 4 years ago

The -1073741795 / 0xC000001D / STATUS_ILLEGAL_INSTRUCTION errors all seem to be about "Illegal instruction", indicating an instruction-set (arch) issue somewhere, as @mihaimaruseac's SSE2 comment also suspected...
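The arithmetic behind that mapping: the Windows exit code is printed as a signed 32-bit integer, and reinterpreting it as unsigned gives the NTSTATUS value. A minimal Python sketch follows; the helper names are made up, and only the two status codes listed are mapped.

```python
# A couple of well-known NTSTATUS codes (values as defined in ntstatus.h).
NTSTATUS_NAMES = {
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def exit_code_to_ntstatus(code: int) -> int:
    """Reinterpret a signed 32-bit process exit code as an unsigned NTSTATUS."""
    return code & 0xFFFFFFFF


def describe(code: int) -> str:
    status = exit_code_to_ntstatus(code)
    name = NTSTATUS_NAMES.get(status, "unknown")
    return f"{code} -> 0x{status:08X} ({name})"


if __name__ == "__main__":
    print(describe(-1073741795))
```

Running it prints "-1073741795 -> 0xC000001D (STATUS_ILLEGAL_INSTRUCTION)", since 2**32 - 1073741795 = 3221225501 = 0xC000001D.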

Looking at the 64-bit cl.exe /arch flag, there doesn't even seem to be an option for targeting anything lower than SSE2: https://docs.microsoft.com/en-us/cpp/build/reference/arch-x64?view=vs-2019 I couldn't find the minimum CPU requirements for the Visual Studio Build Tools cl.exe compiler, but maybe they now expect SSE2 at minimum?

mihaimaruseac commented 4 years ago

One more check and then we should be done, for now: can you try building at commit 6094289d90e69533fae5964ea221e57a7a78570e and, if that still fails, at its parent?

This seems to be the only change in lite/toco/logging between 2.1 and 2.2 (not to say that the failure is localized there).

fcunilim commented 4 years ago

(tf22p377) C:\Users\Fred\Documents\repos\tf22rc3\tensorflow>git checkout 6094289
Checking out files: 100% (1936/1936), done.
Previous HEAD position was 3c1e8c0341 Merge pull request #37486 from tensorflow/mm-r2.2-debug-win-build
HEAD is now at 6094289d90 [TF lite conversion logging] Implement a simple sanitizer to prune error message returned from MLIR.

ERROR: C:/users/fred/documents/repos/tf22rc3/tensorflow/tensorflow/core/framework/BUILD:1251:1: ProtoCompile tensorflow/core/framework/tensor_description_pb2.py failed (Exit -1073741795)
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 1386.651s, Critical Path: 20.23s
INFO: 178 processes: 178 local.
FAILED: Build did NOT complete successfully

ahtik commented 4 years ago

@mihaimaruseac was presumably suggesting https://github.com/tensorflow/tensorflow/commit/f7896058b2c332bef81ed5860567b71c4f2ce10e as the parent of 609..., so @fcunilim, maybe also build from that? Although your latest error already comes from another source.

Is there a longer and more detailed error/stack available from that build console you could post?

fcunilim commented 4 years ago

(tf22p377) C:\Users\Fred\Documents\repos\tf22rc3\tensorflow>git branch

ERROR: C:/users/fred/_bazel_fred/2h2pzguu/external/com_google_protobuf/BUILD:759:1: ProtoCompile external/com_google_protobuf/python/google/protobuf/empty_pb2.py failed (Exit -1073741795)
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 1331.164s, Critical Path: 19.87s
INFO: 181 processes: 181 local.
FAILED: Build did NOT complete successfully

ahtik commented 4 years ago

I have now improved my build steps by adding

set CC_OPT_FLAGS="/favor:INTEL64"

(I have no idea how to pass an empty flag.) This way, python configure.py no longer prompts for the compiler flag, and the build defaults to SSE2 instead of AVX.

For my CPU, and also for your i5-6300U laptop, it's better to use set CC_OPT_FLAGS="/arch:AVX2". Just make sure to use the resulting wheel only on supported hardware.

The i5 750 supports SSE4.2, so cl.exe's default of SSE2 should be fine. I'm not sure where the regression comes from.
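The flag choice described above can be written down as a small decision function. This is a sketch under assumptions: the helper name pick_cc_opt_flags and the Linux /proc/cpuinfo-style feature names are illustrative (on Windows you would need another way to list CPU features), and it only covers the /arch:AVX and /arch:AVX2 values discussed in this thread.

```python
def pick_cc_opt_flags(cpu_flags):
    """Suggest a CC_OPT_FLAGS value for 64-bit cl.exe from CPU feature names.

    `cpu_flags` uses /proc/cpuinfo-style names such as "avx" and "avx2".
    An empty string means "no /arch switch": x64 cl.exe then targets its
    SSE2 baseline (as seen above, /arch:SSE2 is rejected by the x64 compiler).
    """
    flags = {f.lower() for f in cpu_flags}
    if "avx2" in flags:
        return "/arch:AVX2"
    if "avx" in flags:
        return "/arch:AVX"
    return ""
```

For the i5 750 (SSE4.2, no AVX) this returns an empty string, i.e. keep the x64 default; for an AVX2-capable CPU such as the i5-6300U it returns /arch:AVX2. Either way, the flag describes the CPUs the resulting wheel can run on.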

fcunilim commented 4 years ago

Using bazel build -s -c opt --config=opt ..., I get:

SUBCOMMAND: # //tensorflow/core/profiler/protobuf:op_metrics_proto_genproto [action 'ProtoCompile tensorflow/core/profiler/protobuf/op_metrics.pb.h', configuration: 6610eac58701ff0155539c518aa44e6981931f5297dbadbfffee7021feb436ed]
cd C:/users/fred/_bazel_fred/2h2pzguu/execroot/org_tensorflow
SET CUDA_TOOLKIT_PATH=C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.2
SET PATH=C:\msys64\usr\bin;C:\msys64\bin;C:\Windows;C:\Windows\System32;C:\Windows\System32\WindowsPowerShell\v1.0
SET PYTHON_BIN_PATH=C:/Users/Fred/Documents/virtualenvs/tf22p377/Scripts/python.exe
SET PYTHON_LIB_PATH=C:/Users/Fred/Documents/virtualenvs/tf22p377/lib/site-packages
SET RUNFILES_MANIFEST_ONLY=1
SET TF2_BEHAVIOR=1
SET TF_CONFIGURE_IOS=0
SET TF_CUDA_COMPUTE_CAPABILITIES=6.1
SET TF_ENABLE_XLA=1
SET TF_NEED_CUDA=1
bazel-out/x64_windows-opt/bin/external/com_google_protobuf/protoc --cpp_out=bazel-out/x64_windows-opt/bin -I. -Iexternal/com_google_protobuf/src -Ibazel-out/x64_windows-opt/bin/external/com_google_protobuf/src tensorflow/core/profiler/protobuf/op_metrics.proto
ERROR: C:/users/fred/documents/repos/tf22rc3/tensorflow/tensorflow/core/framework/BUILD:1251:1: ProtoCompile tensorflow/core/framework/tensor_description_pb2.py failed (Exit -1073741795)
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 1331.760s, Critical Path: 19.84s
INFO: 172 processes: 172 local.
FAILED: Build did NOT complete successfully

ahtik commented 4 years ago

@mihaimaruseac Could it be that the protobuf binary dependency got updated and that its Windows build was compiled with AVX instructions enabled?

ahtik commented 4 years ago

@fcunilim Just noticed something at https://github.com/tensorflow/tensorflow/issues/22954#issuecomment-429595782

Maybe try building without --config=opt. As far as I can see, you started adding 'opt' after my build description, and that might have derailed something. That opt comment was for TF 1.1; I'm not sure it is even still relevant today.

fcunilim commented 4 years ago

Even without --config=opt, configure still proposes /arch:AVX by default. I'm going to try two more things:

Note that /arch:AVX was used when I built TF 2.1...

fcunilim commented 4 years ago

(tf22p377) C:\Users\Fred\Documents\repos\tf22rc3\tensorflow>git branch

SET TF_CUDA_COMPUTE_CAPABILITIES=6.1
SET TF_NEED_CUDA=1
SET BAZEL_VC=C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC
SET BAZEL_VS=C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools
SET CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
SET CUDA_TOOLKIT_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
SET PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\libnvvp;%PATH%
SET TF_VC_VERSION=16.5
bazel clean --expunge
python ./configure.py
bazel build --define=no_tensorflow_py_deps=true //tensorflow/tools/pip_package:build_pip_package

The build is successful on the i5 750, using Python 3.7.7 and the default compiler option /arch:AVX.