Already fixed this by downloading the tar.gz file. The problem now is that when I run bazel build, my computer freezes.
Which tar.gz file exactly?
Do you have bazel 0.3.2 by the way? Might be related to this bug: https://github.com/bazelbuild/bazel/issues/1685
Thanks, @gokceneraslan
Replacing Bazel 0.3.0 with 0.3.2 resolves this issue on my side.
My system info:
Nvidia driver: 367.48-0ubuntu1
CUDA: cuda-repo-ubuntu1604_8.0.44-1_amd64.deb
cuDNN: 5.1
@AFAgarap, did you try upgrading to 0.3.2 as @willSapgreen suggests? Thanks!
@gokceneraslan The 0.11rc1 tar.gz file. @aselle I have 0.3.1, and yes, I shall upgrade to 0.3.2. Will get back to you on this. Thanks!
@aselle @willSapgreen @gokceneraslan I have upgraded bazel to 0.3.2, and yet when I run bazel build, my computer still crashes. It has the following hardware specs: Intel Core i5-6300HQ (2.3GHz to 3.2GHz), 8GB DDR3 RAM 1600MHz, 1TB Hybrid Hard Drive + 8GB cache, Nvidia GeForce GTX 960M 4GB DDR5. I'm using Ubuntu 16.04 LTS with kernel 4.4.0-45-generic. CUDA is 8.0, cuDNN is 5.1, and the Nvidia driver (per nvidia-smi) is 367.58.
This bug is about the Object of type 'path' has no field "realpath" error, and it can be fixed by upgrading bazel to 0.3.2. I think we can close this now.
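For anyone hitting this, a quick way to confirm which bazel binary and version are actually being picked up (a minimal sketch; paths on your system may differ):

# Print the bazel binary on the PATH and its version;
# the realpath error above is expected with bazel 0.3.1 or older.
which bazel
bazel version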
@gokceneraslan I already upgraded to bazel 0.3.2.
After upgrading, did you try starting from a clean repository sandbox (i.e. delete your old directory, reclone and build again)?
and remove ~/.cache/bazel.
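For reference, a minimal sketch of that clean rebuild, assuming the checkout lives in ~/tensorflow (adjust paths to your setup):

# Drop bazel's build outputs and on-disk cache, then start from a fresh clone.
cd ~/tensorflow && bazel clean --expunge
rm -rf ~/.cache/bazel
cd ~ && rm -rf tensorflow
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow && ./configure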
I hit similar errors, and updating to bazel 0.3.2 does not fix the issue:
ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package '': Encountered error while reading extension file 'cuda/build_defs.bzl': no such package '@local_config_cuda//cuda': Traceback (most recent call last):
  File "/home/scopeserver/RaidDisk/DeepLearning/mwang/tensorflow/third_party/gpus/cuda_configure.bzl", line 517
    _create_cuda_repository(repository_ctx)
  File "/home/scopeserver/RaidDisk/DeepLearning/mwang/tensorflow/third_party/gpus/cuda_configure.bzl", line 432, in _create_cuda_repository
    _cuda_toolkit_path(repository_ctx, cuda_version)
  File "/home/scopeserver/RaidDisk/DeepLearning/mwang/tensorflow/third_party/gpus/cuda_configure.bzl", line 148, in _cuda_toolkit_path
    str(repository_ctx.path(cuda_toolkit...)
  File "/home/scopeserver/RaidDisk/DeepLearning/mwang/tensorflow/third_party/gpus/cuda_configure.bzl", line 148, in str
    repository_ctx.path(cuda_toolkit_path).realpath
Object of type 'path' has no field "realpath".
@aselle @gokceneraslan Okay, thanks, guys. I shall try your suggestions. Will get back to you.
@aselle @gokceneraslan Still, the same problem. It crashes.
Early screenshot:
Photo of my screen when bazel build froze my computer:
Can you try something like this: http://stackoverflow.com/questions/34756370/is-there-a-way-to-limit-the-number-of-cpu-cores-bazel-uses ?
Sure, will do. Thanks
I ran bazel build -c opt --config=cuda --local_resources 5120,2.0,1.0 //tensorflow/cc:tutorials_example_trainer. It looks like it's still using all four cores of my computer, though it hasn't crashed so far.
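If --local_resources alone still saturates the machine, explicitly capping the number of concurrent jobs may help. A sketch, assuming roughly two parallel compile jobs fit comfortably in 8GB of RAM (tune the numbers for your hardware):

# --jobs caps concurrent build actions; --local_resources is RAM in MB, CPU cores, I/O.
bazel build -c opt --config=cuda --jobs=2 --local_resources 4096,2.0,1.0 \
    //tensorflow/tools/pip_package:build_pip_package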
This is the screenshot of my computer after bazel build -c opt --config=cuda --local_resources 5120,2.0,1.0 //tensorflow/cc:tutorials_example_trainer finished. I suppose it's successful? I'll try to build the pip package now.
@gokceneraslan @aselle This is the result of my bazel build -c opt --config=cuda --local_resources 5120,2.0,1.0 //tensorflow/tools/pip_package:build_pip_package:
INFO: Found 1 target...
INFO: From Compiling tensorflow/core/kernels/matrix_solve_op.cc:
tensorflow/core/kernels/matrix_solve_op.cc:78:7: warning: multi-line comment [-Wcomment]
// Make sure to backport: https://bitbucket.org/eigen/eigen/commits/ \
^
tensorflow/core/kernels/matrix_solve_op.cc:98:5: warning: multi-line comment [-Wcomment]
// https://bitbucket.org/eigen/eigen/pull-requests/174/ \
^
INFO: From Compiling tensorflow/core/kernels/tile_ops_gpu.cu.cc:
Killed
ERROR: /home/darth/tensorflow/tensorflow/core/kernels/BUILD:422:1: output 'tensorflow/core/kernels/_objs/tile_ops_gpu/tensorflow/core/kernels/tile_ops_gpu.cu.pic.o' was not created.
ERROR: /home/darth/tensorflow/tensorflow/core/kernels/BUILD:422:1: not all outputs were created.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
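That Killed line typically means the Linux out-of-memory killer terminated the compiler while building tile_ops_gpu.cu.cc, one of the most memory-hungry translation units. Lowering --jobs further or adding swap before retrying usually gets past it; a sketch, assuming a temporary 4GB swap file is acceptable on your disk:

# Create and enable a temporary 4GB swap file to absorb memory spikes during the build.
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile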
Okay, I re-ran bazel build -c opt --config=cuda --local_resources 5120,2.0,1.0 //tensorflow/tools/pip_package:build_pip_package, and it went fine, though I was not able to see its output since I had piped it through | less. Then I ran bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg, followed by sudo pip install /tmp/tensorflow_pkg/tensorflow-0.10.0-py2-none-any.whl. I got it installed!
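Before running the full example, a quick sanity check that the GPU build imports and runs (a minimal sketch; the exact CUDA loader messages will vary):

# Import TensorFlow and run a trivial op; the libcublas/libcudnn loader lines confirm the GPU build.
python -c "import tensorflow as tf; print(tf.Session().run(tf.constant('hello')))"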
@aselle @gokceneraslan This is fine now, right?
I ran python3 ~/tensorflow/tensorflow/models/image/mnist/convolutional.py and got the following results:
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:868] Found device 0 with properties:
name: GeForce GTX 960M
major: 5 minor: 0 memoryClockRate (GHz) 1.176
pciBusID 0000:02:00.0
Total memory: 3.95GiB
Free memory: 3.63GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:889] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:899] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960M, pci bus id: 0000:02:00.0)
Initialized!
Step 0 (epoch 0.00), 110.4 ms
Minibatch loss: 8.334, learning rate: 0.010000
Minibatch error: 85.9%
Validation error: 84.6%
Step 100 (epoch 0.12), 16.2 ms
Minibatch loss: 3.264, learning rate: 0.010000
Minibatch error: 7.8%
Validation error: 7.3%
Step 200 (epoch 0.23), 16.3 ms
Minibatch loss: 3.376, learning rate: 0.010000
Minibatch error: 10.9%
Validation error: 4.5%
Step 300 (epoch 0.35), 16.5 ms
Minibatch loss: 3.172, learning rate: 0.010000
Minibatch error: 3.1%
Validation error: 3.1%
Step 400 (epoch 0.47), 16.3 ms
Minibatch loss: 3.218, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 2.9%
Step 500 (epoch 0.58), 16.3 ms
Minibatch loss: 3.167, learning rate: 0.010000
Minibatch error: 4.7%
Validation error: 2.5%
Step 600 (epoch 0.70), 16.6 ms
Minibatch loss: 3.092, learning rate: 0.010000
Minibatch error: 3.1%
Validation error: 2.1%
Step 700 (epoch 0.81), 16.9 ms
Minibatch loss: 2.961, learning rate: 0.010000
Minibatch error: 1.6%
Validation error: 2.2%
Step 800 (epoch 0.93), 16.4 ms
Minibatch loss: 3.088, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 1.8%
Step 900 (epoch 1.05), 16.5 ms
Minibatch loss: 2.905, learning rate: 0.009500
Minibatch error: 1.6%
Validation error: 1.5%
Step 1000 (epoch 1.16), 16.7 ms
Minibatch loss: 2.882, learning rate: 0.009500
Minibatch error: 1.6%
Validation error: 1.7%
Step 1100 (epoch 1.28), 16.6 ms
Minibatch loss: 2.822, learning rate: 0.009500
Minibatch error: 0.0%
Validation error: 1.4%
Step 1200 (epoch 1.40), 16.9 ms
Minibatch loss: 2.964, learning rate: 0.009500
Minibatch error: 3.1%
Validation error: 1.5%
Step 1300 (epoch 1.51), 16.5 ms
Minibatch loss: 2.795, learning rate: 0.009500
Minibatch error: 1.6%
Validation error: 1.6%
Step 1400 (epoch 1.63), 16.1 ms
Minibatch loss: 2.810, learning rate: 0.009500
Minibatch error: 1.6%
Validation error: 1.5%
Step 1500 (epoch 1.75), 16.1 ms
Minibatch loss: 2.887, learning rate: 0.009500
Minibatch error: 3.1%
Validation error: 1.3%
Step 1600 (epoch 1.86), 16.2 ms
Minibatch loss: 2.714, learning rate: 0.009500
Minibatch error: 1.6%
Validation error: 1.3%
Step 1700 (epoch 1.98), 16.5 ms
Minibatch loss: 2.661, learning rate: 0.009500
Minibatch error: 0.0%
Validation error: 1.7%
Step 1800 (epoch 2.09), 16.4 ms
Minibatch loss: 2.669, learning rate: 0.009025
Minibatch error: 1.6%
Validation error: 1.3%
Step 1900 (epoch 2.21), 15.8 ms
Minibatch loss: 2.621, learning rate: 0.009025
Minibatch error: 0.0%
Validation error: 1.2%
Step 2000 (epoch 2.33), 16.3 ms
Minibatch loss: 2.603, learning rate: 0.009025
Minibatch error: 1.6%
Validation error: 1.2%
Step 2100 (epoch 2.44), 16.0 ms
Minibatch loss: 2.573, learning rate: 0.009025
Minibatch error: 0.0%
Validation error: 1.2%
Step 2200 (epoch 2.56), 16.3 ms
Minibatch loss: 2.564, learning rate: 0.009025
Minibatch error: 0.0%
Validation error: 1.1%
Step 2300 (epoch 2.68), 16.4 ms
Minibatch loss: 2.565, learning rate: 0.009025
Minibatch error: 1.6%
Validation error: 1.1%
Step 2400 (epoch 2.79), 16.1 ms
Minibatch loss: 2.501, learning rate: 0.009025
Minibatch error: 0.0%
Validation error: 1.1%
Step 2500 (epoch 2.91), 16.1 ms
Minibatch loss: 2.472, learning rate: 0.009025
Minibatch error: 0.0%
Validation error: 1.1%
Step 2600 (epoch 3.03), 16.1 ms
Minibatch loss: 2.463, learning rate: 0.008574
Minibatch error: 0.0%
Validation error: 1.3%
Step 2700 (epoch 3.14), 16.1 ms
Minibatch loss: 2.512, learning rate: 0.008574
Minibatch error: 1.6%
Validation error: 1.1%
Step 2800 (epoch 3.26), 16.1 ms
Minibatch loss: 2.458, learning rate: 0.008574
Minibatch error: 3.1%
Validation error: 1.2%
Step 2900 (epoch 3.37), 16.2 ms
Minibatch loss: 2.489, learning rate: 0.008574
Minibatch error: 1.6%
Validation error: 1.1%
Step 3000 (epoch 3.49), 16.4 ms
Minibatch loss: 2.405, learning rate: 0.008574
Minibatch error: 3.1%
Validation error: 0.9%
Step 3100 (epoch 3.61), 16.1 ms
Minibatch loss: 2.407, learning rate: 0.008574
Minibatch error: 3.1%
Validation error: 1.0%
Step 3200 (epoch 3.72), 16.0 ms
Minibatch loss: 2.345, learning rate: 0.008574
Minibatch error: 1.6%
Validation error: 1.1%
Step 3300 (epoch 3.84), 16.1 ms
Minibatch loss: 2.326, learning rate: 0.008574
Minibatch error: 0.0%
Validation error: 1.0%
Step 3400 (epoch 3.96), 16.6 ms
Minibatch loss: 2.300, learning rate: 0.008574
Minibatch error: 1.6%
Validation error: 1.2%
Step 3500 (epoch 4.07), 16.3 ms
Minibatch loss: 2.278, learning rate: 0.008145
Minibatch error: 0.0%
Validation error: 1.0%
Step 3600 (epoch 4.19), 16.5 ms
Minibatch loss: 2.250, learning rate: 0.008145
Minibatch error: 0.0%
Validation error: 0.9%
Step 3700 (epoch 4.31), 16.2 ms
Minibatch loss: 2.229, learning rate: 0.008145
Minibatch error: 0.0%
Validation error: 1.0%
Step 3800 (epoch 4.42), 16.1 ms
Minibatch loss: 2.218, learning rate: 0.008145
Minibatch error: 0.0%
Validation error: 0.9%
Step 3900 (epoch 4.54), 16.0 ms
Minibatch loss: 2.255, learning rate: 0.008145
Minibatch error: 3.1%
Validation error: 1.0%
Step 4000 (epoch 4.65), 16.3 ms
Minibatch loss: 2.243, learning rate: 0.008145
Minibatch error: 3.1%
Validation error: 1.0%
Step 4100 (epoch 4.77), 16.2 ms
Minibatch loss: 2.165, learning rate: 0.008145
Minibatch error: 0.0%
Validation error: 0.9%
Step 4200 (epoch 4.89), 16.4 ms
Minibatch loss: 2.160, learning rate: 0.008145
Minibatch error: 1.6%
Validation error: 1.0%
Step 4300 (epoch 5.00), 16.1 ms
Minibatch loss: 2.188, learning rate: 0.007738
Minibatch error: 1.6%
Validation error: 1.0%
Step 4400 (epoch 5.12), 16.2 ms
Minibatch loss: 2.120, learning rate: 0.007738
Minibatch error: 0.0%
Validation error: 1.0%
Step 4500 (epoch 5.24), 16.1 ms
Minibatch loss: 2.202, learning rate: 0.007738
Minibatch error: 4.7%
Validation error: 1.0%
Step 4600 (epoch 5.35), 16.3 ms
Minibatch loss: 2.088, learning rate: 0.007738
Minibatch error: 0.0%
Validation error: 0.9%
Step 4700 (epoch 5.47), 16.0 ms
Minibatch loss: 2.086, learning rate: 0.007738
Minibatch error: 1.6%
Validation error: 0.9%
Step 4800 (epoch 5.59), 16.7 ms
Minibatch loss: 2.062, learning rate: 0.007738
Minibatch error: 1.6%
Validation error: 1.0%
Step 4900 (epoch 5.70), 16.1 ms
Minibatch loss: 2.061, learning rate: 0.007738
Minibatch error: 1.6%
Validation error: 1.0%
Step 5000 (epoch 5.82), 16.7 ms
Minibatch loss: 2.084, learning rate: 0.007738
Minibatch error: 3.1%
Validation error: 0.8%
Step 5100 (epoch 5.93), 17.7 ms
Minibatch loss: 2.006, learning rate: 0.007738
Minibatch error: 1.6%
Validation error: 1.1%
Step 5200 (epoch 6.05), 16.3 ms
Minibatch loss: 2.072, learning rate: 0.007351
Minibatch error: 3.1%
Validation error: 0.8%
Step 5300 (epoch 6.17), 16.4 ms
Minibatch loss: 1.970, learning rate: 0.007351
Minibatch error: 0.0%
Validation error: 0.8%
Step 5400 (epoch 6.28), 16.6 ms
Minibatch loss: 1.957, learning rate: 0.007351
Minibatch error: 0.0%
Validation error: 0.9%
Step 5500 (epoch 6.40), 16.3 ms
Minibatch loss: 1.997, learning rate: 0.007351
Minibatch error: 1.6%
Validation error: 1.0%
Step 5600 (epoch 6.52), 16.3 ms
Minibatch loss: 1.943, learning rate: 0.007351
Minibatch error: 1.6%
Validation error: 0.8%
Step 5700 (epoch 6.63), 16.2 ms
Minibatch loss: 1.915, learning rate: 0.007351
Minibatch error: 0.0%
Validation error: 0.8%
Step 5800 (epoch 6.75), 16.2 ms
Minibatch loss: 1.898, learning rate: 0.007351
Minibatch error: 0.0%
Validation error: 0.8%
Step 5900 (epoch 6.87), 16.2 ms
Minibatch loss: 1.890, learning rate: 0.007351
Minibatch error: 0.0%
Validation error: 0.9%
Step 6000 (epoch 6.98), 16.2 ms
Minibatch loss: 1.912, learning rate: 0.007351
Minibatch error: 1.6%
Validation error: 0.9%
Step 6100 (epoch 7.10), 16.0 ms
Minibatch loss: 1.864, learning rate: 0.006983
Minibatch error: 0.0%
Validation error: 0.9%
Step 6200 (epoch 7.21), 16.1 ms
Minibatch loss: 1.843, learning rate: 0.006983
Minibatch error: 0.0%
Validation error: 0.8%
Step 6300 (epoch 7.33), 16.1 ms
Minibatch loss: 1.855, learning rate: 0.006983
Minibatch error: 1.6%
Validation error: 0.9%
Step 6400 (epoch 7.45), 16.1 ms
Minibatch loss: 1.836, learning rate: 0.006983
Minibatch error: 1.6%
Validation error: 0.8%
Step 6500 (epoch 7.56), 16.9 ms
Minibatch loss: 1.806, learning rate: 0.006983
Minibatch error: 0.0%
Validation error: 0.9%
Step 6600 (epoch 7.68), 16.6 ms
Minibatch loss: 1.825, learning rate: 0.006983
Minibatch error: 1.6%
Validation error: 0.8%
Step 6700 (epoch 7.80), 16.5 ms
Minibatch loss: 1.783, learning rate: 0.006983
Minibatch error: 0.0%
Validation error: 0.8%
Step 6800 (epoch 7.91), 16.4 ms
Minibatch loss: 1.773, learning rate: 0.006983
Minibatch error: 0.0%
Validation error: 0.9%
Step 6900 (epoch 8.03), 16.5 ms
Minibatch loss: 1.759, learning rate: 0.006634
Minibatch error: 0.0%
Validation error: 0.9%
Step 7000 (epoch 8.15), 16.5 ms
Minibatch loss: 1.757, learning rate: 0.006634
Minibatch error: 0.0%
Validation error: 0.9%
Step 7100 (epoch 8.26), 15.9 ms
Minibatch loss: 1.734, learning rate: 0.006634
Minibatch error: 0.0%
Validation error: 0.8%
Step 7200 (epoch 8.38), 16.1 ms
Minibatch loss: 1.728, learning rate: 0.006634
Minibatch error: 0.0%
Validation error: 0.9%
Step 7300 (epoch 8.49), 16.1 ms
Minibatch loss: 1.779, learning rate: 0.006634
Minibatch error: 3.1%
Validation error: 0.8%
Step 7400 (epoch 8.61), 16.4 ms
Minibatch loss: 1.699, learning rate: 0.006634
Minibatch error: 0.0%
Validation error: 0.7%
Step 7500 (epoch 8.73), 16.4 ms
Minibatch loss: 1.690, learning rate: 0.006634
Minibatch error: 0.0%
Validation error: 0.7%
Step 7600 (epoch 8.84), 16.4 ms
Minibatch loss: 1.775, learning rate: 0.006634
Minibatch error: 1.6%
Validation error: 0.9%
Step 7700 (epoch 8.96), 16.3 ms
Minibatch loss: 1.666, learning rate: 0.006634
Minibatch error: 0.0%
Validation error: 0.9%
Step 7800 (epoch 9.08), 16.3 ms
Minibatch loss: 1.665, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.8%
Step 7900 (epoch 9.19), 15.8 ms
Minibatch loss: 1.647, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.8%
Step 8000 (epoch 9.31), 17.0 ms
Minibatch loss: 1.648, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.8%
Step 8100 (epoch 9.43), 16.9 ms
Minibatch loss: 1.634, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.9%
Step 8200 (epoch 9.54), 16.5 ms
Minibatch loss: 1.619, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.8%
Step 8300 (epoch 9.66), 16.4 ms
Minibatch loss: 1.614, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.7%
Step 8400 (epoch 9.77), 16.5 ms
Minibatch loss: 1.597, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.8%
Step 8500 (epoch 9.89), 16.4 ms
Minibatch loss: 1.617, learning rate: 0.006302
Minibatch error: 1.6%
Validation error: 0.9%
Test error: 0.7%
OK, let's close the bug then.
Thanks a lot, @gokceneraslan @aselle !
I also got the tile_ops_gpu.cu.pic.o was not created error. What did you do to fix it?
Did you try re-running bazel build, @marcosmoura91? That's what I did.
@AFAgarap I actually gave up on installing from source; I don't know why most tutorials recommend it. I just installed with pip instead, and it went quickly and smoothly. It's now working 100%.
I upgraded to 0.11rc1 as you advised, @tatatodd, in issue #4841. I ran sudo ./configure, and this was the result:
Thanks in advance for your response.