Open yulan0215 opened 3 years ago
It may be insufficient memory (https://github.com/tensorflow/models/issues/3647). You can decrease the number of concurrent jobs (for example bazel build -j 8).
Could you provide more information about your build? System environment, bazel command, etc.
Also, you should read this log file, where the error is probably well explained.
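As a rough illustration of the advice above, here is a minimal sketch for choosing a -j value from the memory available on a Linux machine. The helper name suggest_jobs and the budget of about 2 GB per compile job are assumptions made for this example, not figures from Bazel or TensorFlow, so adjust them to your build.

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggest a bazel -j value from available memory (in MB).
# ASSUMPTION: about 2048 MB per concurrent compile job; this budget is a guess.
suggest_jobs() {
  local avail_mb=$1
  local per_job_mb=${2:-2048}
  local jobs=$(( avail_mb / per_job_mb ))
  # Never suggest fewer than one job.
  if (( jobs < 1 )); then jobs=1; fi
  echo "$jobs"
}

# On Linux, MemAvailable in /proc/meminfo is reported in kB.
if [[ -r /proc/meminfo ]]; then
  avail_mb=$(awk '/^MemAvailable:/ {print int($2 / 1024)}' /proc/meminfo)
  echo "Suggested: bazel build -j $(suggest_jobs "${avail_mb:-0}") ..."
fi
```

The second argument overrides the per-job budget, for example suggest_jobs 8192 4096 if each compile action seems to need closer to 4 GB.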
Thank you for your quick reply...
The bazel command I used:
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.1 --copt=-msse4.2 //tensorflow:libtensorflow_framework.so //tensorflow:libtensorflow_cc.so //tensorflow:libtensorflow.so //tensorflow/tools/pip_package:build_pip_package --noincompatible_do_not_split_linking_cmdline
And the build output:
Starting local Bazel server and connecting to it...
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=80
INFO: Reading rc options for 'build' from /work/tf/tensorflow/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /work/tf/tensorflow/.bazelrc:
  'build' options: --define framework_shared_object=true --java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --host_java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --noincompatible_prohibit_aapt1 --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true
INFO: Reading rc options for 'build' from /work/tf/tensorflow/.tf_configure.bazelrc:
  'build' options: --action_env PYTHON_BIN_PATH=/usr/bin/python3 --action_env PYTHON_LIB_PATH=/opt/ros/noetic/lib/python3/dist-packages --python_path=/usr/bin/python3 --action_env PYTHONPATH=/opt/ros/noetic/lib/python3/dist-packages
INFO: Found applicable config definition build:short_logs in file /work/tf/tensorflow/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /work/tf/tensorflow/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:linux in file /work/tf/tensorflow/.bazelrc: --copt=-w --host_copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --config=dynamic_kernels
INFO: Found applicable config definition build:dynamic_kernels in file /work/tf/tensorflow/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
DEBUG: /home/yla18/.cache/bazel/_bazel_yla18/d4ef3948567f7f7d65fbd5757d6da7fb/external/tf_runtime/third_party/cuda/dependencies.bzl:51:10: The following command will download NVIDIA proprietary software. By using the software you agree to comply with the terms of the license agreement that accompanies the software. If you do not agree to the terms of the license agreement, do not use the software.
INFO: Analyzed 4 targets (432 packages loaded, 33155 targets configured).
INFO: Found 4 targets...
INFO: Deleting stale sandbox base /home/yla18/.cache/bazel/_bazel_yla18/d4ef3948567f7f7d65fbd5757d6da7fb/sandbox
[12,254 / 16,921] 16 actions running
    Compiling tensorflow/core/kernels/bias_op.cc [for host]; 159s local
    Compiling tensorflow/core/kernels/conv_ops.cc [for host]; 146s local
    Compiling .../core/kernels/conv_grad_ops_3d.cc [for host]; 143s local
    Compiling .../core/kernels/conv_grad_filter_ops.cc [for host]; 135s local
    Compiling .../core/kernels/reduction_ops_mean.cc [for host]; 108s local
    Compiling .../core/kernels/reduction_ops_max.cc [for host]; 108s local
    Compiling .../kernels/reduction_ops_euclidean.cc [for host]; 106s local
    Compiling .../mkl/mkl_fused_batch_norm_op.cc [for host]; 95s local ...
Server terminated abruptly (error code: 14, error message: 'Socket closed', log file: '/home/yla18/.cache/bazel/_bazel_yla18/d4ef3948567f7f7d65fbd5757d6da7fb/server/jvm.out')
I am building it in a virtual machine, but the VM has its own CPU, so I guess this should not be a problem...
Looking at the bazel output, you're building with 8 cores. So you probably need more RAM, or fewer cores. But to be sure, you should read the file /home/yla18/.cache/bazel/_bazel_yla18/d4ef3948567f7f7d65fbd5757d6da7fb/server/jvm.out. Also, you could watch your memory usage right after starting the build command.
Which TF version are you building?
--noincompatible_do_not_split_linking_cmdline is not required for the latest releases and will cause an error later in the build process.
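One way to check the memory hypothesis when jvm.out says nothing is to look for OOM-killer messages in the kernel log. The sketch below assumes a Linux host; find_oom_kills is a hypothetical helper name, and the patterns only cover the usual wording of kernel OOM messages.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print kernel log lines (read from stdin) that look like
# OOM-killer activity. Prints nothing, and still succeeds, when none are found.
find_oom_kills() {
  grep -i -E 'out of memory|oom-kill|killed process' || true
}

# Usage (reading the kernel log may require root):
#   dmesg | find_oom_kills
# To watch memory while the build runs, use a second terminal:
#   watch -n 5 free -m
```

If a java or compiler process shows up in that output around the time the Bazel server died, lowering -j or giving the VM more RAM is the likely fix.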
I cloned TensorFlow from this repository: https://github.com/tensorflow/tensorflow
The file at /home/yla18/.cache/bazel/_bazel_yla18/d4ef3948567f7f7d65fbd5757d6da7fb/server/jvm.out is blank.
What should I change regarding --noincompatible_do_not_split_linking_cmdline?
You should avoid building from the TF master branch. Just check out a release: git checkout v2.4.2 for CUDA==11.0 or git checkout v2.5.0 for CUDA==11.2.
Then just omit the --noincompatible_do_not_split_linking_cmdline option.
Also, you can add the --verbose_failures option, which is really useful!
Thank you for your support and I will check it...
Sorry, I am using a virtual machine, so I may not have CUDA. Is it still possible for me to install? The command I used is:
bazel build -j 8 -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.1 --copt=-msse4.2 //tensorflow:libtensorflow_framework.so //tensorflow:libtensorflow_cc.so //tensorflow:libtensorflow.so //tensorflow/tools/pip_package:build_pip_package --verbose_failures
The error I encountered just now is:
ERROR: /work/tf/tensorflow-2.4.2/tensorflow/tools/pip_package/BUILD:286:10 Executing genrule //tensorflow/python/keras/api:keras_python_api_gen failed (Exit 1): bash failed: error executing command
(cd /home/yla18/.cache/bazel/_bazel_yla18/abcc9dfd95bf7770f74cf5e488b03cd9/execroot/org_tensorflow && \
exec env - \
LD_LIBRARY_PATH=/opt/ros/noetic/lib \
PATH=/opt/ros/noetic/bin:/usr/local/GMTSAR/bin:/usr/local/GMTSAR/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin \
PYTHON_BIN_PATH=/usr/bin/python3 \
PYTHON_LIB_PATH=/usr/local/lib/python3.8/dist-packages \
TF2_BEHAVIOR=1 \
TF_CONFIGURE_IOS=0 \
/bin/bash -c 'source external/bazel_tools/tools/genrule/genrule-setup.sh; bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen --apidir=bazel-out/k8-opt/bin/tensorflow/python/keras/api --apiname=keras --apiversion=1 --loading=default --package=tensorflow.python,tensorflow.python.keras,tensorflow.python.keras.activations,tensorflow.python.keras.applications.densenet,tensorflow.python.keras.applications.efficientnet,tensorflow.python.keras.applications.imagenet_utils,tensorflow.python.keras.applications.inception_resnet_v2,tensorflow.python.keras.applications.inception_v3,tensorflow.python.keras.applications.mobilenet,tensorflow.python.keras.applications.mobilenet_v2,tensorflow.python.keras.applications.mobilenet_v3,tensorflow.python.keras.applications.nasnet,tensorflow.python.keras.applications.resnet,tensorflow.python.keras.applications.resnet_v2,tensorflow.python.keras.applications.vgg16,tensorflow.python.keras.applications.vgg19,tensorflow.python.keras.applications.xception,tensorflow.python.keras.backend,tensorflow.python.keras.backend_config,tensorflow.python.keras.callbacks,tensorflow.python.keras.callbacks_v1,tensorflow.python.keras.constraints,tensorflow.python.keras.datasets.boston_housing,tensorflow.python.keras.datasets.cifar10,tensorflow.python.keras.datasets.cifar100,tensorflow.python.keras.datasets.fashion_mnist,tensorflow.python.keras.datasets.imdb,tensorflow.python.keras.datasets.mnist,tensorflow.python.keras.datasets.reuters,tensorflow.python.keras.engine.base_layer,tensorflow.python.keras.engine.data_adapter,tensorflow.python.keras.engine.input_layer,tensorflow.python.keras.engine.input_spec,tensorflow.python.keras.engine.sequential,tensorflow.python.keras.engine.training,tensorflow.python.keras.estimator,tensorflow.python.keras.feature_column.sequence_feature_column,tensorflow.python.keras.initializers,tensorflow.python.keras.initializers.initializers_v1,tensorflow.python.keras.initializers.initializers_v2,tensorflow.python.keras.layers.advanced_activations,tensorflow.python.keras.layers.convolutional,tensorflow.python.keras.layers.convolutional_recurrent,tensorflow.python.keras.layers.core,tensorflow.python.keras.layers.cudnn_recurrent,tensorflow.python.keras.layers.dense_attention,tensorflow.python.keras.layers.embeddings,tensorflow.python.keras.layers.local,tensorflow.python.keras.layers.merge,tensorflow.python.keras.layers.noise,tensorflow.python.keras.layers.normalization,tensorflow.python.keras.layers.normalization_v2,tensorflow.python.keras.layers.preprocessing,tensorflow.python.keras.layers.pooling,tensorflow.python.keras.layers.recurrent,tensorflow.python.keras.layers.recurrent_v2,tensorflow.python.keras.layers.serialization,tensorflow.python.keras.layers.wrappers,tensorflow.python.keras.losses,tensorflow.python.keras.metrics,tensorflow.python.keras.mixed_precision.get_layer_policy,tensorflow.python.keras.mixed_precision.loss_scale_optimizer,tensorflow.python.keras.mixed_precision.policy,tensorflow.python.keras.models,tensorflow.python.keras.optimizer_v2.adadelta,tensorflow.python.keras.optimizer_v2.adagrad,tensorflow.python.keras.optimizer_v2.adam,tensorflow.python.keras.optimizer_v2.adamax,tensorflow.python.keras.optimizer_v2.ftrl,tensorflow.python.keras.optimizer_v2.gradient_descent,tensorflow.python.keras.optimizer_v2.learning_rate_schedule,tensorflow.python.keras.optimizer_v2.nadam,tensorflow.python.keras.optimizer_v2.optimizer_v2,tensorflow.python.keras.optimizer_v2.rmsprop,tensorflow.python.keras.optimizers,tensorflow.python.keras.premade.linear,tensorflow.python.keras.premade.wide_deep,tensorflow.python.keras.preprocessing.image,tensorflow.python.keras.preprocessing.sequence,tensorflow.python.keras.preprocessing.text,tensorflow.python.keras.regularizers,tensorflow.python.keras.saving.model_config,tensorflow.python.keras.saving.save,tensorflow.python.keras.saving.saved_model_experimental,tensorflow.python.keras.utils.data_utils,tensorflow.python.keras.utils.generic_utils,tensorflow.python.keras.utils.io_utils,tensorflow.python.keras.utils.layer_utils,tensorflow.python.keras.utils.losses_utils,tensorflow.python.keras.utils.multi_gpu_utils,tensorflow.python.keras.utils.np_utils,tensorflow.python.keras.utils.vis_utils,tensorflow.python.keras.wrappers.scikit_learn --output_package=tensorflow.python.keras.api --use_relative_imports=True bazel-out/k8-opt/bin/tensorflow/python/keras/api/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/activations/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/applications/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/applications/densenet/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/applications/efficientnet/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/applications/imagenet_utils/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/applications/inception_resnet_v2/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/applications/inception_v3/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/applications/mobilenet/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/applications/mobilenet_v2/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/applications/mobilenet_v3/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/applications/nasnet/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/applications/resnet/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/applications/resnet_v2/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/applications/resnet50/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/applications/vgg16/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/applications/vgg19/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/applications/xception/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/backend/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/callbacks/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/callbacks/experimental/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/constraints/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/datasets/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/datasets/boston_housing/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/datasets/cifar10/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/datasets/cifar100/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/datasets/fashion_mnist/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/datasets/imdb/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/datasets/mnist/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/datasets/reuters/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/estimator/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/experimental/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/initializers/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/layers/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/layers/experimental/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/layers/experimental/preprocessing/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/losses/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/metrics/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/mixed_precision/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/mixed_precision/experimental/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/models/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/optimizers/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/optimizers/schedules/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/premade/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/preprocessing/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/preprocessing/image/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/preprocessing/sequence/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/preprocessing/text/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/regularizers/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/utils/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/wrappers/__init__.py bazel-out/k8-opt/bin/tensorflow/python/keras/api/keras/wrappers/scikit_learn/__init__.py')
Execution platform: @local_execution_config_platform//:platform
Thank you for your help and I am looking forward to your reply!
Yes, you can use TF without GPU / CUDA.
Bazel is kind of messy; it could be a dependency version mismatch or something else related to your environment / VM. Make sure to check your Python dependencies, including numpy<1.20, and check your environment variables. You can look at this file for reference: https://github.com/remicres/otbtf/blob/develop/tools/docker/build-env-tf.sh
I'm not sure, but it is most likely an out-of-memory error, which would explain why you can't see anything in the log file.
If you don't have a lot of RAM (say less than 10GB) you should try with -j 4 or something like that, but your build is going to take more than a day, depending on your CPU!
So you'd better pull a Docker image if possible...
Which bazel version are you running?
If you're following this file https://github.com/remicres/otbtf/blob/develop/doc/HOWTOBUILD.md you should git checkout v2.1.4 because the TF installation steps have changed a bit.
Make sure you're running the right bazel version for your TF git branch (there's a .bazelversion file at the root of your tensorflow repository), or use bazelisk.
If you really want to build the latest tensorflow release, check the Dockerfile for a more up-to-date installation script example (Ubuntu 18.04 or 20.04).
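The .bazelversion check mentioned above can be scripted. This is a minimal sketch, assuming the checkout has a .bazelversion file at its root; bazel_version_matches is a hypothetical helper name, and the path in the usage line is taken from the build log earlier in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when the given Bazel version equals the one
# pinned in <repo>/.bazelversion. Returns 2 if the file cannot be read.
bazel_version_matches() {
  local repo=$1 installed=$2
  local wanted
  # Strip the trailing newline and any stray whitespace from the pinned version.
  wanted=$(tr -d '[:space:]' < "$repo/.bazelversion") || return 2
  [[ "$installed" == "$wanted" ]]
}

# Usage ("bazel --version" prints something like "bazel 3.7.2"):
#   bazel_version_matches /work/tf/tensorflow "$(bazel --version | awk '{print $2}')" \
#     && echo "bazel version OK" || echo "bazel version mismatch"
```

With bazelisk this check becomes unnecessary, since it reads .bazelversion and runs the matching Bazel release itself.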
Thank you for your quick reply. I used the bazel 3.7.2.
In that case you should checkout v2.5.0. For TF 2.4.2 the required bazel version is 3.1.0.
Thank you for your help! Sorry, I have another question: I installed Docker, but I do not know how to use it to install TensorFlow. I would appreciate it if you could give me some ideas! Thanks, and I look forward to your reply!
docker pull mdl4eo/otbtf2.4:cpu
docker run -it mdl4eo/otbtf2.4:cpu
Check this file : https://github.com/remicres/otbtf/blob/develop/doc/DOCKERUSE.md
Thank you for your quick reply! I followed the guide you gave me, but I encountered the following:
/work/otb/build$ sudo docker run -u otbuser -v $(pwd):/home/otbuser mdl4eo/otbtf2.4:cpu otbcli_PatchesExtraction -help
This is the PatchesExtraction application, version 7.2.0
This application extracts patches in multiple input images. Change the OTB_TF_NSOURCES environment variable to set the number of sources. Tags: Learning
The application takes an input vector layer which is a set of points, typically the output of the "SampleSelection" or the "LabelImageSampleSelection" application to sample patches in the input images (samples are centered on the points). A "source" parameters group is composed of (i) an input image list (can be one image e.g. high res. image, or multiple e.g. time series), (ii) the size of the patches to sample, and (iii) the output images of patches which will be generated at the end of the process. The example below show how to set the samples sizes. For a SPOT6 image for instance, the patch size can be 64x64 and for an input Sentinel-2 time series the patch size could be 1x1. Note that if a dimension size is not defined, the largest one will be used (i.e. input image dimensions. The number of input sources can be changed at runtime by setting the system environment variable OTB_TF_NSOURCES
Parameters:
-source1
Use -help param1 [... paramN] to see detailed documentation of those parameters.
Examples: otbcli_PatchesExtraction -vec points.sqlite -source1.il $s2_list -source1.patchsizex 16 -source1.patchsizey 16 -field class -source1.out outpatches_16x16.tif -outlabels outlabels.tif
Authors: Remi Cresson
Limitations:
See also:
So everything is working OK. You need to learn how to use Docker; they provide great documentation!
Run a bash shell with docker run -it mdl4eo/otbtf2.4:cpu /bin/bash
Hi, when I build TensorFlow with bazel, I get this error: Server terminated abruptly (error code: 14, error message: 'Socket closed', log file: '/home/yla18/.cache/bazel/_bazel_yla18/d4ef3948567f7f7d65fbd5757d6da7fb/server/jvm.out') Does anyone have a solution for this issue? Thanks!