Closed Rtakaha closed 2 years ago
Full error message when I try building lingvo/trainer.py with tensorflow==2.6.0:
root@dc6418f0d953:/tmp/lingvo# bazel build -c opt --config=cuda --copt=-D_GLIBCXX_USE_CXX11_ABI=0 //lingvo:trainer
DEBUG: Rule 'subpar' indicated that a canonical reproducible form can be obtained by modifying arguments commit = "35bb9f0092f71ea56b742a5
20602da9b3638a24f", shallow_since = "1557863961 -0400" and dropping ["tag"]
DEBUG: Repository subpar instantiated at:
/tmp/lingvo/WORKSPACE:12:15: in <toplevel>
Repository rule git_repository defined at:
/root/.cache/bazel/_bazel_root/17eb95f0bc03547f4f1319e61997e114/external/bazel_tools/tools/build_defs/repo/git.bzl:199:33: in <toplevel>
INFO: Analyzed target //lingvo:trainer (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /tmp/lingvo/lingvo/core/ops/BUILD:182:18: Compiling lingvo/core/ops/input_common.cc failed: (Exit 1): gcc failed: error executing c
ommand /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer
-g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 66 argument(s) skipped)
Use --sandbox_debug to see verbose messages from the sandbox gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-pr
otector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunctio
n-sections ... (remaining 66 argument(s) skipped)
Use --sandbox_debug to see verbose messages from the sandbox
In file included from lingvo/core/ops/input_common.cc:16:0:
./lingvo/core/ops/input_common.h:143:55: error: expected class-name before '{' token
class InputResource : public tensorflow::ResourceBase {
^
./lingvo/core/ops/input_common.h: In member function 'void tensorflow::lingvo::InputOpV2Create<RecordProcessorClass>::Compute(tensorflow::
OpKernelContext*)':
./lingvo/core/ops/input_common.h:228:25: error: 'MakeRefCountingHandle' is not a member of 'tensorflow::ResourceHandle'
ResourceHandle::MakeRefCountingHandle(resource, ctx->device()->name(),
^~~~~~~~~~~~~~~~~~~~~
./lingvo/core/ops/input_common.h: In member function 'void tensorflow::lingvo::InputOpV2GetNext<RecordProcessorClass>::Compute(tensorflow:
:OpKernelContext*)':
./lingvo/core/ops/input_common.h:252:28: error: 'const class tensorflow::ResourceHandle' has no member named 'GetResource'
auto statusor = handle.GetResource<resource_type>();
^~~~~~~~~~~
./lingvo/core/ops/input_common.h:252:53: error: expected primary-expression before '>' token
auto statusor = handle.GetResource<resource_type>();
^
./lingvo/core/ops/input_common.h:252:55: error: expected primary-expression before ')' token
auto statusor = handle.GetResource<resource_type>();
^
Target //lingvo:trainer failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 8.990s, Critical Path: 8.69s
INFO: 21 processes: 5 internal, 16 processwrapper-sandbox.
FAILED: Build did NOT complete successfully
Error message with --sandbox_debug and --verbose_failures:
root@dc6418f0d953:/tmp/lingvo# bazel build -c opt --config=cuda --copt=-D_GLIBCXX_USE_CXX11_ABI=0 //lingvo:trainer --sandbox_debug --verbose_failures
DEBUG: Rule 'subpar' indicated that a canonical reproducible form can be obtained by modifying arguments commit = "35bb9f0092f71ea56b742a5
20602da9b3638a24f", shallow_since = "1557863961 -0400" and dropping ["tag"]
DEBUG: Repository subpar instantiated at:
/tmp/lingvo/WORKSPACE:12:15: in <toplevel>
Repository rule git_repository defined at:
/root/.cache/bazel/_bazel_root/17eb95f0bc03547f4f1319e61997e114/external/bazel_tools/tools/build_defs/repo/git.bzl:199:33: in <toplevel>
INFO: Analyzed target //lingvo:trainer (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /tmp/lingvo/lingvo/core/ops/BUILD:359:22: Compiling lingvo/core/ops/generic_input_op_kernels.cc failed: (Exit 1): process-wrapper f
ailed: error executing command
(cd /root/.cache/bazel/_bazel_root/17eb95f0bc03547f4f1319e61997e114/sandbox/processwrapper-sandbox/196/execroot/__main__ && \
exec env - \
LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64 \
PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
PWD=/proc/self/cwd \
TMPDIR=/tmp \
/root/.cache/bazel/_bazel_root/install/1a4a2fac02d50c77031d44c0d91b8920/process-wrapper '--timeout=0' '--kill_delay=15' /usr/bin/gcc -U_
FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOU
RCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++0x' -MD -MF bazel-out/k8-opt/bin/lingvo/core/ops/_objs/generic_input_op_kerne
ls/generic_input_op_kernels.pic.d '-frandom-seed=bazel-out/k8-opt/bin/lingvo/core/ops/_objs/generic_input_op_kernels/generic_input_op_kern
els.pic.o' -fPIC -iquote . -iquote bazel-out/k8-opt/bin -iquote external/tensorflow_includes -iquote bazel-out/k8-opt/bin/external/tensorf
low_includes -iquote external/absl_includes -iquote bazel-out/k8-opt/bin/external/absl_includes -iquote external/eigen_archive -iquote baz
el-out/k8-opt/bin/external/eigen_archive -iquote external/protobuf_archive -iquote bazel-out/k8-opt/bin/external/protobuf_archive -iquote
external/zlib_includes -iquote bazel-out/k8-opt/bin/external/zlib_includes -iquote external/tensorflow_solib -iquote bazel-out/k8-opt/bin/
external/tensorflow_solib -isystem external/tensorflow_includes/tensorflow_includes -isystem bazel-out/k8-opt/bin/external/tensorflow_incl
udes/tensorflow_includes -isystem external/absl_includes/absl -isystem bazel-out/k8-opt/bin/external/absl_includes/absl -isystem external/
eigen_archive/tf_includes -isystem bazel-out/k8-opt/bin/external/eigen_archive/tf_includes -isystem external/protobuf_archive/tf_includes
-isystem bazel-out/k8-opt/bin/external/protobuf_archive/tf_includes -isystem external/zlib_includes/zlib -isystem bazel-out/k8-opt/bin/ext
ernal/zlib_includes/zlib '-D_GLIBCXX_USE_CXX11_ABI=0' '-D_GLIBCXX_USE_CXX11_ABI=0' '-std=c++14' -Wno-sign-compare -mavx '-DGOOGLE_CUDA=1'
-fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c
lingvo/core/ops/generic_input_op_kernels.cc -o bazel-out/k8-opt/bin/lingvo/core/ops/_objs/generic_input_op_kernels/generic_input_op_kerne
ls.pic.o) process-wrapper failed: error executing command
(cd /root/.cache/bazel/_bazel_root/17eb95f0bc03547f4f1319e61997e114/sandbox/processwrapper-sandbox/196/execroot/__main__ && \
exec env - \
LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64 \
PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
PWD=/proc/self/cwd \
TMPDIR=/tmp \
/root/.cache/bazel/_bazel_root/install/1a4a2fac02d50c77031d44c0d91b8920/process-wrapper '--timeout=0' '--kill_delay=15' /usr/bin/gcc -U_
FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOU
RCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++0x' -MD -MF bazel-out/k8-opt/bin/lingvo/core/ops/_objs/generic_input_op_kerne
ls/generic_input_op_kernels.pic.d '-frandom-seed=bazel-out/k8-opt/bin/lingvo/core/ops/_objs/generic_input_op_kernels/generic_input_op_kern
els.pic.o' -fPIC -iquote . -iquote bazel-out/k8-opt/bin -iquote external/tensorflow_includes -iquote bazel-out/k8-opt/bin/external/tensorf
low_includes -iquote external/absl_includes -iquote bazel-out/k8-opt/bin/external/absl_includes -iquote external/eigen_archive -iquote baz
el-out/k8-opt/bin/external/eigen_archive -iquote external/protobuf_archive -iquote bazel-out/k8-opt/bin/external/protobuf_archive -iquote
external/zlib_includes -iquote bazel-out/k8-opt/bin/external/zlib_includes -iquote external/tensorflow_solib -iquote bazel-out/k8-opt/bin/
external/tensorflow_solib -isystem external/tensorflow_includes/tensorflow_includes -isystem bazel-out/k8-opt/bin/external/tensorflow_incl
udes/tensorflow_includes -isystem external/absl_includes/absl -isystem bazel-out/k8-opt/bin/external/absl_includes/absl -isystem external/
eigen_archive/tf_includes -isystem bazel-out/k8-opt/bin/external/eigen_archive/tf_includes -isystem external/protobuf_archive/tf_includes
-isystem bazel-out/k8-opt/bin/external/protobuf_archive/tf_includes -isystem external/zlib_includes/zlib -isystem bazel-out/k8-opt/bin/ext
ernal/zlib_includes/zlib '-D_GLIBCXX_USE_CXX11_ABI=0' '-D_GLIBCXX_USE_CXX11_ABI=0' '-std=c++14' -Wno-sign-compare -mavx '-DGOOGLE_CUDA=1'
-fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c
lingvo/core/ops/generic_input_op_kernels.cc -o bazel-out/k8-opt/bin/lingvo/core/ops/_objs/generic_input_op_kernels/generic_input_op_kerne
ls.pic.o)
In file included from lingvo/core/ops/generic_input_op_kernels.cc:20:0:
./lingvo/core/ops/input_common.h:143:55: error: expected class-name before '{' token
class InputResource : public tensorflow::ResourceBase {
^
./lingvo/core/ops/input_common.h: In member function 'void tensorflow::lingvo::InputOpV2Create<RecordProcessorClass>::Compute(tensorflow::
OpKernelContext*)':
./lingvo/core/ops/input_common.h:228:25: error: 'MakeRefCountingHandle' is not a member of 'tensorflow::ResourceHandle'
ResourceHandle::MakeRefCountingHandle(resource, ctx->device()->name(),
^~~~~~~~~~~~~~~~~~~~~
./lingvo/core/ops/input_common.h: In member function 'void tensorflow::lingvo::InputOpV2GetNext<RecordProcessorClass>::Compute(tensorflow:
:OpKernelContext*)':
./lingvo/core/ops/input_common.h:252:28: error: 'const class tensorflow::ResourceHandle' has no member named 'GetResource'
auto statusor = handle.GetResource<resource_type>();
^~~~~~~~~~~
./lingvo/core/ops/input_common.h:252:53: error: expected primary-expression before '>' token
auto statusor = handle.GetResource<resource_type>();
^
./lingvo/core/ops/input_common.h:252:55: error: expected primary-expression before ')' token
auto statusor = handle.GetResource<resource_type>();
^
./lingvo/core/ops/input_common.h: In instantiation of 'class tensorflow::lingvo::InputResource<tensorflow::lingvo::{anonymous}::GenericInp
utProcessor>':
./lingvo/core/ops/input_common.h:259:15: required from 'void tensorflow::lingvo::InputOpV2GetNext<RecordProcessorClass>::Compute(tensorf
low::OpKernelContext*) [with RecordProcessorClass = tensorflow::lingvo::{anonymous}::GenericInputProcessor]'
lingvo/core/ops/generic_input_op_kernels.cc:369:1: required from here
./lingvo/core/ops/input_common.h:165:15: error: 'std::string tensorflow::lingvo::InputResource<RecordProcessorClass>::DebugString() const
[with RecordProcessorClass = tensorflow::lingvo::{anonymous}::GenericInputProcessor; std::string = std::basic_string<char>]' marked 'overr
ide', but does not override
std::string DebugString() const override { return "lingvo InputResource"; }
^~~~~~~~~~~
./lingvo/core/ops/input_common.h:167:3: error: 'tensorflow::lingvo::InputResource<RecordProcessorClass>::~InputResource() [with RecordProc
essorClass = tensorflow::lingvo::{anonymous}::GenericInputProcessor]' marked 'override', but does not override
~InputResource() override { delete batcher_; }
^
./lingvo/core/ops/input_common.h: At global scope:
./lingvo/core/ops/input_common.h:169:8: warning: 'void tensorflow::lingvo::InputResource<RecordProcessorClass>::GetNext(tensorflow::OpKern
elContext*) [with RecordProcessorClass = tensorflow::lingvo::{anonymous}::GenericInputProcessor]' used but never defined
void GetNext(OpKernelContext* ctx) {
^~~~~~~
Target //lingvo:trainer failed to build
INFO: Elapsed time: 11.250s, Critical Path: 11.06s
INFO: 17 processes: 5 internal, 12 processwrapper-sandbox.
FAILED: Build did NOT complete successfully
I built waymo-open-dataset-tf-2-7-0 myself, and I was then able to build lingvo/trainer with tensorflow==2.7.0.
Closing issue.
ref: https://github.com/waymo-research/waymo-open-dataset/issues/548
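For anyone landing here later, a quick pre-build check along these lines might save a failed bazel run. This is only a sketch based on what this thread observed (the build failed with tensorflow==2.6.0 and succeeded with 2.7.0). The MINIMUM value, the helper names, and the assumption that the pip distribution is named "tensorflow" are mine, not something lingvo documents.

```python
import re
from importlib import metadata

# Assumption taken from this thread only: 2.7.0 built, 2.6.0 did not.
MINIMUM = "2.7.0"


def parse_version(text):
    """Return the leading major.minor.patch numbers of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def is_at_least(installed, minimum):
    """True if the installed version is greater than or equal to minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_tensorflow_version():
    """Read the version from package metadata without importing tensorflow."""
    try:
        # Assumes the distribution is named "tensorflow".
        return metadata.version("tensorflow")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_tensorflow_version()
    if version is None:
        print("tensorflow distribution not found")
    elif is_at_least(version, MINIMUM):
        print(f"tensorflow {version} is at least {MINIMUM}")
    else:
        print(f"tensorflow {version} is older than {MINIMUM}; the build may fail")
```

Reading the version from package metadata avoids importing tensorflow itself, which is slow and can crash in a broken environment.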
Hi, I am trying to reproduce the DeepFusion results on the Waymo Open Dataset. I fixed the issue mentioned here by downgrading tensorflow from 2.9 to 2.7, but the following command then crashes with a segmentation fault. I found that the code fails when it tries to import py_camera_model_ops from waymo_open_dataset.camera.ops.
This problem is reproducible in the Waymo tutorial with tensorflow==2.7.*, and it does not happen with tensorflow==2.6.0.
I tried building lingvo/trainer with tensorflow==2.6.0, but it fails with the following error:
This issue looked similar, so I tried again after installing tensorstore, but it didn't work (same error).
I use docker/dev.dockerfile for my environment.
Do you know how I can fix this?
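In case it helps with narrowing down the segmentation fault, here is a rough sketch of how the failing import could be isolated in a child process, so the crash does not take down the calling interpreter. The helper names probe_import and describe are hypothetical, and the module path is pieced together from the description above, so it may need adjusting.

```python
import subprocess
import sys


def probe_import(module_name):
    """Import module_name in a child interpreter and return its exit code."""
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


def describe(returncode):
    """Turn a child exit code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative code means the child was killed by that signal
        # (11 is SIGSEGV on Linux).
        return f"killed by signal {-returncode}"
    return f"failed with exit code {returncode}"


if __name__ == "__main__":
    # Module path assumed from the description in this issue.
    target = "waymo_open_dataset.camera.ops.py_camera_model_ops"
    print(target, "->", describe(probe_import(target)))
```

Running the same probe under each tensorflow version should show whether the import alone is enough to trigger the crash.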