protocolbuffers / protobuf

Protocol Buffers - Google's data interchange format
http://protobuf.dev
Other
65.28k stars 15.45k forks source link

ODR violation using the Bazel build #4311

Closed nelhage-stripe closed 6 years ago

nelhage-stripe commented 6 years ago

As best I can tell, if you use the included Bazel build and attempt to depend on descriptor.proto (by way of @com_google_protobuf//:descriptor_proto), the build ends up with two copies of descriptor.proto and multiple definitions of the C++ symbols contained therein. This triggers ODR violations from ASAN, and we've seen deadlocks inside static initializers that we believe to be related.

I've pushed a minimal reproducer to https://github.com/nelhage/rules_proto_odr

If you build that repository with a newish clang (I'm using a clang 6 nightly), and bazel run main, you'll see the ODR violation from ASAN:

[nelhage@nelhage5:~/code/proto_odr]$ CC=clang-6.0 bazel run main
INFO: Analysed target //:main (0 packages loaded).
INFO: Found 1 target...
Target //:main up-to-date:
  bazel-bin/main
INFO: Elapsed time: 0.274s, Critical Path: 0.00s
INFO: Build completed successfully, 1 total action

INFO: Running command line: bazel-bin/main
=================================================================
==16003==ERROR: AddressSanitizer: odr-violation (0x7f6baf98b340):
  [1] size=48 'google::protobuf::_FileDescriptorSet_default_instance_' bazel-out/k8-fastbuild/genfiles/external/com_google_protobuf/google/protobuf/descriptor.pb.cc:28:3
  [2] size=48 'google::protobuf::_FileDescriptorSet_default_instance_' external/com_google_protobuf/src/google/protobuf/descriptor.pb.cc:28:3
These globals were registered at these points:
  [1]:
    #0 0x431fa0 in __asan_register_globals.part.11 (/home/nelhage/.cache/bazel/_bazel_nelhage/59eff58f9c182bc06a3fc860075e892e/execroot/com_github_nelhage_proto_odr/bazel-out/k8-fastbuild/bin/main+0x431fa0)
    #1 0x7f6baf805bfd in asan.module_ctor (/home/nelhage/.cache/bazel/_bazel_nelhage/59eff58f9c182bc06a3fc860075e892e/execroot/com_github_nelhage_proto_odr/bazel-out/k8-fastbuild/bin/_solib_k8/libexternal_Scom_Ugoogle_Uprotobuf_Slibdescriptor_Uproto.so+0x2a7bfd)

  [2]:
    #0 0x431fa0 in __asan_register_globals.part.11 (/home/nelhage/.cache/bazel/_bazel_nelhage/59eff58f9c182bc06a3fc860075e892e/execroot/com_github_nelhage_proto_odr/bazel-out/k8-fastbuild/bin/main+0x431fa0)
    #1 0x7f6bae6c7d2d in asan.module_ctor (/home/nelhage/.cache/bazel/_bazel_nelhage/59eff58f9c182bc06a3fc860075e892e/execroot/com_github_nelhage_proto_odr/bazel-out/k8-fastbuild/bin/_solib_k8/libexternal_Scom_Ugoogle_Uprotobuf_Slibprotobuf.so+0xd02d2d)

==16003==HINT: if you don't care about these errors you may set ASAN_OPTIONS=detect_odr_violation=0
SUMMARY: AddressSanitizer: odr-violation: global 'google::protobuf::_FileDescriptorSet_default_instance_' at bazel-out/k8-fastbuild/genfiles/external/com_google_protobuf/google/protobuf/descriptor.pb.cc:28:3
==16003==ABORTING
ERROR: Non-zero return code '1' from command: Process exited with status 1
jmillikin-stripe commented 6 years ago

Based on the existence of internal_bootstrap_hack in cc_proto_library I believe this is a known issue, but the protobuf devs should document it better.

@com_google_protobuf//:protobuf has srcs=[..., "src/google/protobuf/descriptor.pb.cc", ...] which makes the headers and symbols available to C++ code. @com_google_protobuf//:cc_wkt_protos is a cc_proto_library that can be depended on, and uses internal_bootstrap_hack to avoid actually generating any C++ sources (because they would create duplicate symbols).

Is there any way to cause a warning to be emitted if a user tries to wrap one of the internal protos in their own cc_proto_library target? Perhaps by checking if any of the srcs labels is a member of WELL_KNOWN_PROTOS ?

nelhage commented 6 years ago

I may have minimized too far; https://github.com/nelhage/rules_proto_odr/tree/69ecec3d7182499acc2fe7a807f99abf4f58aa5e contains an example that's closer to what triggered this in our real application. We weren't creating a cc_proto_library directly out of @com_google_protobuf//:descriptor_proto, but out of our own .proto which imported descriptor.proto, which we accessed by adding @com_google_protobuf//:descriptor_proto to its deps. Is that unsupported? Is there another way to access descriptor.proto?

Agreed that an earlier error here would be nice.

jmillikin-stripe commented 6 years ago

Ooh, I think this is a Bazel bug. Details at https://github.com/bazelbuild/bazel/issues/4650


Building nelhage/rules_proto_odr@69ecec3d7182499acc2fe7a807f99abf4f58aa5e generates these commands:

SUBCOMMAND: # //proto:proto_proto [action 'Generating C++ proto_library //proto:proto_proto']
(cd /root/.cache/bazel/_bazel_root/f8087e59fd95af1ae29e8fcb7ff1a3dc/execroot/com_github_nelhage_proto_odr && \
  exec env - \
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
  bazel-out/host/bin/external/com_google_protobuf/protoc '--cpp_out=bazel-out/k8-fastbuild/genfiles' '-Iproto/with_extensions.proto=proto/with_extensions.proto' '-Igoogle/protobuf/descriptor.proto=bazel-out/k8-fastbuild/genfiles/external/com_google_protobuf/google/protobuf/descriptor.proto' --direct_dependencies google/protobuf/descriptor.proto:proto/with_extensions.proto '--direct_dependencies_violation_msg=%s is imported, but //proto:proto_proto doesn'\''t directly depend on a proto_library that '\''srcs'\'' it.' proto/with_extensions.proto)

SUBCOMMAND: # @com_google_protobuf//:descriptor_proto [action 'Generating C++ proto_library @com_google_protobuf//:descriptor_proto']
(cd /root/.cache/bazel/_bazel_root/f8087e59fd95af1ae29e8fcb7ff1a3dc/execroot/com_github_nelhage_proto_odr && \
  exec env - \
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
  bazel-out/host/bin/external/com_google_protobuf/protoc '--cpp_out=bazel-out/k8-fastbuild/genfiles/external/com_google_protobuf' '-Igoogle/protobuf/descriptor.proto=bazel-out/k8-fastbuild/genfiles/external/com_google_protobuf/google/protobuf/descriptor.proto' --direct_dependencies google/protobuf/descriptor.proto '--direct_dependencies_violation_msg=%s is imported, but @com_google_protobuf//:descriptor_proto doesn'\''t directly depend on a proto_library that '\''srcs'\'' it.' bazel-out/k8-fastbuild/genfiles/external/com_google_protobuf/google/protobuf/descriptor.proto)

SUBCOMMAND: # //proto:proto_proto [action 'Compiling proto/with_extensions.pb.cc']
(cd /root/.cache/bazel/_bazel_root/f8087e59fd95af1ae29e8fcb7ff1a3dc/execroot/com_github_nelhage_proto_odr && \
  exec env - \
    PWD=/proc/self/cwd \
  /usr/bin/clang-5.0 -U_FORTIFY_SOURCE -fstack-protector -Wall -B/usr/bin -B/usr/bin -fcolor-diagnostics -fno-omit-frame-pointer '-fsanitize=address' '-fsanitize=undefined' '-std=c++0x' '-fsanitize=address' '-fsanitize=undefined' -MD -MF bazel-out/k8-fastbuild/bin/proto/_objs/proto_proto/proto/with_extensions.pb.pic.d '-frandom-seed=bazel-out/k8-fastbuild/bin/proto/_objs/proto_proto/proto/with_extensions.pb.pic.o' -fPIC -iquote . -iquote bazel-out/k8-fastbuild/genfiles -iquote external/com_google_protobuf -iquote bazel-out/k8-fastbuild/genfiles/external/com_google_protobuf -iquote external/bazel_tools -iquote bazel-out/k8-fastbuild/genfiles/external/bazel_tools -isystem external/com_google_protobuf/src -isystem bazel-out/k8-fastbuild/genfiles/external/com_google_protobuf/src -isystem external/bazel_tools/tools/cpp/gcc3 -no-canonical-prefixes -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c bazel-out/k8-fastbuild/genfiles/proto/with_extensions.pb.cc -o bazel-out/k8-fastbuild/bin/proto/_objs/proto_proto/proto/with_extensions.pb.pic.o)

There's no protobuf.bzl skylark involved here, it's pure Bazel native protobuf rules. The native aspect transits down the deps to find all proto_library targets, generates C++ code from them, and presumably links it into the final binary. It ought to be skipping code generation for .proto files that are required by libprotobuf.a, but isn't.

xfxyjwf commented 6 years ago

I see that you are linking the binary dynamically. Usually ODR violation at runtime will turn up as linker errors if it's linked statically, however, in this case it compiles fine after I change it to linkstatic = 1. It seems to me bazel is building the dynamic library for descriptor_proto incorrectly.

The use case here is supported and we have an example doing just that in (it's using timestamp_proto instead of descriptor_proto but they are built the same way): https://github.com/google/protobuf/blob/master/examples/BUILD

Both depending on descriptor_proto in another proto_library and depending on descriptor_proto in your own cc_proto_library should work. Although the later isn't really useful.

And yes, this is more of a bazel issue than a protobuf one.

nelhage commented 6 years ago

Yeah, I built dynamically specifically to illustrate the problem, since linking statically succeeds for reasons that I don't fully understand. I'm happy to close this one out and follow up on the bazel issue, if that's your recommendation.

xfxyjwf commented 6 years ago

I think we can leave it open until the bazel issue is confirmed/fixed. Right now it's unclear if there is anything needed to be done on the protobuf side.

jmillikin commented 6 years ago

Hit this for a personal project today, so figured I'd get to the bottom of it. Turns out the C++ proto toolchain defined by this package (not Bazel, surprisingly) is missing a "blacklist protos" setting.

PR https://github.com/google/protobuf/pull/4334 fixes the error for me.