protocolbuffers / protobuf

Protocol Buffers - Google's data interchange format
http://protobuf.dev

Deserializing JSON message with a map of Structs, find() fails on the Struct's fields() #15069

Open gazar opened 11 months ago

gazar commented 11 months ago

What version of protobuf and what language are you using?
Version: v26.0-rc2
Language: C++

What operating system (Linux, Windows, ...) and version?
Linux Ubuntu 20

What runtime / compiler are you using (e.g., python version or gcc version)?
GCC 9

What did you do?
With a protobuf message defined like the following:

syntax = "proto3";
import "google/protobuf/struct.proto";

package test;

message Content
{
    map<string, google.protobuf.Struct> components = 1;
}

And a JSON string that looks like the following:

{
    "components": {
      "comp1": {
        "size": 1234,
        "tag": "hello"
      },
      "comp2": {
        "size": 5678,
        "tag": "world"
      }
    }
}
  1. Call JsonStringToMessage.
  2. Find the comp2 component (doesn't matter if via find() or iteration).
  3. Call find() on the google.protobuf.Struct's fields() map to get the tag field's value.

What did you expect to see?
An iterator to the tag field.

What did you see instead?
std::end(). UPDATE 2024-02-21 (v26.0-rc2): the failure occurs inconsistently.

Note that if step 3 is done by iterating over the fields in a for loop, the tag field is visited correctly.

Additional Information

Code Example
Attached is a short C++ sample program that illustrates the problem:

#include <iostream>
#include <string>
#include <google/protobuf/util/json_util.h>
#include "test.pb.h"

using namespace google::protobuf::util;

static const std::string content_json = R"(
{
    "components": {
      "comp1": {
        "size": 16216459,
        "tag": "5f8e2a5b91b516f8eda6ef0ca0fd07bf96d3029bdf2ba9281e220a18cd542fc1"
      },
      "comp2": {
        "size": 22816265,
        "tag": "917b2d0d83ef8aa3ebcd169b53a0b6e9c72fee779785fae4f63cdec5694d5e47"
      }
    }
}
)";

int main()
{
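    // Convert JSON to protobuf and stop early if parsing fails.
    JsonParseOptions options;
    test::Content content;
    absl::Status status = JsonStringToMessage(content_json, &content, options);
    if (!status.ok())
    {
        std::cerr << "JSON parse failed: " << status << std::endl;
        return -1;
    }

    // Get the fields of the 'comp2'; guard against dereferencing end().
    const auto component_iter = content.components().find("comp2");
    if (component_iter == content.components().end())
    {
        std::cout << "No comp2 via find()" << std::endl;
        return -1;
    }
    const auto& fields = component_iter->second.fields();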

    std::cout << "Debug string: " << std::endl;
    std::cout << content.DebugString() << std::endl;
    std::cout << "Field count: " << fields.size() << std::endl;
    for (const auto& field : fields)
    {
        std::cout << "Key name: '" << field.first << "'" << std::endl;
    }

    // Find the 'tag' field of the component using the 'find' method.
    const auto tag_iter = fields.find("tag");
    if (tag_iter == std::end(fields))
    {
        std::cout << "No tag via find()" << std::endl;
        return -1;
    }

    std::cout << "Tag debug string: " << tag_iter->second.DebugString() << std::endl;

    return 0;
}

Program output:

Debug string: 
components {
  key: "comp1"
  value {
    fields {
      key: "size"
      value {
        number_value: 16216459
      }
    }
    fields {
      key: "tag"
      value {
        string_value: "5f8e2a5b91b516f8eda6ef0ca0fd07bf96d3029bdf2ba9281e220a18cd542fc1"
      }
    }
  }
}
components {
  key: "comp2"
  value {
    fields {
      key: "size"
      value {
        number_value: 22816265
      }
    }
    fields {
      key: "tag"
      value {
        string_value: "917b2d0d83ef8aa3ebcd169b53a0b6e9c72fee779785fae4f63cdec5694d5e47"
      }
    }
  }
}

Field count: 2
Key name: 'size'
Key name: 'tag'
No tag via find()

googleberg commented 9 months ago

@esrauchg please take a look

esrauchg commented 9 months ago

@gazar Do you mind narrowing down the bug report here? Is 'fields' zero length? Is there an entry for 'tag' but it doesn't have a string value or has an empty string?

I ran your repro code internally and the listed behavior doesn't reproduce: it does find the tag and does not hit the "No tag via find" conditional. You may also find something interesting if you remove 'ignore_unknown_fields' and see whether that gives you a parse failure, e.g. from stale gencode (though it's hard to see why that would be the case here).

Perhaps you can update your case to print content.DebugString() and see what it says?

gazar commented 9 months ago

@esrauchg thank you for looking into this.

I've updated the example to shorten it and added more prints. I've managed to reproduce the issue again, on v26.0-rc2. However, this time the issue reproduces inconsistently. The following two prints sometimes change order, but the issue may reproduce regardless of their order:

Key name: 'size'
Key name: 'tag'

github-actions[bot] commented 6 months ago

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago.

gazar commented 6 months ago

Still waiting for someone to have a closer look at the issue.

esrauchg commented 6 months ago

I tried your repro snippet and don't see the issue reproduce at the tip of our internal main branch.

Can you clarify further the environment in which you get the issue? If you're only able to reproduce this under one specific Ubuntu configuration, then I'm suspicious that this is most likely an issue of the system having a vendored protobuf package installed that is being linked instead of your intended protobuf version (the one used for the .pb.h generation). C++ Protobuf requires an exact version match between protoc and the linked protobuf runtime; even 'minor' version releases are not guaranteed to be skew-safe between generated code and runtime.

gazar commented 6 months ago

@esrauchg thank you for taking the time to look into this further.

  1. The issue reproduces inconsistently - I'm still getting results similar to my comment from Feb 21st.
  2. The issue reproduces also in version v27.0-rc3.
  3. The executable uses the correct protobuf shared libraries:
    # ldd -d bin/test
        linux-vdso.so.1 (0x00007fff9d1b9000)
        libprotobuf.so.27.0.3 => /home/gazar/.conan/data/protobuf/27.0-rc3+2/pan/dev/package/4dd49b453a8d01f6f5acb40cd2c983d832a6da45/lib/libprotobuf.so.27.0.3 (0x00007fe22d2d4000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fe22d2a7000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fe22d0c5000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fe22cf76000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fe22cf5b000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe22cd67000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fe22d70b000)
  4. I'm using the correct protoc executable to regenerate the C++ files from the .proto file: /home/gazar/.conan/data/protobuf/27.0-rc3+2/pan/dev/package/4dd49b453a8d01f6f5acb40cd2c983d832a6da45/bin/protoc --cpp_out=. test.proto
  5. Note the use of the older GCC 9 compiler on Ubuntu 20.04. I'm using WSL but the issue reproduced on non-WSL test VMs.
  6. I've built both protobuf and the test executable with the _GLIBCXX_USE_CXX11_ABI=0 flag.

esrauchg commented 6 months ago

Can you provide the specific gcc version number that you're seeing the issue on? Thanks!

gazar commented 6 months ago

$ g++-9 --version
g++-9 (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0

github-actions[bot] commented 3 months ago

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago. This issue will be closed and archived after 14 additional days without activity.

gazar commented 3 weeks ago

Reproduced with v5.28.3. I've looked at the FindHelper function in map.h. Inconsistently, when running the test program numerous times:

The bucket number is always either 0 or 1, but its value does not seem to indicate success or failure.

esrauchg commented 3 weeks ago

Sorry, I was never able to reproduce this internally. Are you able to reproduce while running under sanitizers to see if that points at something?

gazar commented 3 weeks ago

I've built with address sanitizer, and even running protoc without arguments segfaults:

Reading symbols from /home/gazar/.conan/data/protobuf/5.28.3+2/pan/dev/package/923ebda687835f322df0521eb072bca6e00f8552/bin/protoc...
(gdb) run
Starting program: /home/gazar/.conan/data/protobuf/5.28.3+2/pan/dev/package/923ebda687835f322df0521eb072bca6e00f8552/bin/protoc 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff60c5f15 in absl::lts_20240722::container_internal::HashSetIteratorGenerationInfoEnabled::HashSetIteratorGenerationInfoEnabled (generation_ptr=0x0, this=0x7fffffffd3d0)
    at /home/gazar/.conan/data/abseil/20240722.0+0/pan/dev/package/b245d1481049ccf68a48be5517d59925a0aab8c9/include/absl/container/internal/raw_hash_set.h:2493
2493        iterator(ctrl_t* ctrl, slot_type* slot,

(gdb) bt
#0  0x00007ffff60c5f15 in absl::lts_20240722::container_internal::HashSetIteratorGenerationInfoEnabled::HashSetIteratorGenerationInfoEnabled (generation_ptr=0x0, this=0x7fffffffd3d0)
    at /home/gazar/.conan/data/abseil/20240722.0+0/pan/dev/package/b245d1481049ccf68a48be5517d59925a0aab8c9/include/absl/container/internal/raw_hash_set.h:2493
#1  absl::lts_20240722::container_internal::raw_hash_set<absl::lts_20240722::container_internal::FlatHashMapPolicy<std::string, google::protobuf::Descriptor::WellKnownType>, absl::lts_20240722::container_internal::StringHash, absl::lts_20240722::container_internal::StringEq, std::allocator<std::pair<std::string const, google::protobuf::Descriptor::WellKnownType> > >::iterator::iterator (generation_ptr=0x0, slot=0x616000000168, ctrl=<optimized out>, this=0x7fffffffd3d0)
    at /home/gazar/.conan/data/abseil/20240722.0+0/pan/dev/package/b245d1481049ccf68a48be5517d59925a0aab8c9/include/absl/container/internal/raw_hash_set.h:2497
#2  absl::lts_20240722::container_internal::raw_hash_set<absl::lts_20240722::container_internal::FlatHashMapPolicy<std::string, google::protobuf::Descriptor::WellKnownType>, absl::lts_20240722::container_internal::StringHash, absl::lts_20240722::container_internal::StringEq, std::allocator<std::pair<std::string const, google::protobuf::Descriptor::WellKnownType> > >::soo_iterator (this=0x616000000140) at /home/gazar/.conan/data/abseil/20240722.0+0/pan/dev/package/b245d1481049ccf68a48be5517d59925a0aab8c9/include/absl/container/internal/raw_hash_set.h:3971
#3  absl::lts_20240722::container_internal::raw_hash_set<absl::lts_20240722::container_internal::FlatHashMapPolicy<std::string, google::protobuf::Descriptor::WellKnownType>, absl::lts_20240722::container_internal::StringHash, absl::lts_20240722::container_internal::StringEq, std::allocator<std::pair<std::string const, google::protobuf::Descriptor::WellKnownType> > >::find_or_prepare_insert_soo<std::string> (key="google.protobuf.DoubleValue", this=0x616000000140)
    at /home/gazar/.conan/data/abseil/20240722.0+0/pan/dev/package/b245d1481049ccf68a48be5517d59925a0aab8c9/include/absl/container/internal/raw_hash_set.h:3791
#4  absl::lts_20240722::container_internal::raw_hash_set<absl::lts_20240722::container_internal::FlatHashMapPolicy<std::string, google::protobuf::Descriptor::WellKnownType>, absl::lts_20240722::container_internal::StringHash, absl::lts_20240722::container_internal::StringEq, std::allocator<std::pair<std::string const, google::protobuf::Descriptor::WellKnownType> > >::find_or_prepare_insert<std::string> (key="google.protobuf.DoubleValue", this=0x616000000140)
    at /home/gazar/.conan/data/abseil/20240722.0+0/pan/dev/package/b245d1481049ccf68a48be5517d59925a0aab8c9/include/absl/container/internal/raw_hash_set.h:3886
#5  absl::lts_20240722::container_internal::raw_hash_set<absl::lts_20240722::container_internal::FlatHashMapPolicy<std::string, google::protobuf::Descriptor::WellKnownType>, absl::lts_20240722::container_internal::StringHash, absl::lts_20240722::container_internal::StringEq, std::allocator<std::pair<std::string const, google::protobuf::Descriptor::WellKnownType> > >::EmplaceDecomposable::operator()<std::string, std::piecewise_construct_t const&, std::tuple<std::string const&>, std::tuple<google::protobuf::Descriptor::WellKnownType const&> > (this=0x7fffffffd010, 
    key="google.protobuf.DoubleValue") at /home/gazar/.conan/data/abseil/20240722.0+0/pan/dev/package/b245d1481049ccf68a48be5517d59925a0aab8c9/include/absl/container/internal/raw_hash_set.h:3453
#6  absl::lts_20240722::container_internal::memory_internal::DecomposePairImpl<absl::lts_20240722::container_internal::raw_hash_set<absl::lts_20240722::container_internal::FlatHashMapPolicy<std::string, google::protobuf::Descriptor::WellKnownType>, absl::lts_20240722::container_internal::StringHash, absl::lts_20240722::container_internal::StringEq, std::allocator<std::pair<std::string const, google::protobuf::Descriptor::WellKnownType> > >::EmplaceDecomposable, std::string const&, std::tuple<google::protobuf::Descriptor::WellKnownType const&> > (p={...}, f=...)
    at /home/gazar/.conan/data/abseil/20240722.0+0/pan/dev/package/b245d1481049ccf68a48be5517d59925a0aab8c9/include/absl/container/internal/container_memory.h:152
#7  absl::lts_20240722::container_internal::DecomposePair<absl::lts_20240722::container_internal::raw_hash_set<absl::lts_20240722::container_internal::FlatHashMapPolicy<std::string, google::protobuf::Descriptor::WellKnownType>, absl::lts_20240722::container_internal::StringHash, absl::lts_20240722::container_internal::StringEq, std::allocator<std::pair<std::string const, google::protobuf::Descriptor::WellKnownType> > >::EmplaceDecomposable, std::pair<std::string, google::protobuf::Descriptor::WellKnownType> const&> (f=...)
    at /home/gazar/.conan/data/abseil/20240722.0+0/pan/dev/package/b245d1481049ccf68a48be5517d59925a0aab8c9/include/absl/container/internal/container_memory.h:219
#8  absl::lts_20240722::container_internal::FlatHashMapPolicy<std::string, google::protobuf::Descriptor::WellKnownType>::apply<absl::lts_20240722::container_internal::raw_hash_set<absl::lts_20240722::container_internal::FlatHashMapPolicy<std::string, google::protobuf::Descriptor::WellKnownType>, absl::lts_20240722::container_internal::StringHash, absl::lts_20240722::container_internal::StringEq, std::allocator<std::pair<std::string const, google::protobuf::Descriptor::WellKnownType> > >::EmplaceDecomposable, std::pair<std::string, google::protobuf::Descriptor::WellKnownType> const&>
    (f=...) at /home/gazar/.conan/data/abseil/20240722.0+0/pan/dev/package/b245d1481049ccf68a48be5517d59925a0aab8c9/include/absl/container/flat_hash_map.h:640
#9  absl::lts_20240722::container_internal::hash_policy_traits<absl::lts_20240722::container_internal::FlatHashMapPolicy<std::string, google::protobuf::Descriptor::WellKnownType>, void>::apply<absl::lts_20240722::container_internal::raw_hash_set<absl::lts_20240722::container_internal::FlatHashMapPolicy<std::string, google::protobuf::Descriptor::WellKnownType>, absl::lts_20240722::container_internal::StringHash, absl::lts_20240722::container_internal::StringEq, std::allocator<std::pair<std::string const, google::protobuf::Descriptor::WellKnownType> > >::EmplaceDecomposable, std::pair<std::string, google::protobuf::Descriptor::WellKnownType> const&, absl::lts_20240722::container_internal::FlatHashMapPolicy<std::string, google::protobuf::Descriptor::WellKnownType> > (f=...)
    at /home/gazar/.conan/data/abseil/20240722.0+0/pan/dev/package/b245d1481049ccf68a48be5517d59925a0aab8c9/include/absl/container/internal/hash_policy_traits.h:134
#10 absl::lts_20240722::container_internal::raw_hash_set<absl::lts_20240722::container_internal::FlatHashMapPolicy<std::string, google::protobuf::Descriptor::WellKnownType>, absl::lts_20240722::container_internal::StringHash, absl::lts_20240722::container_internal::StringEq, std::allocator<std::pair<std::string const, google::protobuf::Descriptor::WellKnownType> > >::emplace<std::pair<std::string, google::protobuf::Descriptor::WellKnownType> const&, 0> (this=0x616000000140)
    at /home/gazar/.conan/data/abseil/20240722.0+0/pan/dev/package/b245d1481049ccf68a48be5517d59925a0aab8c9/include/absl/container/internal/raw_hash_set.h:3012
#11 absl::lts_20240722::container_internal::raw_hash_set<absl::lts_20240722::container_internal::FlatHashMapPolicy<std::string, google::protobuf::Descriptor::WellKnownType>, absl::lts_20240722::container_internal::StringHash, absl::lts_20240722::container_internal::StringEq, std::allocator<std::pair<std::string const, google::protobuf::Descriptor::WellKnownType> > >::insert<std::pair<std::string, google::protobuf::Descriptor::WellKnownType> const*> (last=0x7fffffffd960, first=0x7fffffffd860, this=0x616000000140)
    at /home/gazar/.conan/data/abseil/20240722.0+0/pan/dev/package/b245d1481049ccf68a48be5517d59925a0aab8c9/include/absl/container/internal/raw_hash_set.h:2966
#12 absl::lts_20240722::container_internal::raw_hash_set<absl::lts_20240722::container_internal::FlatHashMapPolicy<std::string, google::protobuf::Descriptor::WellKnownType>, absl::lts_20240722::container_internal::StringHash, absl::lts_20240722::container_internal::StringEq, std::allocator<std::pair<std::string const, google::protobuf::Descriptor::WellKnownType> > >::insert (ilist=..., this=0x616000000140) at /home/gazar/.conan/data/abseil/20240722.0+0/pan/dev/package/b245d1481049ccf68a48be5517d59925a0aab8c9/include/absl/container/internal/raw_hash_set.h:2975
#13 google::protobuf::DescriptorPool::Tables::Tables (this=0x616000000080) at /home/gazar/.conan/data/protobuf/5.28.3+2/pan/dev/build/923ebda687835f322df0521eb072bca6e00f8552/src/google/protobuf/descriptor.cc:1594
#14 0x00007ffff60c9288 in google::protobuf::DescriptorPool::DescriptorPool (this=0x60c000000040, fallback_database=<optimized out>, error_collector=<optimized out>) at /usr/include/c++/9/tuple:918
#15 0x00007ffff60c987a in google::protobuf::(anonymous namespace)::NewGeneratedPool () at /home/gazar/.conan/data/protobuf/5.28.3+2/pan/dev/build/923ebda687835f322df0521eb072bca6e00f8552/src/google/protobuf/descriptor.cc:2181
#16 google::protobuf::DescriptorPool::internal_generated_pool () at /home/gazar/.conan/data/protobuf/5.28.3+2/pan/dev/build/923ebda687835f322df0521eb072bca6e00f8552/src/google/protobuf/descriptor.cc:2194
#17 0x00007ffff60c9a58 in google::protobuf::DescriptorPool::InternalAddGeneratedFile (encoded_file_descriptor=0x7ffff69d89c0 <descriptor_table_protodef_google_2fprotobuf_2fany_2eproto>, size=212)
    at /home/gazar/.conan/data/protobuf/5.28.3+2/pan/dev/build/923ebda687835f322df0521eb072bca6e00f8552/src/google/protobuf/descriptor.cc:2235
#18 0x00007ffff638c3b8 in google::protobuf::(anonymous namespace)::AddDescriptorsImpl (table=0x7ffff6a560c0 <descriptor_table_google_2fprotobuf_2fany_2eproto>)
    at /home/gazar/.conan/data/protobuf/5.28.3+2/pan/dev/build/923ebda687835f322df0521eb072bca6e00f8552/src/google/protobuf/generated_message_reflection.cc:3713
#19 google::protobuf::internal::AddDescriptors (table=table@entry=0x7ffff6a560c0 <descriptor_table_google_2fprotobuf_2fany_2eproto>) at /home/gazar/.conan/data/protobuf/5.28.3+2/pan/dev/build/923ebda687835f322df0521eb072bca6e00f8552/src/google/protobuf/generated_message_reflection.cc:3728
#20 0x00007ffff5ef1990 in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=102) at /home/gazar/.conan/data/protobuf/5.28.3+2/pan/dev/build/923ebda687835f322df0521eb072bca6e00f8552/src/google/protobuf/port_undef.inc:175
#21 __static_initialization_and_destruction_0 (__priority=102, __initialize_p=1) at /home/gazar/.conan/data/protobuf/5.28.3+2/pan/dev/build/923ebda687835f322df0521eb072bca6e00f8552/src/google/protobuf/port_undef.inc:175
#22 _GLOBAL__sub_I.00102_any.pb.cc(void) () at /home/gazar/.conan/data/protobuf/5.28.3+2/pan/dev/build/923ebda687835f322df0521eb072bca6e00f8552/src/google/protobuf/port_undef.inc:175
#23 0x00007ffff7fe0b9a in call_init (l=<optimized out>, argc=argc@entry=1, argv=argv@entry=0x7fffffffdd18, env=env@entry=0x7fffffffdd28) at dl-init.c:72
#24 0x00007ffff7fe0ca1 in call_init (env=0x7fffffffdd28, argv=0x7fffffffdd18, argc=1, l=<optimized out>) at dl-init.c:30
#25 _dl_init (main_map=0x7ffff7ffe190, argc=1, argv=0x7fffffffdd18, env=0x7fffffffdd28) at dl-init.c:119
#26 0x00007ffff7fd013a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#27 0x0000000000000001 in ?? ()
#28 0x00007fffffffdfd9 in ?? ()
#29 0x0000000000000000 in ?? ()

This is what I've added to protobuf's CMake build:

'CMAKE_CXX_FLAGS' = ' -fsanitize=address -static-libasan -g -fno-omit-frame-pointer'
'CMAKE_C_FLAGS' = ' -fsanitize=address -static-libasan -g -fno-omit-frame-pointer'
'CMAKE_SHARED_LINKER_FLAGS' = ' -fsanitize=address -static-libasan -g -fno-omit-frame-pointer'
'CMAKE_EXE_LINKER_FLAGS' = ' -fsanitize=address -static-libasan -g -fno-omit-frame-pointer'