protocolbuffers / protobuf

Protocol Buffers - Google's data interchange format
http://protobuf.dev

Deserializing JSON message with a map of Structs, find() fails on the Struct's fields() #15069

Open gazar opened 9 months ago

gazar commented 9 months ago

What version of protobuf and what language are you using?
Version: v26.0-rc2
Language: C++

What operating system (Linux, Windows, ...) and version?
Linux Ubuntu 20

What runtime / compiler are you using (e.g., python version or gcc version)?
GCC 9

What did you do?
With a protobuf message defined like the following:

syntax = "proto3";
import "google/protobuf/struct.proto";

package test;

message Content
{
    map<string, google.protobuf.Struct> components = 1;
}

And a JSON string that looks like the following:

{
    "components": {
      "comp1": {
        "size": 1234,
        "tag": "hello"
      },
      "comp2": {
        "size": 5678,
        "tag": "world"
      }
    }
}

  1. Call JsonStringToMessage.
  2. Find the comp2 component (it doesn't matter whether via find() or iteration).
  3. Call find() on the google.protobuf.Struct's fields() map to get the tag field's value.

What did you expect to see?
An iterator to the tag field.

What did you see instead?
find() returns std::end(). UPDATE 2024-02-21 (v26.0-rc2): the issue reproduces inconsistently!

Note that if step 3 is done by iterating over the fields in a for loop instead, the tag field is found correctly.

Additional Information

Code Example

Below is a short C++ test program that illustrates the problem:

#include <iostream>
#include <string>
#include <google/protobuf/util/json_util.h>
#include "test.pb.h"

using namespace google::protobuf::util;

static const std::string content_json = R"(
{
    "components": {
      "comp1": {
        "size": 16216459,
        "tag": "5f8e2a5b91b516f8eda6ef0ca0fd07bf96d3029bdf2ba9281e220a18cd542fc1"
      },
      "comp2": {
        "size": 22816265,
        "tag": "917b2d0d83ef8aa3ebcd169b53a0b6e9c72fee779785fae4f63cdec5694d5e47"
      }
    }
}
)";

int main()
{
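    // Convert JSON to protobuf and stop early if parsing fails.
    JsonParseOptions options;
    test::Content content;
    absl::Status status = JsonStringToMessage(content_json, &content, options);
    if (!status.ok())
    {
        std::cerr << "JSON parse failed: " << status.ToString() << std::endl;
        return -1;
    }

    // Get the fields of the 'comp2' component.
    const auto component_iter = content.components().find("comp2");
    if (component_iter == content.components().end())
    {
        std::cerr << "No comp2 component" << std::endl;
        return -1;
    }
    const auto& fields = component_iter->second.fields();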

    std::cout << "Debug string: " << std::endl;
    std::cout << content.DebugString() << std::endl;
    std::cout << "Field count: " << fields.size() << std::endl;
    for (const auto& field : fields)
    {
        std::cout << "Key name: '" << field.first << "'" << std::endl;
    }

    // Find the 'tag' field of the component using the 'find' method.
    const auto tag_iter = fields.find("tag");
    if (tag_iter == std::end(fields))
    {
        std::cout << "No tag via find()" << std::endl;
        return -1;
    }

    std::cout << "Tag debug string: " << tag_iter->second.DebugString() << std::endl;

    return 0;
}

Program output:

Debug string: 
components {
  key: "comp1"
  value {
    fields {
      key: "size"
      value {
        number_value: 16216459
      }
    }
    fields {
      key: "tag"
      value {
        string_value: "5f8e2a5b91b516f8eda6ef0ca0fd07bf96d3029bdf2ba9281e220a18cd542fc1"
      }
    }
  }
}
components {
  key: "comp2"
  value {
    fields {
      key: "size"
      value {
        number_value: 22816265
      }
    }
    fields {
      key: "tag"
      value {
        string_value: "917b2d0d83ef8aa3ebcd169b53a0b6e9c72fee779785fae4f63cdec5694d5e47"
      }
    }
  }
}

Field count: 2
Key name: 'size'
Key name: 'tag'
No tag via find()

googleberg commented 7 months ago

@esrauchg please take a look

esrauchg commented 7 months ago

@gazar Do you mind narrowing down the bug report here? Is 'fields' zero length? Is there an entry for 'tag' but it doesn't have a string value or has an empty string?

I ran your repro code internally and the listed behavior doesn't reproduce: it does find the tag and does not hit the "No tag via find()" conditional. You may also find something interesting if you remove 'ignore_unknown_fields' and see whether that gives you a parse failure, e.g. from stale gencode (though it's hard to see why that would be the case here).

Perhaps you can update your case to print content.DebugString() and see what it says?

gazar commented 6 months ago

@esrauchg thank you for looking into this.

I've updated the example to shorten it and added more prints. I've managed to reproduce the issue again, on v26.0-rc2. However, this time the issue reproduces inconsistently. The following two prints sometimes change order, but the issue may reproduce regardless of their order:

Key name: 'size'
Key name: 'tag'

github-actions[bot] commented 3 months ago

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago.

gazar commented 3 months ago

Still waiting for someone to have a closer look at the issue.

esrauchg commented 3 months ago

I tried your repro snippet and don't see the issue reproduce at the tip of our internal main branch.

Can you clarify further the environment in which you get the issue? If you're only able to reproduce this under one specific Ubuntu configuration, then I suspect the system has some vendored protobuf package installed that is being linked instead of your intended protobuf version (where the latter is the one used for the .pb.h generation). C++ Protobuf requires an exact version match between protoc and the linked protobuf runtime (even 'minor' version releases are not guaranteed to be skew-safe between generated code and runtime).

gazar commented 3 months ago

@esrauchg thank you for taking the time to look into this further.

  1. The issue reproduces inconsistently - I'm still getting results similar to my comment from Feb 21st.
  2. The issue reproduces also in version v27.0-rc3.
  3. The executable uses the correct protobuf shared libraries:
    # ldd -d bin/test
        linux-vdso.so.1 (0x00007fff9d1b9000)
        libprotobuf.so.27.0.3 => /home/gazar/.conan/data/protobuf/27.0-rc3+2/pan/dev/package/4dd49b453a8d01f6f5acb40cd2c983d832a6da45/lib/libprotobuf.so.27.0.3 (0x00007fe22d2d4000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fe22d2a7000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fe22d0c5000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fe22cf76000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fe22cf5b000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe22cd67000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fe22d70b000)
  4. I'm using the correct protoc executable to regenerate the C++ files from the .proto file:
    /home/gazar/.conan/data/protobuf/27.0-rc3+2/pan/dev/package/4dd49b453a8d01f6f5acb40cd2c983d832a6da45/bin/protoc --cpp_out=. test.proto
  5. Note the use of the older GCC 9 compiler on Ubuntu 20.04. I'm using WSL, but the issue also reproduced on non-WSL test VMs.
  6. I've built both protobuf and the test executable with the _GLIBCXX_USE_CXX11_ABI=0 flag.

esrauchg commented 3 months ago

Can you provide the specific gcc version number that you're seeing the issue on? Thanks!

gazar commented 3 months ago

$ g++-9 --version
g++-9 (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0

github-actions[bot] commented 3 weeks ago

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago. This issue will be closed and archived after 14 additional days without activity.