protocolbuffers / protobuf

Protocol Buffers - Google's data interchange format
http://protobuf.dev
Other
65.69k stars 15.5k forks source link

Inconsistent map entry detection from message descriptors in Python #17351

Open alkasm opened 4 months ago

alkasm commented 4 months ago

What version of protobuf and what language are you using? Version: main/v3.6.0/v3.5.0 etc. (NOTE: please try updating to the latest version of protoc/runtime possible beforehand to attempt to resolve your problem) Language: C++/Java/Python/C#/Ruby/PHP/Objective-C/Javascript

5.27.2, Python (from PyPI)

What operating system (Linux, Windows, ...) and version?

macOS 14.4.1

What runtime / compiler are you using (e.g., python version or gcc version)

upb and python protobuf impls

What did you do? / What did you expect to see / What did you see instead?

Given book.proto:

syntax = "proto3";

message Book {
  map<string, string> reviews = 1;
}

The field descriptor for reviews shows a repeated field of an intermediary generated message type ReviewsEntry.

>>> from google.protobuf.descriptor import Descriptor, FieldDescriptor
>>> from book_pb2 import Book
>>> fd = Book.DESCRIPTOR.fields_by_name["reviews"]
>>> fd.message_type.name
'ReviewsEntry'
>>> fd.message_type._concrete_class == Book.ReviewsEntry
True
>>> fd.cpp_type == FieldDescriptor.CPPTYPE_MESSAGE
True
>>> fd.type == FieldDescriptor.TYPE_MESSAGE
True
>>> fd.label == FieldDescriptor.LABEL_REPEATED
True

As this looks like a standard repeated field, it's not obvious how to detect this is a map entry rather than a "true" repeated field from any public attributes on the field descriptor or message descriptor.

The google.protobuf.descriptor.Descriptor docstring implies that there should be an is_map_entry attribute on the descriptor: https://github.com/protocolbuffers/protobuf/blob/cbb3abfc4bf342e3f7cff1b657b46db0e1ef1537/python/google/protobuf/descriptor.py#L340

but there is no such attribute in the upb impl:

>>> fd.message_type.is_map_entry
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'google._upb._message.Descriptor' object has no attribute 'is_map_entry'

Even in the Python impl, the attribute is "internal" / prefixed with an underscore, as _is_map_entry, rather than the documented is_map_entry:

>>> import os
>>> os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"
>>> from google.protobuf.descriptor import Descriptor, FieldDescriptor
>>> from book_pb2 import Book
>>> fd = Book.DESCRIPTOR.fields_by_name["reviews"]
>>> fd.message_type.is_map_entry
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Descriptor' object has no attribute 'is_map_entry'. Did you mean: '_is_map_entry'?
>>> fd.message_type._is_map_entry
True

But that "internal" / prefixed field does not exist in the upb impl either way.

I found this implementation in the repo which implies that the message will contain options to let you know that it is a map entry: https://github.com/protocolbuffers/protobuf/blob/cbb3abfc4bf342e3f7cff1b657b46db0e1ef1537/hpb_generator/gen_messages.cc#L386

And indeed this does work in both the Python and upb implementations as a reliable way to detect that it's a map entry:

>>> Book.DESCRIPTOR.fields_by_name["reviews"].message_type.GetOptions()
map_entry: true

But this should probably be exposed in a more direct/public API.

Anything else we should know about your project / environment

I came across this trying to create my own implementation of AIP-161 field mask semantics, since support for wildcards and map fields has not yet been implemented for Python (see https://github.com/protocolbuffers/protobuf/issues/8547). With AIP-161, wildcards should be allowed for repeated fields and map keys, but indexing map fields is allowed while indexing repeated fields is not. Hence, I need to reliably detect the difference between repeated fields and map fields to implement.

anandolee commented 4 months ago

Unfortunately, we do not provide public directly map entry check APIs on descriptors. The only correct way is from options. Even our internal implementation use the options, for example: https://github.com/protocolbuffers/protobuf/blob/77047d9cda0bd5478489d4b5a85f1907cfe7f4fa/python/google/protobuf/json_format.py#L165

alkasm commented 3 months ago

Unfortunately, we do not provide public directly map entry check APIs on descriptors. The only correct way is from options. Even our internal implementation use the options, for example:

https://github.com/protocolbuffers/protobuf/blob/77047d9cda0bd5478489d4b5a85f1907cfe7f4fa/python/google/protobuf/json_format.py#L165

~Got it, then probably the only thing to address from this issue is the inaccuracy in the Descriptor docstring which implies the non-existent is_map_entry attribute.~

Maybe it's documented as a parameter of the ctor & initializer, not documentation for the instance attributes? It's a little confusing since the docstring section is "Attributes"

github-actions[bot] commented 3 days ago

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago. This issue will be closed and archived after 14 additional days without activity.