protocolbuffers / protobuf

Protocol Buffers - Google's data interchange format
http://protobuf.dev
Other
65.17k stars 15.44k forks source link

Message FieldNameEntry not allowed as map value type for field_name #9520

Open ergl opened 2 years ago

ergl commented 2 years ago

What version of protobuf and what language are you using? Version: stable 3.19.4 Language: N/A

What operating system (Linux, Windows, ...) and version?

macOS 11.6.4

What runtime / compiler are you using (e.g., python version or gcc version)

N/A, happens before language-specifc code generation

What did you do?

Steps to reproduce the behavior:

  1. Attempt to use a message with a name of FieldNameEntry as a map value in field field_name

  2. See protoc return the error: map_entry should not be set explicitly. Use map<KeyType, ValueType> instead.

Minimal declaration:

syntax = "proto3"; // also happens on proto2

message AEntry {}

message Container {
    map<uint32, AEntry> a = 1;
}

What did you expect to see

I expected the declaration above to compile cleanly and with no errors.

What did you see instead?

The compiler emits map_entry should not be set explicitly. Use map<KeyType, ValueType> instead. before reaching any language-specific generator.

It took me a bit to realize this has to do with backwards compatibility between maps and repeated values, but the error message did not make sense to me at first, and I did not expect protoc to forbid this usage.

If this is an expected error, perhaps it should be documented (apologies if it is already, and I didn't see it)

elharo commented 2 years ago

Interesting. Do you have a complete reproducible test case, ideally as a failing PR here? I don't immediately see how the code corresponds to the report.

ergl commented 2 years ago

@elharo I don't have a failing PR at hand, sorry, but take the following file (let's say, test.proto)

syntax = "proto3";

message FooEntry {}

message Container {
    map<uint32, FooEntry> foo = 1;
}

If we try to generate code with protoc -I=. --python_out . test.proto, it will error:

# protoc -I=. --python_out . test.proto
test.proto: map_entry should not be set explicitly. Use map<KeyType, ValueType> instead.

I put --python_out above but any other language would work.

I don't immediately see how the code corresponds to the report.

Sorry for any confusion. What I mean is that if a field named foo (or any other name) has a type like map<_, FooEntry> (notice the matching name between the field and the map value type), then this error will be raised.

In my original example, the field is named a and the type is named AEntry. In the above example, the field is named foo and the type is FooEntry.

For snake_case fields, like field_name, the problematic type would need to be named the same, but transformed to CamelCase (FieldNameEntry).

ergl commented 2 years ago

The relevant checks happen here:

https://github.com/protocolbuffers/protobuf/blob/2f91da585e96a7efe43505f714f03c7716a94ecb/src/google/protobuf/descriptor.cc#L6905-L6906

elharo commented 2 years ago

Thanks for the additional info. Spec say, "The value_type can be any type except another map." so this should be OK. Assuming I can reproduce this, it's probably a bug.

esorot commented 2 years ago

The naming convention reserves *Entry for internal usage: https://github.com/protocolbuffers/protobuf/blob/2f91da585e96a7efe43505f714f03c7716a94ecb/src/google/protobuf/descriptor.cc#L6917 Closing this as this is WAD

parthea commented 2 years ago

It appears that *Entry is a reserved field name. In the interim I'll add workaround in my local code to rename the field to *Entry_ . Please could we re-open this issue? It would be great to have a more elegant solution.

elharo commented 2 years ago

I agree. This isn't documented in the spec and protoc should generate name mangling as needed to make this work.

ergl commented 2 years ago

Moving the *Entry message definition inside the message that contains the map field gives a more clarifying error message.

syntax = "proto3";

message Container {
    message FooEntry {}
    map<uint32, FooEntry> foo = 1;
}

Trying to compile the file above prints the following error:

# protoc -I=. --python_out . test.proto
test.proto: "FooEntry" is already defined in "Container".
test.proto:3:9: Expanded map entry type FooEntry conflicts with an existing nested message type.
fowles commented 2 years ago

At some point, we need to have a much more systematic review of how we handle keywords and conflicting names. This is another piece of that much larger puzzle

Logofile commented 2 years ago

I assume we want a documentation bandaid in the meantime? I can add something to the proto and proto3 topics.

fowles commented 2 years ago

Yeah

jhump commented 2 years ago

This isn't documented in the spec and protoc should generate name mangling as needed to make this work.

@elharo, no name mangling is even needed in this case. The generated entry would be nested inside the message, Container.FooEntry in the given example. But the message name in question is simply FooEntry. So the resulting descriptors from the example should behave "as if":

syntax = "proto3";

message FooEntry {}

message Container {
    message FooEntry {
        option map_entry = true;
        uint32 key = 1;
        .FooEntry value = 2;
    }
    repeated .Container.FooEntry foo = 1;
}

I haven't dug into the code, but I'm going to guess that the DescriptorProto.options.map_entry field was incorrectly set to true on the .FooEntry message, just based on the type's name (ending in Entry). But that seems easy to fix: it should instead only set the map_entry option for messages that are synthesized from map fields, not for all messages whose name ends in Entry.

googleberg commented 2 years ago

The problem here is that protoc is generating a message named FooEntry for the map field foo and that conflicts with your user-defined message FooEntry. protoc is not adding the map_entry option to your type based on the name.

Name mangling needs to be applied to the generated map entry type to allow a nested FooEntry type and a map<> foo to coexist.

jhump commented 2 years ago

@googleberg, that does not make sense. There is no conflict because they are in different namespaces. Map entry messages are generated as nested inside the message that declares the map field. The user-defined type is not a nested type, so there is no conflict. For example, this is perfectly valid -- the two Foo messages do not conflict:

syntax = "proto3";

message Foo {}

message Container {
  message Foo {}
  repeated Foo foo = 1;
}

Also the error message tells the user not to use the map_entry option explicitly. But it is not used explicitly. Which implies it's being set somewhere else:

test.proto: map_entry should not be set explicitly. Use map<KeyType, ValueType> instead.

In the example above, the user did use the suggested map<KeyType, ValueType> formulation.

jhump commented 2 years ago

Okay, I see the problem. Indeed, no option is being set based on the name.

But it's also not due to a conflict. The messages (user-defined one and synthetic one to represent map entry) co-exist just fine.

The issue is that when the reference to FooEntry in the map type is resolved, it is resolving to the generated message, not the user-defined one.,

So simply changing the reference to explicitly indicate the top-level message fixes this. So the following works:

syntax = "proto3";

message FooEntry {}

message Container {
    map<uint32, .FooEntry> foo = 1;
}

@ergl, take a look at how to resolve in your proto file ^^

In case it's not obvious, the change in the above vs. the original example is force a fully-qualified reference via a leading dot:

7c7
<   map<string, FooEntry> foo = 1;
---
>   map<string, .FooEntry> foo = 1;
googleberg commented 2 years ago

@jhump my apologies for not carefully reading the initial example.

You are entirely correct that the two message types are in different namespaces.

The problem is that the lookup is occurring from the nested scope. So the first candidate for AEntry that is found is the generated map entry type.

This can be solved by adding package qualifiers on the value type in the map field declaration.

For the example above add a '.' to get the global namespace message AEntry like this:

syntax = "proto3"; // also happens on proto2

message AEntry {}

message Container {
    map<uint32, .AEntry> a = 1;
}

Or, in an example with a declared package:

syntax = "proto3"; // also happens on proto2

package pkg;

message AEntry {}

message Container {
    map<uint32, pkg.AEntry> a = 1;
}

I would argue that this is very subtle and users shouldn't have to know about map implementation details to fix this issue. Name mangling would solve the original situation as well as the example given by @elharo (which has no namespacing workaround).

jhump commented 2 years ago

the example given by @elharo (which has no namespacing workaround).

@googleberg, were you referring to this example? https://github.com/protocolbuffers/protobuf/issues/9520#issuecomment-1175080496

While that is a little annoying, at least the error message is comprehensible. I think the more egregious issue with the original example is that the error message is confusing and misleading (incorrect even).

Name mangling, like as is done with synthetic oneofs for proto3 optional, would indeed help. However, it isn't perfect since it would still be possible to name a user-defined type the same thing as the mangled name (even if highly unlikely). A more robust fix would be to alter the symbol resolution logic (when resolving relative references) to ignore synthetic map entry messages. A possible solution might be to use a mangled name with characters not otherwise allowed in an identifier. But this will mostly likely cause issues with runtimes that validate descriptors during initialization: when the embedded descriptors are initialized (such as to create "rich" descriptors around embedded descriptor proto data), they could choke on a name that appears to be invalid.

profcoconut commented 11 months ago

I have encountered the same error. As a workaround, change MappingEntry to something like MappingE etc.

github-actions[bot] commented 7 months ago

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago.

elharo commented 7 months ago

Unless someone has fixed this, it should remain active. legitimate bug reports shouldn't be closed because the team is overloaded.

github-actions[bot] commented 4 months ago

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago.

ergl commented 4 months ago

This is still an issue

github-actions[bot] commented 1 month ago

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago. This issue will be closed and archived after 14 additional days without activity.

ergl commented 1 month ago

This is still an issue

Logofile commented 1 month ago

I've sent a documentation update out for review. I don't think that it will fully resolve the issue, but at least the behavior will be documented.