protocolbuffers / protobuf

Protocol Buffers - Google's data interchange format
http://protobuf.dev
Other
65.17k stars 15.44k forks source link

C++ public enum for field numbers instead of anonymous enum #17760

Open eronnen opened 1 month ago

eronnen commented 1 month ago

What language does this apply to? C++

Describe the problem you are trying to solve.

I would like to have a Public enum for all the field numbers for every message. This will help programmers to ensure they handle all possible fields in a message, by writing a switch statement over the field number and compiling the program with the -Wswitch flag.

Describe the solution you'd like

Instead of declaring field numbers in the generated c++ with anonymous enum like

enum : int {
 kFirstFieldNumber = 1,
 kSecondFieldNumber =2,
}

I suggest declaring a named enum that can be used by the programmer, like

enum MessageNameFieldNumber : int {
 kFirstFieldNumber = 1,
 kSecondFieldNumber =2,
}

Describe alternatives you've considered

Probably could be done by external plugin, but it seems much easier generating named enum from protoc itself and shouldn't interfere with existing API .

eronnen commented 1 month ago

Please let me know if it's acceptable and I'll submit a PR

shaod2 commented 1 month ago

Aren't we already doing that? e.g. https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/descriptor.pb.h#L164

eronnen commented 1 month ago

@shaod2 I'm referring to field numbers of general messages, not necessarily enum messages.

For example: https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/descriptor.pb.h#L9368

shaod2 commented 1 month ago

oic

can you give me a concrete example of such usages, and why is it superior to the refections e.g. "descriptor->field(i)"?

eronnen commented 1 month ago

the advantage of having a named enum is being able to conveniently having a switch statement over the fields. a concrete example with protobuf's addressbook.proto would be:

#include <iostream>

#include "addressbook.pb.h"

namespace named_enum {
    enum PersonFieldNumber : int {
      kPhonesFieldNumber = 4,
      kNameFieldNumber = 1,
      kEmailFieldNumber = 3,
      kLastUpdatedFieldNumber = 5,
      kIdFieldNumber = 2,
  };
}

int main(int argc, char* argv[]) {
    tutorial::Person person;
    person.set_name("John Doe");
    person.set_id(1234);
    person.set_email("johndoe@example.com");

    const google::protobuf::Reflection* reflection = person.GetReflection();
    const google::protobuf::Descriptor* descriptor = person.GetDescriptor();
    for (int i = 0; i < descriptor->field_count(); ++i) {
        const auto field = descriptor->field(i);
        const auto fieldNumber = static_cast<named_enum::PersonFieldNumber>(field->number());
        switch (fieldNumber) {
        case named_enum::PersonFieldNumber::kNameFieldNumber:
            std::cout << person.name() << std::endl;
            break;
        case named_enum::PersonFieldNumber::kIdFieldNumber:
            std::cout << person.id() << std::endl;
            break;
        case named_enum::PersonFieldNumber::kEmailFieldNumber:
            std::cout << person.email() << std::endl;
            break;
        case named_enum::PersonFieldNumber::kPhonesFieldNumber:
        case named_enum::PersonFieldNumber::kLastUpdatedFieldNumber:
            break;
        }
    }

    return 0;
}

having a named PersonFieldNumber enum helps to know which values need to be handled in the switch statement, as opposed to having a switch over a generic integer.

currently in the addressbook.pb.h file the field numbers are defined in an unamed enum like

enum : int {
    kPhonesFieldNumber = 4,
    kNameFieldNumber = 1,
    kEmailFieldNumber = 3,
    kLastUpdatedFieldNumber = 5,
    kIdFieldNumber = 2,
  };

and it would help the example above if it was defined with a name like:

enum PersonFieldNumber : int {
    kPhonesFieldNumber = 4,
    kNameFieldNumber = 1,
    kEmailFieldNumber = 3,
    kLastUpdatedFieldNumber = 5,
    kIdFieldNumber = 2,
  };
acozzette commented 1 day ago

Sorry but I think we probably would not want to add this. Although it seems like it could be useful in some cases, we have a pretty high bar for adding new public APIs, since we end up having to maintain them indefinitely. A good alternative would be to write a test that iterates over the message descriptor and exercises the code for all fields in it to check that each field is handled correctly. You could also write your own enum and just add a test to verify that it stays up to date.