nipunn1313 / mypy-protobuf

open source tools to generate mypy stubs from protobufs
Apache License 2.0
648 stars 80 forks source link

Question: why not use Python’s Enum #484

Open jenstroeger opened 1 year ago

jenstroeger commented 1 year ago

I keep playing with protobuf’s enumerations which are encoded as 32b integers.

When compiled to Python and using this plugin, protobuf enums are represented using an int subtype EnumTypeWrapper to implement a stronger typed int of sorts.

What was the reasoning behind this decision vs. using e.g. IntEnum?

Also, suppose I have two or more enums and compile them to Python. I now would like to have a function

def foo(pb2_enum_type: type[EnumTypeWrapper]):
    ...

which takes any one of the enums, how would I type that function’s arg? I understood that metaclasses are handled ok by mypy but the above case fails with

Argument 1 to "foo" has incompatible type "Type[MyEnum]"; expected "Type[EnumTypeWrapper]"
nipunn1313 commented 1 year ago

Hi - thanks for the interest!

The mypy types are trying to correctly represent what protobuf actually presents. Protobuf's python generator does not actually produce an IntEnum, but rather represents int.

If we typed it as IntEnum - then the typechecker would (incorrectly) assume that it has all the powers of an IntEnum, which it does not. I think this would lead to unexpected behavior when your autocompletions suggest IntEnum methods that do not work.

If you prefer IntEnum, you can write up your own IntEnum and take the int from protobuf and decode it into your custom IntEnum class.

jenstroeger commented 1 year ago

Thanks @nipunn1313, that makes sense. I’ve been using Python Enums but the extra layer of mapping from protobuf int to Python Enum turned out to be bothersome — so I’m looking for a more direct way.

Any thoughts on the second question about typing the function arg?

nipunn1313 commented 1 year ago

Hi. I think EnumTypeWrapper may not be what you think it is. It is not a subclass of int.

enums values in protobuf are represented as ints. mypy-protobuf chooses to represent these as a newtype (https://github.com/nipunn1313/mypy-protobuf#types-enum-int-values-more-strongly). Eg MyEnum.ValueType.

The enum class itself is a subclass of EnumTypeWrapper - which is a wrapper on the type that comes with some convenience methods (https://github.com/protocolbuffers/protobuf/blob/main/python/google/protobuf/internal/enum_type_wrapper.py).

There is good documentation on this here https://developers.google.com/protocol-buffers/docs/reference/python-generated#enum. I find the choice of behavior confusing, but this is how protobuf works. Play around with it a bit and see if you might be able to come up with something.

For your specific question, you might want your function to take in Union[MyEnum1.ValueType, MyEnum2.ValueType, MyEnum3.ValueType]. If you don't have a finite well understood set of enums, you would want to take in an int which should work.

jenstroeger commented 1 year ago

Hi. I think EnumTypeWrapper may not be what you think it is. It is not a subclass of int.

Sorry, I misread the Generic[int] use in the type annotation here. Thanks for clarifying!

There is good documentation on this here https://developers.google.com/protocol-buffers/docs/reference/python-generated#enum. I find the choice of behavior confusing, but this is how protobuf works. Play around with it a bit and see if you might be able to come up with something.

Oh nice, I’ve not seen that bit of docs yet 👍🏼

For your specific question, you might want your function to take in Union[MyEnum1.ValueType, MyEnum2.ValueType, MyEnum3.ValueType]. If you don't have a finite well understood set of enums, you would want to take in an int which should work.

Actually, the function is supposed to take the enum type itself, not a specific enum value. For example:

enum MyEnum1 { ... }
enum MyEnum2 { ... }
enum MyEnum3 { ... }

then the function should look like this:

# Here I’d like to narrow Any to “protobuf enums”. That’s why I tried using the
# metaclass type from which all the enums are built, but that didn’t work 🤔
def f(pb2_enum_type: Any):
    print(pb2_enum_type.items())

f(MyEnum1)  # Prints all enum values for MyEnum1 (or any other).
nipunn1313 commented 1 year ago

Here's an example https://github.com/nipunn1313/mypy-protobuf/blob/main/test/generated/testproto/test3_pb2.pyi#L22

OuterEnum inherits from EnumTypeWrapper[OuterEnum.ValueType] (metaclass required for the scoping gymnastics for ValueType).

in your example, I think you want this:

def foo(pb2_enum_type: EnumTypeWrapper):
nipunn1313 commented 1 year ago

Another possibility would include a generic

T = TypeVar('T')
def foo(pb2_enum_type: _EnumTypeWrapper[T]):
  # some code that works for a variety of different `T`
jenstroeger commented 1 year ago

in your example, I think you want this:

def foo(pb2_enum_type: EnumTypeWrapper):

Heh… that’s where it all started 😉 If I run this

from protos import foo_pb2  # Supplies MyEnum.
from google.protobuf.internal.enum_type_wrapper import EnumTypeWrapper

def f(pb2_enum_type: EnumTypeWrapper) -> None:
    print(pb2_enum_type.items())

f(foo_pb2.MyEnum)

then I get the expected result:

[('UNDEFINED', 0), ('FIRST_VALUE', 1), ('SECOND_VALUE', 2)]

However, mypy fails:

test.py:7:3: error: Argument 1 to "f" has incompatible type "Type[MyEnum]"; expected "EnumTypeWrapper"  [arg-type]

I tried EnumTypeWrapper and type[EnumTypeWrapper] (see initial question above), and type alone, expectedly, gives me

test.py:5:11: error: "type" has no attribute "items"  [attr-defined]

So, here I am 🤓

nipunn1313 commented 1 year ago

Aha thanks for walking me through the entire process up to where you are now.

Let me try to explain what protobuf autogenerates (it's a bit weird). Protobuf autogenerates an object MyEnum which is an instance of EnumTypeWrapper. EnumTypeWrapper comes with some class methods that take in ints (docs). For example def Name(self, number) returns a string given the number

Mypy protobuf wants to allow you to ensure that MyEnum.Name only takes in a MyEnum.ValueType. To do this, we generate code like this.

MyEnum is typed as an instance of _EnumTypeWrapper[MyEnum.Value].

In typeshed, EnumTypeWrapper is typed as a subclass of _EnumTypeWrapper[int] to be most general.

Hence I think in your code, in order for it to work, you would need to use _EnumTypeWrapper[V] as your common base type.

from protos import foo_pb2  # Supplies MyEnum.
from typing import TypeVar, TYPE_CHECKING

if TYPE_CHECKING:
  from google.protobuf.internal.enum_type_wrapper import _EnumTypeWrapper

_V = TypeVar("_V", bound=int)
def f(pb2_enum_type: _EnumTypeWrapper[_V]) -> None:
    print(pb2_enum_type.items())

f(foo_pb2.MyEnum)

We may actually be able to achieve the goal of having MyEnum be considered a subclass of EnumTypeWrapper itself, by marking _V as covariant in typeshed and using a type alias rather than subclassing for EnumTypeWrapper. But would require some fiddling and some testing. It's a good use case you bring up. https://mypy.readthedocs.io/en/stable/generics.html#variance-of-generic-types

I haven't tested the above - just typing it off the top of my head. Could take a closer look when I get some time in one of these upcoming weekends.

jenstroeger commented 1 year ago

Hence I think in your code, in order for it to work, you would need to use _EnumTypeWrapper[V] as your common base type.

@nipunn1313 I think your suggestion actually worked 🥳 One minor tweak though:

if TYPE_CHECKING:
    from google.protobuf.internal.enum_type_wrapper import _EnumTypeWrapper
else:
    _EnumTypeWrapper = Generic

to ensure that _EnumTypeWrapper is available when running the code.

We may actually be able to achieve the goal of having MyEnum be considered a subclass of EnumTypeWrapper itself, by marking _V as covariant in typeshed and using a type alias rather than subclassing for EnumTypeWrapper. But would require some fiddling and some testing. It's a good use case you bring up.

Do you think that’s a change that could roll out soon? I’m going to use the above suggestion for now, though.

nipunn1313 commented 1 year ago

In python3.10 and up, type annotations aren't executed at runtime. from __future__ import annotations gets you this behavior in earlier versions of python3 (3.7 and up I think)

I will give it a shot when I get the chance. I am not 100% sure if it will work (hard to wrap my head around conceptually without trying it out), but will try and see.

jenstroeger commented 1 year ago

In python3.10 and up, type annotations aren't executed at runtime. from __future__ import annotations gets you this behavior in earlier versions of python3 (3.7 and up I think)

I’m on Python 3.10 and according to the docs, The first type annotation must be enclosed in quotes, making it a “forward reference”, […]. Now it works 👍🏼

Alternatively, the __future__ import also worked: If from __future__ import annotations is used, annotations are not evaluated at function definition time. I understand that to be independent of the Python version?

I will give it a shot when I get the chance. I am not 100% sure if it will work (hard to wrap my head around conceptually without trying it out), but will try and see.

No problem. If you like I can give you more actual context, to help with the practical relevance…