Generate Python bindings?

woodruffw commented 1 year ago

Similar to the codegen'd Go that's already in the repo: would it be okay to generate and check-in the Python bindings? If so, I/we (ToB) can take a stab at it.

znewman01 commented 1 year ago

Is it idiomatic to check in the generated code here? We decided not to do it for Java.

My expectation is that we should avoid generated code checked-in where possible, and that in Python the code could be generated as part of the release process.

Will defer to language ecosystem experts here :)

woodruffw commented 1 year ago

Gotcha -- I saw that it was checked in for Go, although I have no idea what's idiomatic in terms of the Python protobuf ecosytem. I'll do a survey of the landscape and see what's normal.

woodruffw commented 1 year ago

(In the abstract, I'm generally in favor of checking in generated code -- it produces a lot of chaff in git, but it also makes it easier to track changes to the code generator itself and makes rolling back an instance of poor code generation a little easier. But these concerns may not apply to protobuf, since I understand it to be relatively mature.)

loosebazooka commented 1 year ago

My current thought is to add the java builder code in here to generate the java jar via gradle orchestration and then publish to maven central. I don't know if that translates to python though

woodruffw commented 1 year ago

I don't know if that translates to python though

I think the closest equivalent in Python would be a custom setup.py that invokes protoc to do the code generation. That would work, but I'm not a huge fan of it 😅 -- codegen via setup.py is definitely a bit of a "smell" in modern Python, and modern pure-Python projects should use pyproject.toml if possible instead of setup.py.

woodruffw commented 1 year ago

Unrelatedly, this looks like a nice protoc plugin to use for Python: https://github.com/danielgtaylor/python-betterproto -- looks like it takes advantage of newer Python (3.6+) features, and has a slightly more idiomatic consumer-side API.

znewman01 commented 1 year ago

To be clear, if you assert (without evidence, even) that "checking the generated Python protos in is idiomatic" I'd be happy to merge a PR here 😄 Just want to make sure we're not doing it without considering the other options.

and modern pure-Python projects should use pyproject.toml if possible instead of setup.py.

I assume there's some plan for pyproject.toml codegen? Not sure. If you figure that out, let this poor SO user know.

Unrelatedly, this looks like a nice protoc plugin to use for Python: https://github.com/danielgtaylor/python-betterproto -- looks like it takes advantage of newer Python (3.6+) features, and has a slightly more idiomatic consumer-side API.

I think our goal here is the best possible experience for downstream Python consumers; I'd be willing to use a nonstandard Python generator if that helps.

kommendorkapten commented 1 year ago

It would not make sense to build a python package and publish to PyPI? Which is kind of what @loosebazooka proposed I think.

znewman01 commented 1 year ago

It would not make sense to build a python package and publish to PyPI? Which is kind of what @loosebazooka proposed I think.

IIUC that's always going to be the end result. The question is whether the contents of that package are generated at build time or they're checked in

kommendorkapten commented 1 year ago

IIUC that's always going to be the end result. The question is whether the contents of that package are generated at build time or they're checked in

Ah, gotcha. While I'm no expert in the python canonical way of doing this, at my previous jobs when doing protobuf genertaion, the build job built and publish the package (for all builds!), and then when we cut a release we create a git tag (semver) and so published package can get the name from the tag to get correct versioning. For day-to-day builds I think we just used a constant "dev" version, don't think we kept any history.

woodruffw commented 1 year ago

Thanks for the datapoint! I've asked @tetsuo-cpp to look into best practices here, but I'm perfectly happy with the not-checked-in approach if that's common.

tetsuo-cpp commented 1 year ago

It definitely seems much more common than not to check-in the generated code. However, I just GitHub searched "protobuf" and filtered for Python projects, so I'm not exactly confident that it's a "best practice".

I'd say unless anyone has more concrete objections, let's just check-in the code. Either way, I think we should definitely avoid generating is as part of the build in setup.py.

woodruffw commented 1 year ago

That works for me!

As an aside, I thought of another reason why checking in the generated code is potentially nice: if we ever sign for this generated Python package, the thing we'll be signing for (the generated code) will match the repository state (as attested by the GitHub workflow sha, etc. claims).

sigstore / protobuf-specs

Generate Python bindings? #20