Closed woodruffw closed 1 year ago
Is it idiomatic to check in the generated code here? We decided not to do it for Java.
My expectation is that we should avoid generated code checked-in where possible, and that in Python the code could be generated as part of the release process.
Will defer to language ecosystem experts here :)
Gotcha -- I saw that it was checked in for Go, although I have no idea what's idiomatic in terms of the Python protobuf ecosytem. I'll do a survey of the landscape and see what's normal.
(In the abstract, I'm generally in favor of checking in generated code -- it produces a lot of chaff in git
, but it also makes it easier to track changes to the code generator itself and makes rolling back an instance of poor code generation a little easier. But these concerns may not apply to protobuf, since I understand it to be relatively mature.)
My current thought is to add the java builder code in here to generate the java jar via gradle orchestration and then publish to maven central. I don't know if that translates to python though
I don't know if that translates to python though
I think the closest equivalent in Python would be a custom setup.py
that invokes protoc
to do the code generation. That would work, but I'm not a huge fan of it 😅 -- codegen via setup.py
is definitely a bit of a "smell" in modern Python, and modern pure-Python projects should use pyproject.toml
if possible instead of setup.py
.
Unrelatedly, this looks like a nice protoc
plugin to use for Python: https://github.com/danielgtaylor/python-betterproto -- looks like it takes advantage of newer Python (3.6+) features, and has a slightly more idiomatic consumer-side API.
To be clear, if you assert (without evidence, even) that "checking the generated Python protos in is idiomatic" I'd be happy to merge a PR here 😄 Just want to make sure we're not doing it without considering the other options.
and modern pure-Python projects should use pyproject.toml if possible instead of setup.py.
I assume there's some plan for pyproject.toml codegen? Not sure. If you figure that out, let this poor SO user know.
Unrelatedly, this looks like a nice
protoc
plugin to use for Python: https://github.com/danielgtaylor/python-betterproto -- looks like it takes advantage of newer Python (3.6+) features, and has a slightly more idiomatic consumer-side API.
I think our goal here is the best possible experience for downstream Python consumers; I'd be willing to use a nonstandard Python generator if that helps.
It would not make sense to build a python package and publish to PyPI? Which is kind of what @loosebazooka proposed I think.
It would not make sense to build a python package and publish to PyPI? Which is kind of what @loosebazooka proposed I think.
IIUC that's always going to be the end result. The question is whether the contents of that package are generated at build time or they're checked in
IIUC that's always going to be the end result. The question is whether the contents of that package are generated at build time or they're checked in
Ah, gotcha. While I'm no expert in the python canonical way of doing this, at my previous jobs when doing protobuf genertaion, the build job built and publish the package (for all builds!), and then when we cut a release we create a git tag (semver) and so published package can get the name from the tag to get correct versioning. For day-to-day builds I think we just used a constant "dev" version, don't think we kept any history.
Thanks for the datapoint! I've asked @tetsuo-cpp to look into best practices here, but I'm perfectly happy with the not-checked-in approach if that's common.
It definitely seems much more common than not to check-in the generated code. However, I just GitHub searched "protobuf" and filtered for Python projects, so I'm not exactly confident that it's a "best practice".
I'd say unless anyone has more concrete objections, let's just check-in the code. Either way, I think we should definitely avoid generating is as part of the build in setup.py
.
That works for me!
As an aside, I thought of another reason why checking in the generated code is potentially nice: if we ever sign for this generated Python package, the thing we'll be signing for (the generated code) will match the repository state (as attested by the GitHub workflow sha
, etc. claims).
Similar to the codegen'd Go that's already in the repo: would it be okay to generate and check-in the Python bindings? If so, I/we (ToB) can take a stab at it.