protocolbuffers / protobuf

Protocol Buffers - Google's data interchange format
http://protobuf.dev

Publishing `protoc` binaries with Pip package #17188

Open RobertoRoos opened 5 months ago

RobertoRoos commented 5 months ago

What language does this apply to?

Python

Describe the problem you are trying to solve.

There is no standard way to install the protobuf compiler (protoc) on Windows, unlike e.g. apt install on Linux. So if I share a Python project that includes a protobuf definition, and the user needs to generate protobuf files from that definition, the user still has to figure out their own protoc installation.

But protobuf does have a nice Pip package, and pip packages could very well include platform specific binaries.

Describe the solution you'd like

I would suggest to either:

  1. Include protoc in the protobuf Pip package.
  2. Publish a second Pip package that only contains protoc, without actual Python code.

The downside of the first is you might conflict with the user's system install. With the second you make the Pip installed binary optional, which might be more convenient.

Describe alternatives you've considered

N.a.

Additional context

If such a Pip package exists, a Python project might contain the following in its pyproject.toml:

[tool.poetry.build]
script = "build.py"

[build-system]
requires = ["poetry-core", "protobuf-protoc"]
build-backend = "poetry.core.masonry.api"

Where build.py is a script to generate protobuf files, based on protoc. protoc is then installed only during the build step of this package. The used version would be pinned to the package specification and otherwise isolated to the Python environment being used. Moreover, using it in Python would require no action outside Python and Pip.
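A minimal sketch of what such a build.py could look like. The directory names `proto` and `src` and the helper names are hypothetical; the script only assumes that a `protoc` executable is on the PATH of the build environment and uses the standard `--proto_path` and `--python_out` options.

```python
import shutil
import subprocess
from pathlib import Path


def protoc_command(protoc, proto_dir, out_dir):
    """Build the protoc argument list for every .proto file under proto_dir."""
    proto_files = sorted(str(p) for p in Path(proto_dir).rglob("*.proto"))
    return [protoc, f"--proto_path={proto_dir}", f"--python_out={out_dir}", *proto_files]


def build(proto_dir="proto", out_dir="src"):
    """Generate Python modules from the .proto definitions (hypothetical layout)."""
    protoc = shutil.which("protoc")  # resolves to the venv's binary if one is installed there
    if protoc is None:
        raise RuntimeError("protoc not found on PATH")
    cmd = protoc_command(protoc, proto_dir, out_dir)
    if len(cmd) > 3:  # only run when at least one .proto file was found
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    build()
```

Keeping the command construction in its own function makes it possible to check the arguments without having protoc installed.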

As a result it would work like:

python -m venv .venv
.venv/Scripts/Activate.ps1
(.venv) pip install protobuf
(.venv) protoc  # Error!
(.venv) pip install protobuf-protoc
(.venv) protoc  # Binary exists in .venv/Scripts/protoc.exe

RobertoRoos commented 5 months ago

I see there are a bunch of unofficial attempts at this already:

The main problem is these have no affiliation with the protobuf project, so they are not maintained and importantly, there is no guarantee they packaged the legit binaries.

RobertoRoos commented 5 months ago

In the meantime I've created a package that will download the official protoc during its own installation: https://github.com/RobertoRoos/protobuf-protoc-exe

RobertoRoos commented 3 months ago

I've expanded this project some in the meantime: https://github.com/RobertoRoos/protobuf-protoc-exe

I've migrated it to setuptools for more control and correct wheels are now made too. Currently you could install the package from Git directly (pip install "git+https://github.com/RobertoRoos/protobuf-protoc-exe.git"), which will download the official release during installation. No binaries are hosted in the repo itself.

As platform-specific wheels are generated, those could easily be hosted on PyPI for faster download too.

I think this could be a good enough basis for an officially hosted Pip package.
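To illustrate the "download the official release during installation" idea, here is a rough sketch of how a release URL could be chosen per platform. It is not taken from the linked project; the function name is hypothetical, and the asset suffixes are an assumption based on how recent archives on the protobuf GitHub releases page are named, so they should be checked against the release that is actually targeted.

```python
import platform

# Assumed mapping from (platform.system(), platform.machine()) to the suffix
# used in release asset names; verify against the releases page before use.
_ASSET_SUFFIXES = {
    ("Windows", "AMD64"): "win64",
    ("Linux", "x86_64"): "linux-x86_64",
    ("Linux", "aarch64"): "linux-aarch_64",
    ("Darwin", "x86_64"): "osx-x86_64",
    ("Darwin", "arm64"): "osx-aarch_64",
}


def release_url(version, system=None, machine=None):
    """Return the assumed download URL of the protoc archive for one platform."""
    system = system or platform.system()
    machine = machine or platform.machine()
    try:
        suffix = _ASSET_SUFFIXES[(system, machine)]
    except KeyError:
        raise RuntimeError(f"no known protoc archive for {system}/{machine}") from None
    return (
        "https://github.com/protocolbuffers/protobuf/releases/download/"
        f"v{version}/protoc-{version}-{suffix}.zip"
    )
```

An installer would then fetch this archive and copy the contained protoc executable into the environment's scripts directory.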

tonyliaoss commented 2 months ago

Hi Robert,

You have hit on one of the pain points in our existing release. We agree that it's a frustrating experience that when you need protoc, you'd have to compile it from source instead of having an easy install command.

FWIW, gRPC releases protoc as part of grpcio-tools. But it is weird that gRPC ends up distributing protoc binaries, whereas it's more natural for us to own this.

This sounds like something we want to do, in principle.

There are some additional considerations here:

  1. Size
    • The python runtime is something around 0.5 MB in size right now.
    • Protoc itself is somewhere around 3MB zipped; 9.4MB unzipped. This would 20x the size of the installation, although probably this won't be a dealbreaker to most people.
  2. Not all python usecases would necessarily need protoc.
    • protoc is important for development, but for actually distributing packages, if the gencode is stable and in (say) pip, users won't need protoc to regenerate code every time.
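For reference, the size estimate in point 1 can be checked with a quick calculation, taking the quoted figures (0.5 MB runtime, 9.4 MB unzipped protoc) at face value:

```python
runtime_mb = 0.5          # approximate size of the Python runtime, as quoted above
protoc_unzipped_mb = 9.4  # approximate unzipped size of protoc, as quoted above

combined_mb = runtime_mb + protoc_unzipped_mb
growth = combined_mb / runtime_mb  # how many times larger the install becomes

print(f"{combined_mb:.1f} MB total, {growth:.1f}x the current size")
```

This gives roughly 9.9 MB in total, which is consistent with the "20x" figure.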

In an ideal world, we would like to distribute two packages for python: one (e.g. protobuf) for the protobuf runtime, and one (e.g. protobuf-dev) for development-related utils that can include protoc.

This feels like a longer term project for us.

I think a first step would be to bundle the protoc binary with our python release, and then eventually refactor a bunch of stuff to make it possible to split this out into two distributions available on pip.

Another consideration for us, while not directly related to protoc packaging, is how to manage support and rolling upgrades. Let me try to summarize our thoughts here too:

tonyliaoss commented 2 months ago

I also see that you are using setuptools already. We do have a way to integrate with that -- it's described here:

https://github.com/protocolbuffers/protobuf/tree/main/python/protobuf_distutils

We'll probably want to use that in our "official" packaging for protoc. We don't have any ETA for this though.

RobertoRoos commented 2 months ago

Thanks @tonyliaoss for your message.

> Not all python usecases would necessarily need protoc.

Yeah, I would suggest creating a separate Python package just for the binary, like I posted in my example. Using package requirements it's easy enough to include both or make one require the other if necessary. Using extras in pyproject.toml or Poetry's optional install groups it's easy to put the protoc compiler only in optional development tools.

> We do have a way to integrate with that -- it's described here

That's neat, I'll give that a try in my project too.

Let me know if there is anything I can do to help out in this regard!

RobertoRoos commented 1 month ago

I hope the protobuf team won't mind, but I went ahead and published my own protoc binaries package in the meantime, because I need it to continue my work. I renamed the repo I posted above to https://github.com/RobertoRoos/protobuf-protoc-bin and it's published here: https://pypi.org/project/protobuf-protoc-bin/

Releases are made by downloading binaries directly from https://github.com/protocolbuffers/protobuf/releases . Doing a source install of protobuf-protoc-bin on a PC will download the protobuf release from GitHub directly.

Just let me know if I should transfer the GitHub repo or the PyPI package. Or let me know when you've published your own version, so I can delete mine.

olupton commented 1 week ago

Is it possible to encode in the protobuf-protoc-bin wheel dependencies which versions of the protobuf runtime it supports, so that if an old protobuf version is pinned then a compatible protoc will be pulled in?
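As a sketch of what such a mapping might involve: the snippet below assumes the versioning convention in which a Python runtime release X.Y.Z (for X of 4 or higher) is paired with protoc Y.Z, for example runtime 5.27.2 with protoc 27.2, and that older 3.x runtimes shared their version number with protoc. The function name is hypothetical, and this says nothing about which other protoc versions a given runtime can also work with.

```python
def protoc_version_for_runtime(runtime_version):
    """Map a Python protobuf runtime version to the protoc release it was paired with.

    Assumes the "major.minor.patch" runtime scheme where, from major version 4 on,
    the protoc version is "minor.patch". Older 3.x releases are assumed to share
    one version number between runtime and protoc.
    """
    parts = runtime_version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected 'major.minor.patch', got {runtime_version!r}")
    major, minor, patch = (int(p) for p in parts)
    if major < 4:
        return runtime_version
    return f"{minor}.{patch}"
```

A packaging script could use a helper like this to decide which protoc release to bundle for a pinned runtime, or to derive a matching requirement on the runtime.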

RobertoRoos commented 1 week ago

> Is it possible to encode in the protobuf-protoc-bin wheel dependencies which versions of the protobuf runtime it supports, so that if an old protobuf version is pinned then a compatible protoc will be pulled in?

Let's continue that here: https://github.com/RobertoRoos/protobuf-protoc-bin/issues/15