Open schmichael opened 3 years ago
So I tried to figure out what might be happening here from source changes, but what makes that particularly challenging is that buf
is not in any way developed in the open. The commit history is mostly just them syncing back and forth from their internal repo with giant squashed changes.
I'm going to propose that we remove buf
and go back to using protoc
directly. The entire value proposition for buf
is that it's supposed to help us avoid backwards incompatible changes, yet they broke our build and don't make it easy for us to figure out what's gone wrong with it.
After an internal discussion we decided to hold off on tearing it out until we could at least file an issue with upstream, and it turns out that it's apparently not a simple round-tripping problem with buf
. It looks like the gRPC library is in the mix here... still trying to build a minimal repro.
Unfortunately I was able to eliminate gRPC from the mix and the problem can be reproduced entirely with buf
alone. I've built a minimal reproduction here https://github.com/tgross/bufbreaks that generates a protobuf with one version and reads it in with another.
I've opened https://github.com/bufbuild/buf/issues/315 with upstream to get their thoughts.
$ make test
go build -trimpath -tags old -o bin/old .
go build -trimpath -tags new -o bin/new .
./bin/old write
./bin/new read
panic: protobuf tag not enough fields in Timestamp.state:
goroutine 1 [running]:
github.com/golang/protobuf/proto.(*unmarshalInfo).computeUnmarshalInfo(0xc0000ce460)
github.com/golang/protobuf@v1.5.0/proto/table_unmarshal.go:332 +0x1777
github.com/golang/protobuf/proto.(*unmarshalInfo).unmarshal(0xc0000ce460, 0xc0000b2240, 0xc0000ee00d, 0xb, 0x1f3, 0xc0000ee004, 0x7)
github.com/golang/protobuf@v1.5.0/proto/table_unmarshal.go:136 +0xf45
github.com/golang/protobuf/proto.makeUnmarshalMessagePtr.func1(0xc0000ee00d, 0xb, 0x1f4, 0xc0000b2210, 0x2, 0xc0000ee00b, 0xd, 0x1f5, 0x0, 0x0)
github.com/golang/protobuf@v1.5.0/proto/table_unmarshal.go:1646 +0x12b
github.com/golang/protobuf/proto.(*unmarshalInfo).unmarshal(0xc0000ce3c0, 0xc0000b2200, 0xc0000ee002, 0x16, 0x1fe, 0x0, 0x12aa320)
github.com/golang/protobuf@v1.5.0/proto/table_unmarshal.go:173 +0x88e
github.com/golang/protobuf/proto.makeUnmarshalMessagePtr.func1(0xc0000ee002, 0x16, 0x1ff, 0xc00009c870, 0x2, 0xc0000c5d68, 0x756ea1d8d90c87, 0x203000, 0x203000, 0x203000)
github.com/golang/protobuf@v1.5.0/proto/table_unmarshal.go:1646 +0x12b
github.com/golang/protobuf/proto.(*unmarshalInfo).unmarshal(0xc0000ce320, 0xc00009c870, 0xc0000ee000, 0x18, 0x200, 0x100a06c, 0x141c920)
github.com/golang/protobuf@v1.5.0/proto/table_unmarshal.go:173 +0x88e
github.com/golang/protobuf/proto.(*InternalMessageInfo).Unmarshal(0x13eea20, 0x12a7108, 0xc00009c870, 0xc0000ee000, 0x18, 0x200, 0x4a31901, 0x4a39058)
github.com/golang/protobuf@v1.5.0/proto/table_unmarshal.go:63 +0x66
main.(*TaskStatsResponse).XXX_Unmarshal(0xc00009c870, 0xc0000ee000, 0x18, 0x200, 0xc00009c870, 0xc0000c5e01)
bufbreaks/structs.new.pb.go:41 +0x65
github.com/golang/protobuf/proto.Unmarshal(0xc0000ee000, 0x18, 0x200, 0x12a7108, 0xc00009c870, 0xc0000c5f40, 0x11f8f7a)
github.com/golang/protobuf@v1.5.0/proto/decode.go:337 +0x1aa
main.deserialize(0xc0000ee000, 0x18, 0x200)
bufbreaks/serde.new.go:33 +0x85
main.main()
bufbreaks/main.go:34 +0xcb
make: *** [test] Error 2
Just to reiterate the points in https://github.com/bufbuild/buf/issues/315 here:
github.com/golang/protobuf
to google.golang.org/protobuf
.go_package
on the Well-Known Types recently to point to google.golang.org/protobuf
, which are expected to be built-in to the compiler. protoc
will also cause the same result (you can verify this yourself). Given that this is how the Protobuf ecosystem works, we have no choice but to match protoc
here. We're actually writing a blog post about this issue right now.github.com/golang/protobuf
serialization code in Nomad for new google.golang.org/protobuf
types, however I am not sure. We are sure that this bug is not a buf
bug, however.On the point of not being developed in the open, we'd argue that we are as much as possible, we have a backend monorepo for both our public and private code, and yes do sync it across, but obviously there's nuance.
Providing some more detail here from 315:
Nothing has changed about the the structure of google.protobuf.Timestamp, except Google did introduce a change to the go_package
value in v3.14.0, updating it to point to google.golang.org/protobuf/types/known/timestamppb
, which is a breaking change. This shouldn't affect your above code, however - this has nothing to do with serialization or the wire compatibility of Timestamp
.
Looking through https://github.com/tgross/bufbreaks/blob/trunk/serde.new.go, it appears you are using the old github.com/golang/protobuf/proto
with the new Timestamp
type, which could cause a problem, you should be using the new https://pkg.go.dev/google.golang.org/protobuf/proto#Marshal with new types.
As to buf
not breaking here, we actually agree - even though this is just changing the go_package
value, i.e. where the generated code for Timestamp
is expected to be, we detect this as a breaking change:
buf breaking --against v3.13.0 v3.14.0
v3.14.0/google/protobuf/any.proto:36:1:File option "go_package" changed from "github.com/golang/protobuf/ptypes/any" to "google.golang.org/protobuf/types/known/anypb".
v3.14.0/google/protobuf/api.proto:43:1:File option "go_package" changed from "google.golang.org/genproto/protobuf/api;api" to "google.golang.org/protobuf/types/known/apipb".
v3.14.0/google/protobuf/compiler/plugin.proto:53:1:File option "go_package" changed from "github.com/golang/protobuf/protoc-gen-go/plugin;plugin_go" to "google.golang.org/protobuf/types/pluginpb".
v3.14.0/google/protobuf/descriptor.proto:44:1:File option "go_package" changed from "github.com/golang/protobuf/protoc-gen-go/descriptor;descriptor" to "google.golang.org/protobuf/types/descriptorpb".
v3.14.0/google/protobuf/duration.proto:37:1:File option "go_package" changed from "github.com/golang/protobuf/ptypes/duration" to "google.golang.org/protobuf/types/known/durationpb".
v3.14.0/google/protobuf/empty.proto:36:1:File option "go_package" changed from "github.com/golang/protobuf/ptypes/empty" to "google.golang.org/protobuf/types/known/emptypb".
v3.14.0/google/protobuf/field_mask.proto:40:1:File option "go_package" changed from "google.golang.org/genproto/protobuf/field_mask;field_mask" to "google.golang.org/protobuf/types/known/fieldmaskpb".
v3.14.0/google/protobuf/source_context.proto:40:1:File option "go_package" changed from "google.golang.org/genproto/protobuf/source_context;source_context" to "google.golang.org/protobuf/types/known/sourcecontextpb".
v3.14.0/google/protobuf/struct.proto:37:1:File option "go_package" changed from "github.com/golang/protobuf/ptypes/struct;structpb" to "google.golang.org/protobuf/types/known/structpb".
v3.14.0/google/protobuf/timestamp.proto:37:1:File option "go_package" changed from "github.com/golang/protobuf/ptypes/timestamp" to "google.golang.org/protobuf/types/known/timestamppb".
v3.14.0/google/protobuf/type.proto:44:1:File option "go_package" changed from "google.golang.org/genproto/protobuf/ptype;ptype" to "google.golang.org/protobuf/types/known/typepb".
v3.14.0/google/protobuf/wrappers.proto:47:1:File option "go_package" changed from "github.com/golang/protobuf/ptypes/wrappers" to "google.golang.org/protobuf/types/known/wrapperspb".
We don't think a change should have been made to the Well-Known Types, ever, but are hands of kind of tied here. The Protobuf ecosystem relies on the WKTs being built-in to the compiler (although protoc
relies on them being at ../include
relative to the binary, which is even more broken), and Google made this upgrade. If there was a wire incompatibility (which there is not here, your problem is something else), and you use protoc
in a similar way, it will break as well. But given that users use the WKTs extensively, and not upgrading would be even more of a "breaking" change for the majority of users given how the Protobuf ecosystem works, we have no choice but to choose the lesser of two evils.
One way to solve this, if you are concerned with WKT breakages, is to vendor the WKTs yourself:
# This is actually the best way, not kidding
$ mkdir -p tmp && \
cd ./tmp && \
curl -sSL https://github.com/protocolbuffers/protobuf/releases/download/v3.15.8/protoc-3.15.8-linux-x86_64.zip -o protoc.zip && \
unzip protoc.zip >/dev/null && \
mv ./include/google ~/your/repo/third_party/proto/google
And then add third_party/proto
as a root:
# buf.yaml
version: v1beta1
build:
roots:
- third_party/proto
If you have the WKTs included yourself, both buf
and protoc
will ignore their builtins and use these instead.
Of note though, WKTs have never had a wire-incompatible change in over a decade.
I'm trying to hunt down your issue, and one thing is that you are not versioning protoc-gen-go
in https://github.com/tgross/bufbreaks/blob/trunk/Makefile - I am assuming you are using the old protoc-gen-go
?
I've posted an overview of the problem at https://github.com/tgross/bufbreaks/pull/1
Another option at https://github.com/tgross/bufbreaks/pull/2
I'm not sure the changed go_package
option is necessarily the root cause. At least the stack trace https://github.com/hashicorp/nomad/issues/10293#issuecomment-832801234 suggest that incompatible versions due to vendoring.
I suspect the following is happening:
buf
is generating .pb.go
files with protoc-gen-go
at v1.20.x or later.protoc-gen-go
at v1.20.x or later must be used in conjunction with github.com/golang/protobuf
at v1.4.x or later and google.golang.org/protobuf
at v1.20.x or later.github.com/golang/protobuf
at v1.3.x or older is being used since the table_unmarshal.go
file no longer exists in newer versions.The fix seems to be upgrade the vendored version of github.com/golang/protobuf
to v1.4.x or later.
I should further note that this breakage isn't the fault of buf
as it can occur when using protoc-gen-go
directly.
Thanks @bufdev I think we can take it from here.
@schmichael I'm going to punt on this for now as we obviously don't need to make any changes for 1.1.0. But there's also a small pile of developer documentation issues this implies around third-party task driver plugins. Just because the wire protocol is compatible doesn't mean that folks trying to reuse marshal/unmarshal code from the exec
task driver (which is a typical starting point for folks despite the existence of https://github.com/hashicorp/nomad-skeleton-driver-plugin) aren't going to have a bad time if they aren't using the exact same set of dependencies.
Nomad version
Nomad v1.1.0-dev
Issue
Using the wrong version of buf to compile protobufs causes an agent panic.
As of Nomad v1.1.0-dev we need to stick to buf ~v0.30.1~ v0.36.0 which is now installed by
make bootstrap
.Reproduction steps
Expected Result
It works. Our wire protocol should always be forward/backward compatible.
Actual Result
The Nomad agent panics with:
Generating protos with buf v0.41.0