stripe / skycfg

Skycfg is an extension library for the Starlark language that adds support for constructing Protocol Buffer messages.
Apache License 2.0
643 stars 54 forks source link

Switching to Protobuf v2 breaks generation of K8S protos #86

Open hagmonk opened 3 years ago

hagmonk commented 3 years ago

Commit e60b2f4dbd8201bea9039597546e5c6ba0b1b432 switched to Go Protobuf v2, but this breaks the K8S example.

Reverting to f561f12ccefbd35b70efc8c7c377dace5511b1ff does not exhibit the same panic.

> go run ./main.go TEST IT ./nginx.cfg
panic: field k8s.io.api.core.v1.ContainerPort.name has invalid type: got string, want pointer

goroutine 1 [running]:
google.golang.org/protobuf/internal/impl.fieldInfoForScalar(0x2069a40, 0xc000195900, 0x1d31f8b, 0x4, 0x0, 0x0, 0x2069380, 0x1d526c0, 0x1d31f91, 0x36, ...)
        /Users/lburton/pkg/mod/google.golang.org/protobuf@v1.25.1-0.20201016220047-aa45c4675289/internal/impl/message_reflect_field.go:228 +0x7d7
google.golang.org/protobuf/internal/impl.(*MessageInfo).makeKnownFieldsFunc(0xc000478420, 0xffffffffffffffff, 0x0, 0x0, 0xffffffffffffffff, 0x0, 0x0, 0xffffffffffffffff, 0x0, 0x0, ...)
        /Users/lburton/pkg/mod/google.golang.org/protobuf@v1.25.1-0.20201016220047-aa45c4675289/internal/impl/message_reflect.go:74 +0x95d
google.golang.org/protobuf/internal/impl.(*MessageInfo).makeReflectFuncs(0xc000478420, 0x2069380, 0x1e3fa20, 0xffffffffffffffff, 0x0, 0x0, 0xffffffffffffffff, 0x0, 0x0, 0xffffffffffffffff, ...)
        /Users/lburton/pkg/mod/google.golang.org/protobuf@v1.25.1-0.20201016220047-aa45c4675289/internal/impl/message_reflect.go:42 +0x65
google.golang.org/protobuf/internal/impl.(*MessageInfo).initOnce(0xc000478420)
        /Users/lburton/pkg/mod/google.golang.org/protobuf@v1.25.1-0.20201016220047-aa45c4675289/internal/impl/message.go:91 +0x185
google.golang.org/protobuf/internal/impl.(*MessageInfo).init(...)
        /Users/lburton/pkg/mod/google.golang.org/protobuf@v1.25.1-0.20201016220047-aa45c4675289/internal/impl/message.go:73
google.golang.org/protobuf/internal/impl.(*messageReflectWrapper).ProtoMethods(0xc00033b2c0, 0xc000309a80)
        /Users/lburton/pkg/mod/google.golang.org/protobuf@v1.25.1-0.20201016220047-aa45c4675289/internal/impl/message_reflect_gen.go:150 +0x53
google.golang.org/protobuf/proto.protoMethods(...)
        /Users/lburton/pkg/mod/google.golang.org/protobuf@v1.25.1-0.20201016220047-aa45c4675289/proto/proto_methods.go:18
google.golang.org/protobuf/proto.mergeOptions.mergeMessage(0x2063d60, 0xc00033b2c0, 0x2063d60, 0xc00033b2b0)
        /Users/lburton/pkg/mod/google.golang.org/protobuf@v1.25.1-0.20201016220047-aa45c4675289/proto/merge.go:67 +0x3b
google.golang.org/protobuf/proto.Clone(0x20250a0, 0xc00033b2b0, 0x20250a0, 0xc00033b2b0)
        /Users/lburton/pkg/mod/google.golang.org/protobuf@v1.25.1-0.20201016220047-aa45c4675289/proto/merge.go:58 +0xad
github.com/golang/protobuf/proto.Clone(0x74d19f0, 0xc0002d9300, 0x0, 0x0)
        /Users/lburton/pkg/mod/github.com/golang/protobuf@v1.4.1/proto/proto.go:130 +0x5d
github.com/stripe/skycfg/internal/go/skycfg.(*skyProtoMessageType).CallInternal(0xc0002d9340, 0xc0003e1bc0, 0x0, 0x0, 0x0, 0xc0001287a0, 0x1, 0x1, 0x0, 0xc000080900, ...)
        /Users/lburton/src/github.com/stripe/skycfg/internal/go/skycfg/proto_message_type.go:155 +0xdb
go.starlark.net/starlark.Call(0xc0003e1bc0, 0x20576c0, 0xc0002d9340, 0x0, 0x0, 0x0, 0xc0001287a0, 0x1, 0x1, 0x0, ...)
        /Users/lburton/pkg/mod/go.starlark.net@v0.0.0-20201204201740-42d4f566359b/starlark/eval.go:1191 +0x1a5
go.starlark.net/starlark.(*Function).CallInternal(0xc000400bc0, 0xc0003e1bc0, 0xc000197810, 0x1, 0x5, 0x0, 0x0, 0x0, 0xc0002bee00, 0x0, ...)
        /Users/lburton/pkg/mod/go.starlark.net@v0.0.0-20201204201740-42d4f566359b/starlark/interp.go:325 +0x3e8e
go.starlark.net/starlark.Call(0xc0003e1bc0, 0x20577c0, 0xc000400bc0, 0xc000197810, 0x1, 0x5, 0x0, 0x0, 0x0, 0x2057680, ...)
        /Users/lburton/pkg/mod/go.starlark.net@v0.0.0-20201204201740-42d4f566359b/starlark/eval.go:1191 +0x1a5
go.starlark.net/starlark.(*Function).CallInternal(0xc000400c00, 0xc0003e1bc0, 0xc0003f8680, 0x1, 0x1, 0x0, 0x0, 0x0, 0x1, 0x273c760, ...)
        /Users/lburton/pkg/mod/go.starlark.net@v0.0.0-20201204201740-42d4f566359b/starlark/interp.go:325 +0x3e8e
go.starlark.net/starlark.Call(0xc0003e1bc0, 0x20577c0, 0xc000400c00, 0xc0003f8680, 0x1, 0x1, 0x0, 0x0, 0x0, 0x0, ...)
        /Users/lburton/pkg/mod/go.starlark.net@v0.0.0-20201204201740-42d4f566359b/starlark/eval.go:1191 +0x1a5
go.starlark.net/starlark.(*Function).CallInternal(0xc000400c40, 0xc0003e1bc0, 0xc000402140, 0x1, 0x1, 0x0, 0x0, 0x0, 0x1dd1b60, 0x1e75740, ...)
        /Users/lburton/pkg/mod/go.starlark.net@v0.0.0-20201204201740-42d4f566359b/starlark/interp.go:325 +0x3e8e
go.starlark.net/starlark.Call(0xc0003e1bc0, 0x20577c0, 0xc000400c40, 0xc000402140, 0x1, 0x1, 0x0, 0x0, 0x0, 0x30, ...)
        /Users/lburton/pkg/mod/go.starlark.net@v0.0.0-20201204201740-42d4f566359b/starlark/eval.go:1191 +0x1a5
github.com/stripe/skycfg.(*Config).Main(0xc000400c80, 0x2052500, 0xc000130000, 0x0, 0x0, 0x0, 0x1, 0xc000400c80, 0x0, 0x0, ...)
        /Users/lburton/src/github.com/stripe/skycfg/skycfg.go:358 +0x3dd
main.main()
        /Users/lburton/src/github.com/stripe/skycfg/_examples/k8s/main.go:298 +0x565
exit status 2
jmillikin-stripe commented 3 years ago

This isn't something Skycfg can fix directly, but when the Protobuf module finishes migrating to the go-protobuf v2 API we should have some better options.

Summary of the issue:

Potential workarounds:

In the future, Skycfg will be switching from using generated Go structs to the dynamicpb package. This will remove any dependency on the Kubernetes structs, and it will be possible to operate on any Protobuf that a descriptor can be obtained for.

dilyevsky commented 3 years ago

Maybe proto.Clone from gogo package would still work here?

On Wed, Dec 9, 2020 at 6:40 PM John Millikin notifications@github.com wrote:

This isn't something Skycfg can fix directly, but when the Protobuf module finishes migrating to the go-protobuf v2 API we should have some better options.

Summary of the issue:

  • Kubernetes has checked in Go structs that resemble go-protobuf generated code, but which do not conform to the go-protobuf API.
  • Previous versions of go-protobuf happened to work with the Kubernetes structs because the particular API violations weren't relevant to that code path.
  • Newer versions of go-protobuf change how they do some reflection-based operations, which breaks on the Kubernetes structs.

Potential workarounds:

  • Avoid upgrading go-protobuf to v2 in your project -- this includes avoiding upgrades of Skycfg to a version that depends on go-protobuf v2.
  • Use a build system like Bazel to generate correct Go code for the Kubernetes protobufs, rather than rely on upstream's broken Go code.

In the future, Skycfg will be switching from using generated Go structs to the dynamicpb package. This will remove any dependency on the Kubernetes structs, and it will be possible to operate on any Protobuf that a descriptor can be obtained for.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/stripe/skycfg/issues/86#issuecomment-742197647, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAF3KAD34BN3PF6WZ3VTHL3SUAYJRANCNFSM4UULNOFA .

hagmonk commented 3 years ago

Thanks for the quick reply! Makes sense!

Use a build system like Bazel to generate correct Go code for the Kubernetes protobufs, rather than rely on upstream's broken Go code.

I actually maintain a set of Bazel rules (that I'm hoping to have released OSS one day) called rules_skycfg :) The summary is that we generate all our K8S YAML using skycfg, and have those YAML generating rules as part of the Bazel dependency graph so we can reason about them the same as we do with source code.

One challenge I've had with generating protos for K8S is that gazelle is apparently not smart enough to connect proto imports across different repos. So, it generates a bunch of BUILD files that depend on //k8s.io/apimachinery/etc instead of @io_k8s_apimachinery//etc for instance. Hence I've been running with build_file_proto_mode = "disable_global" for @io_k8s_api and @io_k8s_apimachinery.

Is there any trick you can recommend to get proto generation working for K8S projects? The best idea I've found so far involves smothering my BUILD files with # gazelle:resolve directives for every single proto import path which seems like it will quickly get laborious … if I can get over this hurdle I can stay up-to-date with skycfg :)

hagmonk commented 3 years ago

FYI, Jay as recently as October recommended simply disabling proto generation, which makes me wonder if this is particularly feasible … https://github.com/bazelbuild/bazel-gazelle/issues/924#issuecomment-702357953.

jmillikin-stripe commented 3 years ago

Take a look at the # gazelle:proto_import_prefix and # gazelle:proto_strip_import_prefix Gazelle directives, which control the proto_library attrs of the same name -- that functionality exists to support pretty much this exact use case (https://github.com/bazelbuild/bazel/issues/3867).

If you're using go_repository() then those directives can be placed into the build_directives parameter.

I currently expect Skycfg to finish migrating to the dynamicpb implementation by the end of December, at which point this should become simpler.

hagmonk commented 3 years ago

Awesome, thanks so much. If I can get it working I'll update this issue for posterity.

hagmonk commented 3 years ago

I think what I really want is that dynamicpb implementation – if I'm guessing correctly, that would mean skycfg could operate on straight proto files without having to depend on generated code? That would simplify things dramatically :)

I know this isn't strictly skycfg related, but for the sake of people who stumble across this ... I kind of have proto generation for K8S APIs working.

I spent some time fiddling with gazelle, and it doesn't seem like those directives above solve the immediate problem. If I check out github.com/k8s/api and run gazelle -go_prefix k8s.io/api, we end up with BUILD.bazel files that have targets like this:

proto_library(
    name = "v1_proto",
    srcs = ["generated.proto"],
    visibility = ["//visibility:public"],
    deps = [
        "//k8s.io/api/core/v1:v1_proto",
        "//k8s.io/apimachinery/pkg/api/resource:resource_proto",
        "//k8s.io/apimachinery/pkg/apis/meta/v1:v1_proto",
        "//k8s.io/apimachinery/pkg/runtime:runtime_proto",
        "//k8s.io/apimachinery/pkg/runtime/schema:schema_proto",
    ],
)

There is no such target //k8s.io/api/core/v1:v1_proto. What it should be is //api/core/v1:v1_proto. The go_library rules it generates don't suffer from this problem. It seems like the resolution of imports to targets is b0rked for some reason. I skimmed through gazelle's source (trying not to go down a rabbit hole here) and tried a few different config options to influence the resolution and nothing had any effect whatsoever, except setting an explicit gazelle:resolve directive.

So ...

for i in $(buildozer 'print deps' //...:%proto_library | tr -d '[]' | tr ' ' '\n' | sort | uniq); do
        proto=$(sed -e 's%//%%' -e 's/:.*$/\/generated.proto/' <<<$i)
        printf "gazelle:resolve proto %s %s\n" $proto $(sed -e 's%//k8s.io/api%@io_k8s_api%' -e 's%/%//%' <<<$i)
        printf "gazelle:resolve proto go %s %s\n" $proto $(sed -e 's%//k8s.io/api%@io_k8s_api%' -e 's%/%//%' -e "s/:\(.*\)_proto$/:\1_go_proto/" <<<$i)
done

The output of which I tossed into my deps.bzl file:

K8S_DIRECTIVES = [
    "gazelle:resolve proto k8s.io/api/authentication/v1/generated.proto @io_k8s_api//authentication/v1:v1_proto",
    "gazelle:resolve proto go k8s.io/api/authentication/v1/generated.proto @io_k8s_api//authentication/v1:v1_go_proto",
    "gazelle:resolve proto k8s.io/api/batch/v1/generated.proto @io_k8s_api//batch/v1:v1_proto",
    "gazelle:resolve proto go k8s.io/api/batch/v1/generated.proto @io_k8s_api//batch/v1:v1_go_proto",
    "gazelle:resolve proto k8s.io/api/core/v1/generated.proto @io_k8s_api//core/v1:v1_proto",
    "gazelle:resolve proto go k8s.io/api/core/v1/generated.proto @io_k8s_api//core/v1:v1_go_proto",
    "gazelle:resolve proto k8s.io/apimachinery/pkg/api/resource/generated.proto @io_k8s_apimachinery//pkg/api/resource:resource_proto",
    "gazelle:resolve proto go k8s.io/apimachinery/pkg/api/resource/generated.proto @io_k8s_apimachinery//pkg/api/resource:resource_go_proto",
    "gazelle:resolve proto k8s.io/apimachinery/pkg/apis/meta/v1/generated.proto @io_k8s_apimachinery//pkg/apis/meta/v1:v1_proto",
    "gazelle:resolve proto go k8s.io/apimachinery/pkg/apis/meta/v1/generated.proto @io_k8s_apimachinery//pkg/apis/meta/v1:v1_go_proto",
    "gazelle:resolve proto k8s.io/apimachinery/pkg/runtime/schema/generated.proto @io_k8s_apimachinery//pkg/runtime/schema:schema_proto",
    "gazelle:resolve proto go k8s.io/apimachinery/pkg/runtime/schema/generated.proto @io_k8s_apimachinery//pkg/runtime/schema:schema_go_proto",
    "gazelle:resolve proto k8s.io/apimachinery/pkg/runtime/generated.proto @io_k8s_apimachinery//pkg/runtime:runtime_proto",
    "gazelle:resolve proto go k8s.io/apimachinery/pkg/runtime/generated.proto @io_k8s_apimachinery//pkg/runtime:runtime_go_proto",
    "gazelle:resolve proto k8s.io/apimachinery/pkg/util/intstr/generated.proto @io_k8s_apimachinery//pkg/util/intstr:intstr_proto",
    "gazelle:resolve proto go k8s.io/apimachinery/pkg/util/intstr/generated.proto @io_k8s_apimachinery//pkg/util/intstr:intstr_go_proto",
]

At which point I did require the gazelle:proto_import_prefix:

go_repository(
    name = "io_k8s_api",
    build_directives = K8S_DIRECTIVES + ["gazelle:proto_import_prefix k8s.io/api"],  # keep
    importpath = "k8s.io/api",
    sum = "h1:ojgIGmjzUSwsug8H2yVCoueRAGy0IshvrtowuLycEQo=",
    version = "v0.0.0-20181027024800-9fcf73cc980b",
)

go_repository(
    name = "io_k8s_apimachinery",
    build_directives = K8S_DIRECTIVES + ["gazelle:proto_import_prefix k8s.io/apimachinery"],  # keep
    importpath = "k8s.io/apimachinery",
    sum = "h1:QV0MJn7lF87qcyC3Y+On2zKM62Erf99uoORoQbu7lag=",
    version = "v0.0.0-20181026144617-b7f9f1fa80ae",
)

And where this has finally brought me is this error:

external/io_k8s_apimachinery/pkg/util/intstr/intstr.go:41:6: IntOrString redeclared in this block
        previous declaration at external/io_k8s_apimachinery/pkg/util/intstr/intstr_go_proto_/k8s.io/apimachinery/pkg/util/intstr/generated.pb.go:28:6

Which seems to be the result of somehow getting @io_k8s_apimachinery//pkg/util/intstr:intstr and @io_k8s_apimachinery//pkg/util/intstr:intstr_go_proto both in my dependency graph. I'm guessing I probably need another rewrite rule to make things depend only on the intstr_go_proto version. But who knows what dreadful hacks depend on that Go file, lol.

I'll keep poking at this, but hopefully this is useful to someone out there.

hagmonk commented 3 years ago

Two more resolve overrides fixes things up:

"gazelle:resolve go k8s.io/apimachinery/pkg/util/intstr @io_k8s_apimachinery//pkg/util/intstr:intstr_go_proto",
"gazelle:resolve go k8s.io/apimachinery/pkg/api/resource @io_k8s_apimachinery//pkg/api/resource:resource_go_proto",

Then, you have to be careful that your BUILD.bazel file has directives to make sure you link against the Bazel-generated Go libraries, not the checked-in ones:

# gazelle:resolve go k8s.io/api/apps/v1 @io_k8s_api//apps/v1:v1_go_proto
# gazelle:resolve go k8s.io/api/batch/v1 @io_k8s_api//batch/v1:v1_go_proto
# gazelle:resolve go k8s.io/api/core/v1 @io_k8s_api//core/v1:v1_go_proto
# gazelle:resolve go k8s.io/api/storage/v1 @io_k8s_api//storage/v1:v1_go_proto
# gazelle:resolve go k8s.io/apimachinery/pkg/apis/meta/v1 @io_k8s_apimachinery//pkg/apis/meta/v1:v1_go_proto
# gazelle:resolve go k8s.io/apimachinery/pkg/runtime @io_k8s_apimachinery//pkg/runtime:runtime_go_proto

So that gets you something. But it turns out the checked-in Go source does more for you than just exist. This skycfg program will now fail with can't assign to .name field of NoneType

def main(ctx):
    d = appsv1.Deployment()
    d.metadata.name = "hey"
    return [d]

Basically it looks like you need to construct every single object?

def main(ctx):
    d = appsv1.Deployment()
    m = metav1.ObjectMeta()
    m.name = "hey"
    d.metadata = m
    return [d]

With several thousand lines of skycfg I'm not super keen on having to rewrite most of it with explicit constructors ... will this also be an issue after switching to reflection?

Another problem is I can no longer import k8s.io.apimachinery/pkg/runtime due to a conflict with the generated proto. I depend on for outputting human-readable YAML to make it easier for users during development and debug. I could write my own serdes to solve this but it's convenient to have kubectl output parity. But in general, if you pull in k8s/client-go, or want to use schema.GroupVersionResource you're immediately going to hit the problem. It might be possible to separate the Bazel-generated proto into its own importpath / package? I just doubt that such objects will be accepted by all the existing tooling.

Well that was a fun exercise. I think for the time being I'll stick with the earlier skycfg versions without the v2 proto dependency. I'm totally at the limit of my proto knowledge here, so I look forward to learning what I'm getting wrong ;)

jmillikin-stripe commented 3 years ago

Due to some unrelated circumstances, I do not think I'll be able to finish this work by the end of the year -- and there is now no ETA.

I can review PRs that finish migrating the proto module to go-protobuf v2, or alternatively, I can roll back the migration and return current HEAD to go-protobuf v1.

Do the followers of this issue have any preference as to the above choice?

dilyevsky commented 3 years ago

Hey John, hope everything is Ok. I should be able to burn some cycles on this comes q1 since we are stuck on older skycfg and proto v1.3.2 due to this as well. Could you post a rough outline on what remains to be done here? (I didn’t see any prs fly by so it seems like “everything”? =))

On Sat, Dec 19, 2020 at 7:52 PM John Millikin notifications@github.com wrote:

Due to some unrelated circumstances, I do not think I'll be able to finish this work by the end of the year -- and there is now no ETA.

I can review PRs that finish migrating the proto module to go-protobuf v2, or alternatively, I can roll back the migration and return current HEAD to go-protobuf v1.

Do the followers of this issue have any preference as to the above choice?

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/stripe/skycfg/issues/86#issuecomment-748559264, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAF3KACA6SROPWVXBGH2UWDSVVYGZANCNFSM4UULNOFA .

jmillikin commented 3 years ago

I have a version of the proto module that I'd guess is around 40-50% done -- it handles packages, message types, and enums. Will try to push it to github.com so it can be built upon.

Messages themselves are the missing big piece, plus figuring out a solution to:

  1. The serialization of Kubernetes YAML, which ought to be backed by the k8s.io packages if they're available.
  2. The unexpected change to auto-vivified fields @hagmonk described above. The intended behavior is that fields do not auto-vivify (so that foo.bar == None works), but if Kubernetes' custom structs were previously bypassing that then we need at least an option for maintaining backwards compatibility.
jmillikin-stripe commented 3 years ago

The finished parts of the go-protobuf v2 implementation for message reflection have been pushed to trunk -- they're not wired up to anything, but do have tests based on the existing internal/go/proto_test.go test suite.

Placeholders for the unfinished part, the protoMessage implementation, have been pushed to a branch at https://github.com/stripe/skycfg/compare/jmillikin_protobuf-api-v2?expand=1

jmillikin-stripe commented 3 years ago

Compatibility note: I just tagged commit f561f12ccefbd35b70efc8c7c377dace5511b1ff (right before the go-protobuf v2 switch) as v0.1.0. This should allow existing users to run Go module updates without worrying about accidentally picking up the new dependency.

The next tag will happen after the migration has finished and Kubernetes protos have been verified to work with go-protobuf v2.

com6056 commented 1 year ago

Are there any updates on this issue with the new v2 improvements that have been merged since?

com6056 commented 1 year ago

Are there any updates on this issue with the new v2 improvements that have been merged since?

Ah I see now, for anyone else that is slow like me and still catching up on this, at this point we're just waiting for Kubernetes to migrate away from gogo so they support the new proto registry: https://github.com/kubernetes/kubernetes/issues/96564