Which version of Kubernetes and which version of controller-runtime? That seems bizarre, as it would imply that corev1.SecretList was associated with multiple groups.
Can you post a minimal reproducer?
FWIW, the following works fine for me:
package main

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    _ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func main() {
    ctrl.SetLogger(zap.Logger(true))

    mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
    if err != nil {
        panic(err)
    }

    stopCh := ctrl.SetupSignalHandler()
    go func() {
        if err := mgr.Start(stopCh); err != nil {
            panic(err)
        }
    }()

    if started := mgr.GetCache().WaitForCacheSync(stopCh); !started {
        panic("not all started")
    }

    client := mgr.GetClient()

    var secrets corev1.SecretList
    if err := client.List(context.TODO(), &secrets); err != nil {
        panic(err)
    }
    ctrl.Log.Info("my cached secrets", "secrets", len(secrets.Items))

    directClient := mgr.GetAPIReader()
    if err := directClient.List(context.TODO(), &secrets); err != nil {
        panic(err)
    }
    ctrl.Log.Info("my direct secrets", "secrets", len(secrets.Items))
}
I'd suspect you've got something adding SecretList to a scheme somewhere it shouldn't be.
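If it helps to check, here's a small sketch (mine, assuming the client-go default scheme is the one your manager ends up with) that prints every GVK the scheme has registered for the type:

package main

import (
    "fmt"

    corev1 "k8s.io/api/core/v1"
    "k8s.io/client-go/kubernetes/scheme"
)

func main() {
    // Ask the scheme which GroupVersionKinds it knows for the Go type.
    // With only the built-in client-go registrations this prints a single
    // entry (/v1, Kind=SecretList); a second entry means something else
    // re-registered the type under another group.
    gvks, _, err := scheme.Scheme.ObjectKinds(&corev1.SecretList{})
    if err != nil {
        panic(err)
    }
    for _, gvk := range gvks {
        fmt.Println(gvk.String())
    }
}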
So, I got a reproducer for you. Unfortunately your code did not compile, so I had to adapt it:
package main

import (
    "context"
    "os"

    "github.com/openshift/api"
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "k8s.io/client-go/kubernetes/scheme"
    _ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
    "sigs.k8s.io/controller-runtime/pkg/client/config"
    "sigs.k8s.io/controller-runtime/pkg/manager"
    logf "sigs.k8s.io/controller-runtime/pkg/runtime/log"
    "sigs.k8s.io/controller-runtime/pkg/runtime/signals"
)

var log = logf.Log.WithName("cmd")

func main() {
    logf.SetLogger(logf.ZapLogger(true))

    cfg, err := config.GetConfig()
    if err != nil {
        log.Error(err, "Failed to get configuration")
        os.Exit(1)
    }

    mgr, err := manager.New(cfg, manager.Options{})
    if err != nil {
        log.Error(err, "Failed to create manager")
        os.Exit(1)
    }

    stopCh := signals.SetupSignalHandler()
    if started := mgr.GetCache().WaitForCacheSync(stopCh); !started {
        panic("not all started")
    }

    // add openshift API
    if err := api.Install(scheme.Scheme); err != nil {
        log.Error(err, "Failed to register OpenShift schema")
        os.Exit(1)
    }

    client := mgr.GetClient()

    // get unstructured
    usecrets := unstructured.UnstructuredList{}
    usecrets.SetKind("SecretList")
    usecrets.SetAPIVersion("v1")
    if err := client.List(context.TODO(), nil, &usecrets); err != nil {
        panic(err)
    }
    log.Info("my unstructured secrets", "secrets", len(usecrets.Items))

    // get typed
    var secrets corev1.SecretList
    if err := client.List(context.TODO(), nil, &secrets); err != nil {
        panic(err)
    }
    log.Info("my secrets", "secrets", len(secrets.Items))
}
Which results in:
2019-03-19T08:38:41.815+0100 INFO cmd my unstructured secrets {"secrets": 487}
panic: multiple group-version-kinds associated with type *v1.SecretList, refusing to guess at one
goroutine 1 [running]:
main.main()
/home/jreimann/gobase/enmasse/src/github.com/enmasseproject/enmasse/cmd/test/main.go:65 +0x5f5
I am using:

- package: sigs.k8s.io/controller-runtime
  version: 0.1.9

which resolved to:

- name: sigs.k8s.io/controller-runtime
  version: f6f0bc9611363b43664d08fb097ab13243ef621d
If you remove the registration of the OpenShift API, then it actually works. Digging a bit deeper into the code, I think there is a fundamental flaw in controller-runtime: it assumes that the object Kind is unique, which it isn't. Only the combination of Kind + apiVersion is unique.
If that helps you, I can create a reproducer project on GitHub which shows the problem.
The issue is that controller-runtime, when creating its default scheme, adds Secrets, and then when you add the OpenShift API it re-adds the type -- not the Kind, but the Go type.
Here: https://github.com/openshift/api/blob/master/install.go#L103
If you want to use api.Install, then I think you need to create the manager with a scheme that only installs the type once.
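A rough sketch of what that could look like (this assumes the Scheme field on manager.Options; and, as the rest of the thread shows, it doesn't help on its own if the OpenShift install itself re-registers corev1.SecretList):

package main

import (
    "k8s.io/apimachinery/pkg/runtime"
    clientgoscheme "k8s.io/client-go/kubernetes/scheme"
    "sigs.k8s.io/controller-runtime/pkg/client/config"
    "sigs.k8s.io/controller-runtime/pkg/manager"
)

func main() {
    // Build a dedicated scheme instead of mutating the shared client-go
    // scheme, and hand it to the manager so every client it creates uses it.
    mgrScheme := runtime.NewScheme()
    if err := clientgoscheme.AddToScheme(mgrScheme); err != nil {
        panic(err)
    }

    cfg := config.GetConfigOrDie()
    if _, err := manager.New(cfg, manager.Options{Scheme: mgrScheme}); err != nil {
        panic(err)
    }
}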
I am sorry, but that cannot be the case. First of all, I am using release-3.11, which doesn't have this code: https://github.com/openshift/api/blob/release-3.11/install.go. Second, as you can see in the code you linked, that would be done by a call to InstallKube, not Install.
I created a reproducer repository for you: https://github.com/ctron/controller-runtime-362
As you can see in the output, the object is there only once, and the conflicting entry is caused by two different registrations having the same "kind":
Dumping GVK ...
- /v1, Kind=SecretList
- image.openshift.io/v1, Kind=SecretList
Hey, sorry about letting this fall through the cracks a bit -- I lost track of it.
Ok, so, I'm decently certain that this is actually an OpenShift bug, and here's why:
It's certainly possible to have the same Go type associated with different GVKs, but it's almost certainly not what you meant. In controller-runtime, we assume that Group-Version-Kind is unique, and that, with the exception of historical weirdness around metav1.XYZMeta, the combination of package and type is unique. That is: there's a one-to-one mapping between Group-Version-Kind and "package".Type. That's a relatively sane assumption (IMO) that is relied upon across controller-runtime, and also almost certainly in a few spots in k/k. The only place I know of where that's broken in the existing kube ecosystem is for a few metav1 spots, and all of those are labeled "we should fix this, it's not what we're supposed to be doing" (and aren't actually top-level objects).
OpenShift is breaking this assumption here: https://github.com/openshift/api/blob/b772cc93d057f30f40f93347a043c3bcef11d268/image/v1/register.go#L48 where it's adding the existing corev1.SecretList type to a non-corev1 API group. It looks like it mainly exists to generate a client for a subresource, but in that case the object shouldn't be re-added to a new API group -- it should simply be re-used from the old one, and the scheme for that client should have both API groups included. Either that, or type SecretList corev1.SecretList in that API group.
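To spell out the mechanism with a simplified sketch (mine, not the actual openshift/api code): once the same Go type is registered under two groups, the scheme records two GVKs for one reflect.Type, and anything that needs exactly one GVK for that type has to guess:

package main

import (
    "fmt"

    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/runtime/schema"
)

func main() {
    s := runtime.NewScheme()

    // Normal registration in the core group.
    s.AddKnownTypes(schema.GroupVersion{Group: "", Version: "v1"},
        &corev1.Secret{}, &corev1.SecretList{})
    // Re-registering the *same Go type* under a second group, as image/v1 does.
    s.AddKnownTypes(schema.GroupVersion{Group: "image.openshift.io", Version: "v1"},
        &corev1.SecretList{})

    // Prints two GVKs for one reflect.Type: /v1 and image.openshift.io/v1.
    gvks, _, err := s.ObjectKinds(&corev1.SecretList{})
    if err != nil {
        panic(err)
    }
    fmt.Println(gvks)
}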
@DirectXMan12 Thanks for investigating this. I opened a new issue with the OpenShift API, and referenced your findings. Maybe we can find a proper solution then.
Cross posting from https://github.com/openshift/api/issues/270
In general, cross registering types is an acceptable thing to do. That's why when the scheme maps them, it allows for multiple matches: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/runtime/scheme.go#L236. It allows serialization of an existing type into a new group. We make use of cross registration for several types in k8s.io/apimachinery/apis/meta/v1 and it is a generically acceptable concept.
We make use of cross registration for several types in k8s.io/apimachinery/apis/meta/v1 and it is a generically acceptable concept.
I was under the impression that most of the cross-mapping from meta/v1 had "this is bad, we should fix it" comments on them (some of those lines do indeed seem to have been refactored away), and that in general, cross-registration was undesirable. Is that not the case? It's unclear to me why you'd want to do it in most cases, especially since you could just do a newtype and preserve the nice property that any given type has a single associated GVK.
At any rate, we could make it so that we fall back to GVK from the GVK field, if we can't determine conclusively from the type, but this still feels to me like OpenShift is doing things that it shouldn't be (in fact, it's not clear to me why this subresource isn't just returning APIVersion: v1, Kind: SecretList (cross-group subresources, like we do for scale)).
At any rate, we could make it so that we fall back to GVK from the GVK field, if we can't determine conclusively from the type
Although, I feel like this would lead to weird/unexpected behavior in certain cases (especially when combining controllers into a single manager), so I'd still prefer to try and just get a fix in OpenShift.
I was under the impression that most of the cross-mapping from meta/v1 had "this is bad, we should fix it" comments on them (some of those lines do indeed seem to have been refactored away), and that in general, cross-registration was undesirable. Is that not the case? It's unclear to me why you'd want to do it in most cases, especially since you could just do a newtype and preserve the nice property that any given type has a single associated GVK.
Cross registration is a desired feature. It is the simplest way to transition a mistaken name and to allow multiple potential mappings. Consider the migration case, the meta case, or the old apps case. Being able to cross-register is a feature, and not an undesirable one.
I was under the impression that most of the cross-mapping from meta/v1 had "this is bad, we should fix it" comments on them (some of those lines do indeed seem to have been refactored away)
As the person who put a lot of these comments in: they're not about cross-registration of types moving groups, they're about cross-registration of meta types being registered in all groups. Not the same use case.
GVK is unique re Openshift
No, we never intended that in Kube. The core runtime code explicitly and completely decouples resources from kinds. Only discovery is allowed to make up rules about resource to Kind mappings. You may not assume that Kind is only exposed on one aggregated API endpoint.
That is: there's a one-to-one mapping between Group-Version-Kind and "package".Type. That's a relatively sane assumption (IMO) that is relied upon across controller-runtime, and also almost certainly in a few spots in k/k.
If it's relied on in k/k, that's broken code.
Here's the relevant description in runtime/scheme.go
// In a Scheme, a Type is a particular Go struct, a Version is a point-in-time
// identifier for a particular representation of that Type (typically backwards
// compatible), a Kind is the unique name for that Type within the Version, and a
// Group identifies a set of Versions, Kinds, and Types that evolve over time. An
// Unversioned Type is one that is not yet formally bound to a type and is promised
// to be backwards compatible (effectively a "v1" of a Type that does not expect
// to break in the future).
and
// typeToGroupVersion allows one to find metadata for a given go object.
// The reflect.Type we index by should not be a pointer.
typeToGVK map[reflect.Type][]schema.GroupVersionKind
All types have a canonical GVK (primary) but may be registered multiple times and callers are not allowed to assume Go type == kind at this current time.
We have discussed, but not allowed, deviations from this in apimachinery. To change the definition of GVK/GVR/type mappings we'd need to have a KEP to alter it.
It is the simplest way to transition a mistaken name and to allow multiple potential mappings
Type aliases seem just as simple, but I could be misunderstanding what you're saying.
You may not assume that Kind is only exposed on one aggregated API endpoint.
We don't. We use the primary RESTMapping, same as kubectl xyz -f does. Both rely on the principle "any given GVK that's not a subresource that you can actually perform CRUD on has a primary RESTMapping that's fine to always use". If that weren't the case, a lot of kubectl stuff would break, AFAICT.
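For readers following along, roughly what "use the primary RESTMapping" means, as a sketch (not the actual controller-runtime code path; the function name is made up):

package gvkexample

import (
    "fmt"

    "k8s.io/apimachinery/pkg/api/meta"
    "k8s.io/apimachinery/pkg/runtime/schema"
)

// primaryMapping resolves a GroupKind the same way kubectl does when it reads
// a manifest: with no explicit versions, RESTMapping returns the mapping for
// the preferred ("primary") version of that kind.
func primaryMapping(mapper meta.RESTMapper, gk schema.GroupKind) (*meta.RESTMapping, error) {
    mapping, err := mapper.RESTMapping(gk)
    if err != nil {
        return nil, err
    }
    fmt.Printf("primary mapping for %s: %s\n", gk, mapping.GroupVersionKind)
    return mapping, nil
}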
If it's relied on in k/k, that's broken code.
Right or wrong, unless I'm misreading, lots of things assume as much by blindly using the first returned GVK (there's a ton more -- most invocations of ObjectKinds do this). My quick survey probably missed a few with preconditions that make this moot, but it seems like blindly assuming the first GVK is the correct one when multiple are present is suspect -- it assumes that the first one added to the Scheme is always correct, which seems hard to know.
The only (AFAICT) difference from what we do is that we bail instead of trusting the first GVK to be the right one. Is there something I've missed here?
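To make the contrast concrete, here are the two behaviours as a sketch (my paraphrase, not the literal code on either side):

package gvkexample

import (
    "fmt"

    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/runtime/schema"
)

// Pattern seen at many k/k call sites: take the first GVK returned and move on.
// (ObjectKinds returns an error for unregistered types, so the slice is non-empty here.)
func firstGVK(s *runtime.Scheme, obj runtime.Object) (schema.GroupVersionKind, error) {
    gvks, _, err := s.ObjectKinds(obj)
    if err != nil {
        return schema.GroupVersionKind{}, err
    }
    return gvks[0], nil
}

// Pattern used by controller-runtime: refuse to guess when the mapping is ambiguous.
func uniqueGVK(s *runtime.Scheme, obj runtime.Object) (schema.GroupVersionKind, error) {
    gvks, _, err := s.ObjectKinds(obj)
    if err != nil {
        return schema.GroupVersionKind{}, err
    }
    if len(gvks) != 1 {
        return schema.GroupVersionKind{}, fmt.Errorf("%d group-version-kinds associated with type %T, refusing to guess at one", len(gvks), obj)
    }
    return gvks[0], nil
}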
We have discussed, but not allowed, deviations from this in apimachinery. To change the definition of GVK/GVR/type mappings we'd need to have a KEP to alter it.
Is that a KEP you'd be opposed to? I still posit that "type corresponds to GVK" is a sane assertion, so I'd be happy to write up that KEP.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle rotten
This issue is still an issue. It is not fixed. It is still a problem.
/lifecycle frozen
The bot's just being overly aggressive.
/priority awaiting-more-evidence
From backlog grooming: this issue needs to be triaged again; if it's not reproducible, let's close it. In general, controller-runtime might not be the best place for a fix.
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/close
@fejta-bot: Closing this issue.
/reopen
This is still unsolved; adding the imagev1 scheme breaks SecretList.
@alvaroaleman: Reopened this issue.
So I fixed this instance of the problem in openshift/api#780 by redefining the type (type a b), which is a different thing from aliasing a type (aliasing doesn't help with this issue at all). That being said, the issue is that we make assumptions apimachinery doesn't guarantee, which happen to work most of the time and sometimes don't. For those cases we should probably:
@fejta-bot: Closing this issue.
I was thinking about this. k/k might not assume that one Go type maps uniquely to one GVK, but it seems that CR does. A 1:1 mapping is certainly easier for users of the Go types. I think the type redefinition suggested here makes it easy for people authoring types, even while reusing the upstream definitions.
So, should we simply say that not all generated Go code is compatible with controller-runtime? CR users always have the option of using unstructured, but I think there are a myriad of ways to generate types that will crash controller-runtime, and I don't think we can or should aim to support them all.
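To illustrate the distinction the type redefinition relies on (a sketch with hypothetical names, not code from openshift/api): a definition gives the scheme a new reflect.Type to register, while an alias does not:

package example

import (
    corev1 "k8s.io/api/core/v1"
)

// Type definition: a new, distinct Go type with the same underlying structure.
// Because it is a different reflect.Type, registering it under
// image.openshift.io/v1 does not add a second GVK to corev1.SecretList.
// (It needs its own generated deepcopy methods to satisfy runtime.Object.)
type SecretList corev1.SecretList

// Type alias: just another name for the exact same type, so registering it is
// registering corev1.SecretList itself and the ambiguity remains.
type SecretListAlias = corev1.SecretList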
(Aside: should we turn off fejtabot in this repo? I'm not convinced it adds signal, it definitely adds noise)
@justinsb in theory there are some ways we could improve this; for example, we could allow people to explicitly set TypeMeta and then use that information on non-unstructured objects. Today, we simply use the scheme and bail out directly if we don't get exactly one result back.
The issue is just that, in practice, this hasn't come up often enough for anyone to bother improving it in controller-runtime, not that we'd be opposed to such a change.
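A rough sketch of the improvement being described (not current controller-runtime behaviour; the function name is hypothetical): only fall back to an explicitly set TypeMeta when the scheme lookup is ambiguous:

package gvkexample

import (
    "fmt"

    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/runtime/schema"
)

// gvkFor resolves the GVK for a typed object. If the scheme knows exactly one
// GVK, use it as today; if it knows several, trust a TypeMeta the caller set
// explicitly, provided it matches one of the registered GVKs.
func gvkFor(s *runtime.Scheme, obj runtime.Object) (schema.GroupVersionKind, error) {
    gvks, _, err := s.ObjectKinds(obj)
    if err != nil {
        return schema.GroupVersionKind{}, err
    }
    if len(gvks) == 1 {
        return gvks[0], nil
    }
    set := obj.GetObjectKind().GroupVersionKind()
    for _, gvk := range gvks {
        if gvk == set {
            return gvk, nil
        }
    }
    return schema.GroupVersionKind{}, fmt.Errorf("multiple GVKs registered for %T and TypeMeta does not disambiguate", obj)
}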
but I think there are a myriad of ways to generate types that will crash controller-runtime, and I don't think we can or should aim to support them all.
I missed this part before. There are? Do you have one or more examples? The issue here causes an error but not a crash.
I mean, trivially, I can func init() { panic() } :-) I can hand-code a deep-copy method and do it more subtly. We could debate about whether X should be allowed or Y should be allowed, but we could also just say that when you link in arbitrary Go code there are limits to the guarantees we can make.
The idea that a Go type maps to a GVK is (IMO) one of the things that makes controller-runtime so much easier to use than the raw libraries, so I have a hard time thinking how we can solve this while keeping what makes CR so useful. I can imagine always populating the GVK on a read and then requiring it on an update, but we'd still have problems on a create operation (it would be a user-facing breaking change to require a GVK on typed objects for create, IIUC).
/reopen
@vincepri: Reopened this issue.
/lifecycle frozen
This got fixed in https://github.com/kubernetes-sigs/controller-runtime/pull/2192