rancher / rancher

Complete container management platform
http://rancher.com
Apache License 2.0

[BUG] Schema Definitions returns a 503 for certain resource types #45158

Closed MbolotSuse closed 1 week ago

MbolotSuse commented 3 months ago

Rancher Server Setup

Information about the Cluster

User Information

Describe the bug When attempting to get the schema definitions for certain resources, specific resources always return a 503, even after a significant amount of time has passed. Below is the list of resources provided by @Priyashetty17:

management.cattle.io.userattribute              
management.cattle.io.globaldnsprovider            
management.cattle.io.catalog                 
project.cattle.io.app                    
management.cattle.io.user                  
management.cattle.io.nodetemplate              
generateKubeconfigOutput                   
management.cattle.io.projectnetworkpolicy          
management.cattle.io.template                
management.cattle.io.rkeaddon                
management.cattle.io.podsecuritypolicytemplate        
management.cattle.io.clustertemplate             
management.cattle.io.templatecontent             
management.cattle.io.podsecuritypolicytemplateprojectbinding 
management.cattle.io.rancherusernotification         
management.cattle.io.templateversion             
management.cattle.io.composeconfig              
project.cattle.io.apprevision                
management.cattle.io.dynamicschema              
management.cattle.io.authconfig               
management.cattle.io.clustertemplaterevision         
management.cattle.io.nodepool                
management.cattle.io.multiclusterapprevision         
management.cattle.io.catalogtemplateversion         
management.cattle.io.token                  
management.cattle.io.rkek8ssystemimage            
management.cattle.io.kontainerdriver             
management.cattle.io.globaldns                
management.cattle.io.clustercatalog             
management.cattle.io.etcdbackup               
management.cattle.io.nodedriver               
management.cattle.io.rkek8sserviceoption           
management.cattle.io.group                  
management.cattle.io.projectcatalog             
management.cattle.io.groupmember               
management.cattle.io.catalogtemplate             
management.cattle.io.samltoken                
management.cattle.io.node                  
management.cattle.io.multiclusterapp 

To Reproduce

  1. Run rancher/rancher:v2.9-head
  2. Retrieve the schemaDefinition for any of the types above.

Result A 503 response is returned.

Expected Result An object should be returned, with the same fields provided by the 2.8 schema object.
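The reproduction steps above can be scripted roughly as follows. This is a minimal sketch, not something taken from the issue: the `/v1/schemaDefinitions/<type>` path, the bearer-token header, and the host name in the usage note are assumptions, so adjust them to match your Rancher setup.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import quote

# Assumed endpoint path; the issue itself does not spell out the URL.
SCHEMA_DEFINITIONS_PATH = "/v1/schemaDefinitions"


def schema_definition_url(base_url: str, type_id: str) -> str:
    """Build the URL for one schema definition, e.g. management.cattle.io.user."""
    return f"{base_url.rstrip('/')}{SCHEMA_DEFINITIONS_PATH}/{quote(type_id, safe='')}"


def get_schema_definition(base_url: str, type_id: str, token: str, timeout: float = 10.0):
    """Return (status_code, parsed_body_or_None) for a schema definition request."""
    request = urllib.request.Request(
        schema_definition_url(base_url, type_id),
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Accept": "application/json",
        },
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as err:
        # A 503 lands here; the body is not needed to confirm the bug.
        return err.code, None
```

For example, `get_schema_definition("https://rancher.example.com", "management.cattle.io.user", token)` would be expected to return a status of 503 on an affected build (the host name is a placeholder).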

Screenshots N/A

Additional context It is possible that some of these objects are being seen as "arbitrary" objects since the definitions are minimal.

richard-cox commented 2 months ago

I see the same for monitoring.coreos.com.alertmanagerconfig, brought in via the monitoring app. It should work for non-core rancher crds?

MbolotSuse commented 2 months ago

@richard-cox Keep in mind that a temporary 503 is expected behavior. When a new CRD is added, there may be a delay between when the installation occurs and when the definition becomes available. However, this should resolve shortly (within a matter of seconds). This issue is about resources that return a constant 503 that never resolves; that part is a bug. Since the resource type that you mentioned isn't available in core Rancher, it is subject to a possible temporary 503.

That being said, when I tried it locally, that resource appears to be one of the indefinite 503s, so I'll aim to fix it as part of this ticket.

This ticket will only aim to fix the permanent 503. LMK if you have any follow-up questions.
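One way to tell the expected temporary 503 apart from the permanent one is to poll for a short period. The sketch below is illustrative only: `classify_503`, its labels, and the retry defaults are hypothetical, and `fetch_status` stands in for whatever call returns the HTTP status code of the schema definition request.

```python
import time
from typing import Callable


def classify_503(
    fetch_status: Callable[[], int],
    attempts: int = 5,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a schema definition and label the outcome.

    Returns "available" if the first response is 200, "temporary-503" if a
    503 was seen before a 200, "persistent-503" if every attempt returned
    503, and "unexpected-<code>" for any other status.
    """
    saw_503 = False
    for attempt in range(attempts):
        status = fetch_status()
        if status != 503:
            if status == 200:
                return "temporary-503" if saw_503 else "available"
            return f"unexpected-{status}"
        saw_503 = True
        # Wait before retrying, but not after the final attempt.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return "persistent-503"
```

A "persistent-503" result after a reasonable number of attempts corresponds to the bug tracked here; a "temporary-503" result matches the expected delay after a CRD is installed.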

richard-cox commented 2 months ago

Unfortunately the 503 isn't temporary and is always returned. I didn't try on the local cluster, but see it on a downstream one.

MbolotSuse commented 2 months ago

Issue Summary

The schema definitions feature has a few bugs that can cause the definition to be unavailable or inaccurate for a schema:

MbolotSuse commented 2 months ago

Validation Template

Root Cause

As explained in the above issue summary, there were a few issues with how schema definitions were formed.

  1. For cases where a resource (e.g. monitoring.coreos.com.alertmanagerconfig) existed in one version of a group (e.g. v1alpha1) but not in the preferred version of the group (e.g. v1), we would return a 503 error and not produce a definition for the schema.
  2. There are some resources which aren't real Kubernetes resources (e.g. counts and generateKubeConfigOutput). These items would return a 503 error and not produce a definition when the definition was requested for that type.
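The first root cause can be illustrated with a small sketch. This is not Rancher's actual code: the `DISCOVERY` table and both function names are hypothetical, and the data only mirrors the example from the issue (a resource present in v1alpha1 while the group's preferred version is v1).

```python
from typing import Optional

# Hypothetical discovery data: preferred version plus resources per version.
DISCOVERY = {
    "monitoring.coreos.com": {
        "preferred": "v1",
        "versions": {
            "v1": {"prometheuses", "alertmanagers"},
            "v1alpha1": {"alertmanagerconfigs"},
        },
    }
}


def find_version_preferred_only(group: str, resource: str) -> Optional[str]:
    """Look only at the preferred version (the behavior described as buggy)."""
    info = DISCOVERY[group]
    preferred = info["preferred"]
    return preferred if resource in info["versions"][preferred] else None


def find_version_with_fallback(group: str, resource: str) -> Optional[str]:
    """Use the preferred version when possible, otherwise any version that has the resource."""
    info = DISCOVERY[group]
    preferred = info["preferred"]
    if resource in info["versions"][preferred]:
        return preferred
    for version, resources in info["versions"].items():
        if resource in resources:
            return version
    return None
```

With the preferred-only lookup, `alertmanagerconfigs` resolves to no version at all, which is the situation that surfaced as a 503. The fallback lookup still favors the preferred version when the resource exists there.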

Note that the 3rd issue mentioned in the summary was not fixed here.

What was fixed, or what changes have occurred

Areas or cases that should be tested

  1. Cases where a resource is not in the preferred version of the group. The monitoring.coreos.com.alertmanagerconfig resource (which is available in the rancher-monitoring chart) is a good example.
  2. Cases where a resource is in the preferred version, but there's another version of the CRD. The definition should reflect the resource in the preferred version (ideally this version should also be "later" in the list).
  3. Cases of Rancher CRDs migrated to the RK API (e.g. GlobalRoles). These should have the full definition as seen in the openapi/v2 schema, and should not take any information from the baseSchemas. Note that you will still see 503 errors for older resource types (like Users).

What areas could experience regressions

Schema definitions for the above cases, and for the previously working endpoints.

Are the repro steps accurate/minimal?

N/A - see the original issue for more details.

Priyashetty17 commented 1 week ago

Validated with v2.9-ddabf6b2266255276352beb1eeb740d9e4de802d-head