upbound / universal-crossplane

Enterprise-grade @crossplane from @upbound
https://upbound.io/product/universal-crossplane
Apache License 2.0

EnvironmentConfig reconciliation doesn't work #400

Status: Closed (closed by arunpmohan 6 months ago)

arunpmohan commented 11 months ago

We are using the latest Universal Crossplane release:

https://charts.upbound.io/stable/universal-crossplane-1.13.2-up.2.tgz

We have two Compositions, and the first one passes state to the second by creating an EnvironmentConfig. The resources composed by the second Composition stay paused (using the pause annotation) until the relevant property in this EnvironmentConfig is populated with a valid value.

Once the valid value appears in the EnvironmentConfig created by the first Composition, it should be patched into the second Composition and unpause those resources.
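
For reference, Crossplane treats a resource as paused only while its crossplane.io/paused annotation is exactly "true". The intended flow is therefore that the node group keeps the base template's "true" until the FromEnvironmentFieldPath patch overwrites the annotation with the NAT gateway ID, at which point reconciliation should resume. An illustrative sketch of the node group's metadata before and after the patch (only the relevant fields; the ID is the one from the EnvironmentConfig shown later in this issue):

```yaml
# Before: natGatewayId is not yet in the environment, so the patch is skipped
# and the base template's value keeps the composite paused.
metadata:
  annotations:
    crossplane.io/paused: "true"
---
# After: the FromEnvironmentFieldPath patch copies natGatewayId into the
# annotation. Any value other than "true" no longer pauses reconciliation.
metadata:
  annotations:
    crossplane.io/paused: nat-0085820f532438bae
```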

The first Composition creates the vpc-environment EnvironmentConfig with labels; its natGatewayId is set once the NAT gateway has been created:

apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: ncc-vpc
  labels:
    provider: aws
    type: nccvpc
spec:
  compositeTypeRef:
    apiVersion: saas.nokia.ncc/v1alpha1
    kind: Xnccvpc
  writeConnectionSecretsToNamespace: upbound-system
  environment:
    environmentConfigs:
    - ref:
        name: aws-environment
  patchSets:
  - name: vpcIdToLabels
    patches:
    - type: FromCompositeFieldPath
      fromFieldPath: spec.id
      toFieldPath: metadata.labels.vpcId
  resources:
  - name: ncc-nat-gateway
    base:
      apiVersion: ec2.aws.upbound.io/v1beta1
      kind: NATGateway
      spec:
        forProvider:
          region: us-east-1
          allocationIdSelector:
            matchControllerRef: true
          subnetIdSelector:
            matchControllerRef: true
            matchLabels:
              access: public
    patches:
    - fromFieldPath: spec.region
      toFieldPath: spec.forProvider.region
    - fromFieldPath: spec.id
      toFieldPath: metadata.name
      transforms:
        - type: string
          string:
            fmt: "%s-nat-gateway"
    - fromFieldPath: spec.id
      toFieldPath: spec.forProvider.tags[Name]
      transforms:
      - type: string
        string:
          fmt: "%s-nat-gateway"
    - type: ToCompositeFieldPath
      fromFieldPath: status.atProvider.id
      toFieldPath: status.natGatewayId
  - name: vpc-environment
    base:
      apiVersion: apiextensions.crossplane.io/v1alpha1
      kind: EnvironmentConfig
      metadata:
        annotations:
          saas.nokia.com/env-type: vpc-info
        labels:
          type: subnetinfo
    patches:
    - type: FromCompositeFieldPath
      fromFieldPath: spec.id
      toFieldPath: metadata.name
      transforms:
      - type: string
        string:
          fmt: "%s-vpc-environment"
    - type: FromCompositeFieldPath
      fromFieldPath: spec.systemName
      toFieldPath: metadata.labels.systemName
    - type: FromCompositeFieldPath
      fromFieldPath: spec.customerPrefix
      toFieldPath: metadata.labels.customerPrefix
    - type: FromCompositeFieldPath
      fromFieldPath: spec.id
      toFieldPath: metadata.labels.vpcId
      transforms:
      - type: string
        string:
          fmt: "%s"
    - type: FromCompositeFieldPath
      fromFieldPath: status.natGatewayId
      toFieldPath: data.natGatewayId

The second Composition creates the node groups, which must wait until the NAT gateway exists. It reads the EnvironmentConfig created by the first Composition and tries to lift the pause by patching natGatewayId into the paused annotation:

apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: ncc-eks
  labels:
    provider: aws
    type: ncceks
spec:
  compositeTypeRef:
    apiVersion: saas.nokia.ncc/v1alpha1
    kind: Xncceks
  writeConnectionSecretsToNamespace: upbound-system
  environment:
    environmentConfigs:
    - ref:
        name: aws-environment
    - type: Selector
      selector:
        matchLabels:
        - type: FromCompositeFieldPath
          key: systemName
          valueFromFieldPath: spec.systemName
        - type: FromCompositeFieldPath
          key: customerPrefix
          valueFromFieldPath: spec.customerPrefix
        - key: type
          type: Value
          value: subnetinfo
  resources:
  - name: ncc-nodegroup-{{ . }}
    base:
      apiVersion: saas.nokia.ncc/v1alpha1
      kind: Xnccnodegroup
      metadata:
        name: patchme
        annotations:
          crossplane.io/paused: "true"
      spec:
        eksId: patchme
        compositionSelector:
          matchLabels:
            provider: aws
            type: nccnodegroup
        parameters:
          capacityType: patchme
          availabilityZone: patchme
    patches:
    - type: FromEnvironmentFieldPath
      fromFieldPath: natGatewayId
      toFieldPath: metadata.annotations["crossplane.io/paused"]

Right now this EnvironmentConfig has the correct natGatewayId value:

apiVersion: apiextensions.crossplane.io/v1alpha1
data:
  natGatewayId: nat-0085820f532438bae
kind: EnvironmentConfig
metadata:
  annotations:
    crossplane.io/composition-resource-name: vpc-environment
    saas.nokia.com/env-type: vpc-info
  creationTimestamp: "2023-09-14T18:15:16Z"
  generateName: ig1-am-5fbc4-
  generation: 3
  labels:
    crossplane.io/claim-name: ig1-am
    crossplane.io/claim-namespace: ig1
    crossplane.io/composite: ig1-am-5fbc4
    customerPrefix: am
    systemName: ig1
    type: subnetinfo
    vpcId: ig1-am
  name: ig1-am-vpc-environment

If I describe my node group, I see that it is still paused after 52 minutes:

(base) ~/chf-saas-install/ig1 $ c describe xnccnodegroup ig1-am-82f46-nfxp8 
Name:         ig1-am-82f46-nfxp8
Namespace:    
Labels:       crossplane.io/claim-name=ig1-am
              crossplane.io/claim-namespace=ig1
              crossplane.io/composite=ig1-am-82f46
Annotations:  crossplane.io/composition-resource-name: ncc-nodegroup-0
              crossplane.io/paused: true
API Version:  saas.nokia.ncc/v1alpha1
Kind:         Xnccnodegroup
Spec:
  Composition Selector:
    Match Labels:
      Provider:     aws
      Type:         nccnodegroup
  Customer Prefix:  am
  Eks Id:           ig1-am
  Parameters:
    Availability Zone:  [us-east-1a]
    Capacity Type:      ON_DEMAND
    Labels:
    Max Node Count:  10
    Min Node Count:  1
    Node Size:       dev
    Tags:
      k8s.io/cluster-autoscaler/enabled:  true
      k8s.io/cluster-autoscaler/ig1-am:   owned
      kubernetes.io/cluster/ig1-am:       owned
    Taints:
  Region:       us-east-1
  System Name:  ig1
  Vpc Id:       ig1-am
Status:
  Conditions:
    Last Transition Time:  2023-09-14T18:15:10Z
    Reason:                ReconcilePaused
    Status:                False
    Type:                  Synced
Events:
  Type    Reason                Age                From                                                             Message
  ----    ------                ----               ----                                                             -------
  Normal  ReconciliationPaused  52m (x2 over 52m)  defined/compositeresourcedefinition.apiextensions.crossplane.io  Reconciliation is paused via the pause annotation

Additionally, Crossplane is definitely started with environment configs enabled, as the args of its container show:

    spec:
      containers:
      - args:
        - core
        - start
        - --enable-environment-configs

This worked up to Universal Crossplane 1.12 and has been broken since 1.13. Any help is appreciated.

tnthornton commented 11 months ago

Thanks for the detailed issue @arunpmohan! Based on the problems outlined, I think this issue actually belongs in https://github.com/upbound/universal-crossplane/issues. I'll go ahead and move it now. If you feel that's a mistake, feel free to reply with the correction and we can sort out what about the up CLI is causing this behavior 👍

arunpmohan commented 11 months ago

Thank you.

phisco commented 6 months ago

@arunpmohan sorry for the late reply, I hope you solved it in the meantime. I think the issue is related to the resolve policy introduced in 1.13; you'll want to set it to Always. Feel free to reopen the issue if it still applies.
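
A minimal sketch of that suggestion applied to the ncc-eks Composition, assuming the policy lives at spec.environment.policy.resolve as in the Crossplane 1.13 Composition API (verify the field against the docs for your version). Only the environment block changes; resources and writeConnectionSecretsToNamespace stay as in the issue and are omitted here. As we understand the default, Crossplane keeps the EnvironmentConfig references it resolved the first time, while resolve: Always re-runs the label selector on every reconcile, so it can pick up ig1-am-vpc-environment and its natGatewayId even if that config was not ready at first.

```yaml
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: ncc-eks
spec:
  compositeTypeRef:
    apiVersion: saas.nokia.ncc/v1alpha1
    kind: Xncceks
  environment:
    policy:
      # Assumed field: re-resolve the EnvironmentConfig selection on every
      # reconcile instead of only when no references are recorded yet.
      resolve: Always
    environmentConfigs:
    - ref:
        name: aws-environment
    - type: Selector
      selector:
        matchLabels:
        - type: FromCompositeFieldPath
          key: systemName
          valueFromFieldPath: spec.systemName
        - type: FromCompositeFieldPath
          key: customerPrefix
          valueFromFieldPath: spec.customerPrefix
        - key: type
          type: Value
          value: subnetinfo
  # resources: unchanged from the ncc-eks Composition above
```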