Open r4f4 opened 3 days ago
@r4f4: This pull request references Jira Issue OCPBUGS-44925, which is valid. The bug has been moved to the POST state.
Requesting review from QA contact: /cc @gpei
The bug has been updated to refer to the pull request using the external bug tracker.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please ask for approval from r4f4. For more information see the Kubernetes Code Review Process.
/label platform/aws
/hold
We are still missing another permission: ec2:AssociateAddress
time="2024-11-22T22:27:02Z" level=debug msg="E1122 22:27:02.785017 333 awsmachine_controller.go:543] \"Failed to reconcile BYO Public IPv4\" err=<"
time="2024-11-22T22:27:02Z" level=debug msg="\tfailed to reconcile Elastic IP: failed to associate Elastic IP \"eipalloc-0d9343e1bba507e66\" to instance \"i-0c664692a59d18dc5\": UnauthorizedOperation: You are not authorized to perform this operation. User: arn:aws:iam::460538899914:user/ci-op-ty3fb51s-e9af7-minimal-perm is not authorized to perform: ec2:AssociateAddress on resource: arn:aws:ec2:us-east-1:460538899914:elastic-ip/eipalloc-0d9343e1bba507e66 because no identity-based policy allows the ec2:AssociateAddress action. Encoded authorization failure message: jAhD8NZ_EO7bmq89o1YFJPdSyOJL0KSBMQh9r8DDvmX1GOASHHe1scQITr_dIA5P2rKWAPT-a54UTIind4Pqh4z4x-vXRLRk-k0Vq4u61G2CalS22C-Vw_oQhmiITgr9llWVtLP0SwsKYMT0uWxOlvlfqmwZ8BNw3bcgzP8W2N8wZnwB6pDW5BoPg7Zx-OgPd3rth36YPMawV8RW1B-LUY4aVsfWUmZfwfQXChsDesd39LClcPExlFh__cV8hwF4TYHJDruc6vqtwSdFhTyCq3ibWNAlutg-3ptOEM7zRx33USs4uTqLxdYLj4n-AaPdtj-ishlFEh0aZiyl6QmBvaecUTq4v2hUwyAssKdlwZIpjv7zoRYBw59qrBiksPkTQDOP-3cnLxIix6ZwX0nkDwCR3qG5ZwppzRAPpMYgOU03Uo9r3RMbB_pr9h0b6amdBBOilkYmnHIAk8_vWBvhBoBXblPc4LgbUv-ZB62g0oKM0GqwNJPp8JOaFMMSrL82cf2hxZ_a1Bv4sf2WwIoE7HY23Su7dE_KE8jwmhchRMPmb4nRVlyED-Vb39Tn14CZeWt4WFYZb2F6XBRXixuqCvcC-vxf2StrnUvlfczQA_bw1GqV8_0_6kvxAQvxOU7zCId4lQ3-cpCcfGh5Qeh3UwX5D1dDzeKCpqXbCnjT5mhn35Ani7CK7XpGTOWzK5VZu7unuau_n2L5292OQu2xbNPwgJTYpf_7nFwPRYVjE6RM_ZCU65TAJ_umlRpKbERYoahrBEpcJCVB2Z3WSzaaMfHvUYvOY8fKv6SglOCJjphoyLn70jkfZjLY5FuvxBwrlUfCqbPMFWn514b1ZY01o--5-v77NAQtIHmLaAvLv_pU2wKKa9g9qsjxnCPUMLIFxmgCUyKDT7nbBK9PUPipa4I8EEzWnw"
time="2024-11-22T22:27:02Z" level=debug msg="\t\tstatus code: 403, request id: b2c73727-f7fb-4ffd-b812-9484cae2ac11"
time="2024-11-22T22:27:02Z" level=debug msg=" >"
@r4f4: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
Test name | Commit | Details | Required | Rerun command |
---|---|---|---|---|
ci/prow/e2e-aws-ovn-shared-vpc-edge-zones | 85617f66dae2cf18b5887de5eea42b0250386c5e | link | false | /test e2e-aws-ovn-shared-vpc-edge-zones |
ci/prow/e2e-aws-ovn-edge-zones | 85617f66dae2cf18b5887de5eea42b0250386c5e | link | false | /test e2e-aws-ovn-edge-zones |
ci/prow/okd-scos-e2e-aws-ovn | 85617f66dae2cf18b5887de5eea42b0250386c5e | link | false | /test okd-scos-e2e-aws-ovn |
ci/prow/e2e-external-aws-ccm | 85617f66dae2cf18b5887de5eea42b0250386c5e | link | false | /test e2e-external-aws-ccm |
@mtulio any idea why this is happening?
@r4f4 there is a problem in the MachineSet manifest: the instance type added to it, m6i.xlarge, is not supported in the zone:
$ aws ec2 describe-instance-type-offerings --location-type availability-zone \
--filters Name=location,Values=us-west-2-wl1-sfo-wlz-1 \
--region us-west-2 --query 'InstanceTypeOfferings[].InstanceType'
[
"t3.xlarge",
"g4dn.2xlarge",
"t3.medium",
"r5.2xlarge"
]
This is happening because the ec2:DescribeInstanceTypeOfferings permission is missing:
level=warning msg=unable to select instanceType on the zone[us-west-2-lax-1b] from the preferred \
list: [m6i.xlarge m5.xlarge r5.xlarge c5.2xlarge m5.2xlarge c5d.2xlarge r5.2xlarge]. \
You must update the MachineSet manifest: UnauthorizedOperation: You are not authorized to perform this operation. \
User: arn:aws:iam::460538899914:user/ci-op-nrkwfijt-e9af7-minimal-perm is not authorized to perform: \
ec2:DescribeInstanceTypeOfferings because no identity-based policy allows the \
ec2:DescribeInstanceTypeOfferings action
@mtulio that should've been added by https://github.com/openshift/installer/pull/9114. Edit: is that permission always needed when specifying edge machine pools? If so, we should add it to the edge permission group in https://github.com/openshift/installer/pull/9230
@r4f4 the ec2:DescribeInstanceTypeOfferings permission is used by default whenever no instance type is set for a machine pool (any pool: CP, worker, or edge). The installer discovers the "best" supported instance type for the pool based on the target region (for general pools) or zone (for edge zones), using the filters of that API. It is not an edge-specific feature.
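To illustrate the discovery step described above, here is a minimal sketch in Python (the installer itself is written in Go, and this is not its actual code). It assumes the selection amounts to "take the first entry of the preferred list that the location offers". The function name pick_instance_type is hypothetical. The preferred list is copied from the warning log and the offerings from the describe-instance-type-offerings output earlier in this thread, even though those two refer to different zones, so the result is for illustration only.

```python
from typing import Iterable, Optional

# Preferred list copied from the warning message in this thread, in priority order.
PREFERRED_TYPES = [
    "m6i.xlarge", "m5.xlarge", "r5.xlarge", "c5.2xlarge",
    "m5.2xlarge", "c5d.2xlarge", "r5.2xlarge",
]


def pick_instance_type(preferred: Iterable[str], offered: Iterable[str]) -> Optional[str]:
    """Return the first preferred type that the location offers, or None.

    Hypothetical helper: `offered` stands for the instance types returned by
    ec2:DescribeInstanceTypeOfferings for one region or zone.
    """
    offered_set = set(offered)
    for instance_type in preferred:
        if instance_type in offered_set:
            return instance_type
    return None


# Offerings copied from the CLI output shown earlier for the Wavelength zone.
zone_offerings = ["t3.xlarge", "g4dn.2xlarge", "t3.medium", "r5.2xlarge"]

print(pick_instance_type(PREFERRED_TYPES, zone_offerings))
```

With these inputs only r5.2xlarge appears in both lists, so a pinned m6i.xlarge would not be schedulable in that zone while the discovered type would be.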
@mtulio that perm is not required in the non-edge case and we just display a warning that we could not find a preferred instance type. If the edge node cannot work with the default instance type, there should be a better default or further validation.
@r4f4 I read this warning as meaning the permission is required for control plane and worker pools (imo, the warning could be treated as a failure in certain situations, such as CP or worker pools, to prevent a later failure). The installer will always call getInstanceTypeZoneInfo() when no instance type is set in the pool (master, worker), as this is the default path for IPI, right? Am I missing something? Do we have a CI test for this scenario (default install, without setting custom instances)?
It's not required, it's optional. If this call fails, we proceed with the installer's hardcoded default instance types for master and worker.
Regarding a CI test for this scenario (default install, without setting custom instances): AFAIK we do not have one, because the way the steps are written we always set an instance type in the install-config.yaml.
My interpretation is that this is required because, afaik, we don't expect the default path to fail :)
Furthermore, this function was introduced a long time ago, even before edge zones, to pick the best instance type in most regions while still covering regions where AWS takes time to roll out new instance generations. For example, m6i.xlarge took some time to become available in eu-west-2, which supported only 5th-generation instances. Should most users be penalized with more expensive, slower instance types in most regions just because some regions do not support the newer ones?
If we want it to be required, we have to remove the warning and actually fail the install. But that's not the case today and the warning was a design choice to make the permission optional.
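To make the two positions concrete, here is a minimal Python sketch (not the installer's Go code) of the difference between treating the lookup as optional and treating it as required. All names (resolve_instance_type, UnauthorizedOperation, denied_lookup, the required flag) are hypothetical, and the default value is a placeholder rather than a statement about the installer's actual hardcoded default.

```python
import logging
from typing import Callable

logger = logging.getLogger("installer-sketch")

# Placeholder default, assumed for illustration only.
DEFAULT_TYPE = "m6i.xlarge"


class UnauthorizedOperation(Exception):
    """Stand-in for the AWS error returned when an IAM permission is missing."""


def resolve_instance_type(lookup: Callable[[], str], default: str,
                          required: bool = False) -> str:
    """Run the offerings lookup and decide what to do when it is denied.

    required=False models the behavior described in this thread: log a
    warning and continue with the default. required=True models the
    alternative: propagate the error so the install fails early.
    """
    try:
        return lookup()
    except UnauthorizedOperation as err:
        if required:
            raise
        logger.warning("unable to select instanceType, using default %s: %s",
                       default, err)
        return default


def denied_lookup() -> str:
    # Simulates a user without ec2:DescribeInstanceTypeOfferings.
    raise UnauthorizedOperation(
        "not authorized to perform: ec2:DescribeInstanceTypeOfferings")


print(resolve_instance_type(denied_lookup, default=DEFAULT_TYPE))
```

In the optional mode the default is only safe if the target region or zone actually offers it, which is the gap the edge-zone failure above exposes.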
It's needed by CAPA when Ipv4Pools are supplied.