Tested this out on my HyperShift cluster and got this error when following the InstaScale guided demo:
```
I1108 14:05:43.091423 1 nodepools.go:46] The instanceRequired array: m5.xlarge
I1108 14:05:43.091445 1 nodepools.go:54] Built NodePool with instance type m5.xlarge and name instascaletest-m5-xlarge
E1108 14:05:43.096248 1 nodepools.go:57] Error creating NodePool: status is 400, identifier is '400', code is 'CLUSTERS-MGMT-400' and operation identifier is 'dac380af-7a33-4b1a-a6aa-dcb353237a44': NodePool name 'instascaletest-m5-xlarge' is 24 characters long - its length exceeds the maximum length allowed of 15 characters
I1108 14:05:43.096274 1 nodepools.go:59] Created NodePool: &{400 map[Content-Type:[application/json] Date:[Wed, 08 Nov 2023 14:05:43 GMT] Server:[envoy] Vary:[Accept-Encoding] X-Envoy-Upstream-Service-Time:[2] X-Operation-Id:[dac380af-7a33-4b1a-a6aa-dcb353237a44]] 0xc00068d8f0 <nil>}
I1108 14:05:43.170794 1 nodepools.go:46] The instanceRequired array: g4dn.xlarge
I1108 14:05:43.170815 1 nodepools.go:54] Built NodePool with instance type g4dn.xlarge and name instascaletest-g4dn-xlarge
E1108 14:05:43.175629 1 nodepools.go:57] Error creating NodePool: status is 400, identifier is '400', code is 'CLUSTERS-MGMT-400' and operation identifier is 'd573d9d0-598b-494a-bc9c-cc7a5ba2c1dd': NodePool name 'instascaletest-g4dn-xlarge' is 26 characters long - its length exceeds the maximum length allowed of 15 characters
I1108 14:05:43.175646 1 nodepools.go:59] Created NodePool: &{400 map[Content-Type:[application/json] Date:[Wed, 08 Nov 2023 14:05:43 GMT] Server:[envoy] Vary:[Accept-Encoding] X-Envoy-Upstream-Service-Time:[2] X-Operation-Id:[d573d9d0-598b-494a-bc9c-cc7a5ba2c1dd]] 0xc00068da40 <nil>}
```
Despite the odd error about the name being too long, the log claims it created the NodePool. I checked the Nodes in the Compute section and no new nodes were created.
Hey @Bobbins228, this is something I've raised an issue for. The NodePool name character limit is 15, so after appending the node type the name quickly exceeds the limit. This is also the case for MachinePools, which have a character limit of 30.
I'm hoping to address this soon. However, for now you're right: it should at least stop building the NodePool when the error is returned.
Specifically with NodePools, after appending the additional information to the name, even a 4-character name can exceed the limit, so this may require a new naming convention.
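To illustrate, here is a minimal Go sketch of one possible convention, assuming names are built as `<appwrapper>-<instance-type>` as in the logs above. `shortenName` and the truncate-plus-hash scheme are hypothetical, not InstaScale's actual implementation:

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

const maxNodePoolNameLen = 15 // the limit reported by the 400 error above

// shortenName builds "<appwrapper>-<instanceType>" and, when it exceeds the
// limit, truncates it and appends a short hash of the full name so that
// distinct AppWrapper/instance-type pairs still map to unique names.
func shortenName(appWrapper, instanceType string) string {
	full := appWrapper + "-" + instanceType
	if len(full) <= maxNodePoolNameLen {
		return full
	}
	sum := sha1.Sum([]byte(full))
	suffix := hex.EncodeToString(sum[:])[:5] // 5 hex chars keep collisions unlikely
	return full[:maxNodePoolNameLen-len(suffix)-1] + "-" + suffix
}

func main() {
	// "instascaletest-g4dn-xlarge" (26 chars) shrinks to a 15-char name.
	fmt.Println(shortenName("instascaletest", "g4dn-xlarge"))
}
```

The trade-off is that the instance type is no longer readable from the name, so it would likely need to be carried in a label or annotation instead.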
Ran through the nodepools e2e test using this PR on a Hypershift cluster. Functionality works as expected. This is great!! :) My only comment is that the logs are a bit crowded from the finalizeScalingDownMachines func being called in the reconcile function.
Hey @Fiona-Waters, thanks for pointing this out. I have replicated the issue here too. You are correct: this is due to the finalizeScalingDownMachines func being called in reconcile. Would it be appropriate to add `hasCompletedScaleDown`, or an equivalent, as a field in the AppWrapper?
Maybe we could check the status of the node pool and only print the log line if it is "deleting". I guess we would have to do this for machine pools and machine sets too. It seems like a bit of work to remove some extra log lines, so of course it's up to you how to proceed. I'm not sure about adding a field to the AppWrapper; would it only update after the node pool has finished deleting, and therefore not remove any log lines?
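For illustration, a minimal Go sketch of the status-gated logging idea: `getNodePoolStatus` and the `"deleting"` status string are hypothetical stand-ins for whatever OCM lookup InstaScale actually performs, not real SDK calls:

```go
package main

import "k8s.io/klog/v2"

// finalizeScalingDownMachines emits the scale-down log only when a deletion
// is actually in progress, rather than on every reconcile pass.
func finalizeScalingDownMachines(appWrapperName string) {
	status, err := getNodePoolStatus(appWrapperName) // hypothetical helper
	if err != nil {
		klog.Errorf("failed to fetch NodePool status for %s: %v", appWrapperName, err)
		return
	}
	if status == "deleting" { // assumed status value; gate the noisy line on it
		klog.Infof("Scaling down NodePool for AppWrapper %s", appWrapperName)
	}
}

// getNodePoolStatus is a placeholder for the OCM lookup InstaScale would
// perform (with MachinePool/MachineSet equivalents, as noted above).
func getNodePoolStatus(name string) (string, error) {
	return "deleting", nil
}

func main() {
	finalizeScalingDownMachines("instascaletest")
}
```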
Re-ran on a Hypershift cluster, can confirm that it works as expected. Great work Eoin! /lgtm
Moving this to WIP temporarily as nodes appear to be re-scaling after deletion in Hypershift.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: anishasthana
Issue link
Fixes https://github.com/project-codeflare/instascale/issues/80
Closes https://github.com/project-codeflare/instascale/issues/175
This is a PR to allow for InstaScale integration with Hypershift.
What changes have been made
The PR also contains changes to remove irrelevant logging and repeated loop logging.
Verification steps
Create a Hypershift cluster. Deploy the stack. Create an InstaScale-enabled AppWrapper, and ensure the NodePool successfully scales up and down.
Checks