mapbox / spotswap

Deprecated
BSD 2-Clause "Simplified" License

On-demand group took 10 minutes to scale up after total priceout #3

Open jakepruitt opened 7 years ago

jakepruitt commented 7 years ago
[Screenshot: cloudwatch_management_console]

On 10/2 at 18:43:00, one of our EC2-based services saw a priceout of the last AZ available to us in the region for that instance type. The priceout took out all 9 of our spot instances, leaving us with 0 instances running in the service. That state of 0 running instances lasted about 7 minutes before the spot market briefly returned to normal. The only scale-up we saw for the on-demand group came 10 minutes after the total priceout.

Looking in the logs of the spotswap lambda function, I found the logs that were closest to the priceout:

START 
2017-10-02T18:23:08.038Z Finding instance ids in spot group: SpotGroup
2017-10-02T18:23:08.342Z Checking for termination tags on 9 instances
2017-10-02T18:23:09.624Z Found 9 instances with SpotTermination tag
2017-10-02T18:23:12.362Z No-op on stack during a CloudFormation update
END 
REPORT Duration: 4325.97 ms Billed Duration: 4400 ms Memory Size: 128 MB Max Memory Used: 53 MB
START
2017-10-02T18:25:08.775Z Finding instance ids in spot group: SpotGroup
2017-10-02T18:25:09.199Z No instances listed
2017-10-02T18:25:09.255Z Found 0 instances with SpotTermination tag
2017-10-02T18:25:09.276Z Checking spot group SpotGroup for scaledown
2017-10-02T18:25:09.276Z Checking spot group SpotGroup for scaledown
18:25:09 END

Strangely, the "No-op on stack during a CloudFormation update" message appeared when no CloudFormation update was in progress. Then the second invocation logged "No instances listed", which seems to have made the function no-op, when it should have taken the lack of instances as an indication that the on-demand group should scale up.
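To make the expected behavior concrete, here is a minimal sketch of the decision I would have expected. It is not spotswap's actual code: `SpotGroupState`, `on_demand_shortfall`, and the idea of comparing against the spot group's desired capacity are all hypothetical names and assumptions of mine. The point is only that an empty instance list should count as a full shortfall rather than as "nothing to do".

```python
from dataclasses import dataclass


@dataclass
class SpotGroupState:
    """Hypothetical snapshot of the spot group at one Lambda invocation."""
    desired_capacity: int        # instances the spot group is meant to run (assumed known)
    running_instances: int       # instances currently listed in the group
    tagged_for_termination: int  # listed instances carrying a SpotTermination tag


def on_demand_shortfall(state: SpotGroupState) -> int:
    """Return how many on-demand instances would be needed to cover the gap.

    Instances tagged for termination are treated as already lost, and an
    empty group (running_instances == 0) yields the full desired capacity
    instead of a no-op.
    """
    healthy = max(state.running_instances - state.tagged_for_termination, 0)
    return max(state.desired_capacity - healthy, 0)


# The two invocations from the logs above, assuming a desired capacity of 9.
first = SpotGroupState(desired_capacity=9, running_instances=9, tagged_for_termination=9)
second = SpotGroupState(desired_capacity=9, running_instances=0, tagged_for_termination=0)
shortfalls = [on_demand_shortfall(first), on_demand_shortfall(second)]
```

Under these assumptions, both invocations would ask the on-demand group for 9 instances, whereas the logs suggest neither one triggered a scale-up.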

Two questions:

cc/ @arunasank @emilymcafee

henryngo commented 6 years ago

Any update on this? I'd like to implement SpotSwap in our environment, but I'm wary of this issue.

jakepruitt commented 6 years ago

@henryngo001 I haven't dug deeper into this issue. I think this was a rare condition, though; spotswap works very well for us on a regular basis.

henryngo commented 6 years ago

@jakepruitt Thanks for the quick update.