Closed rirze closed 3 years ago
Hi, it should be entirely possible to run this in a Lambda function. I was wondering whether there were use cases for this before developing anything. I have never seen a conversion last more than 15 minutes and, in any case, the processing is restartable, so Lambda timeouts do not look like a concern.
The only point of attention I see is to forbid parallel conversions of the same instance ID (a race condition with unpredictable results), but that can be addressed...
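One possible way to forbid parallel conversions of the same instance ID is a per-instance lock. The sketch below is only an illustration and not something the tool does today: it assumes a DynamoDB table used as a lock store, and the table name, the `instance_id` and `expires_at` attribute names, and the 15-minute expiry are all hypothetical choices.

```python
import time


def try_acquire_lock(dynamodb_client, table_name, instance_id, ttl_seconds=900):
    """Return True if this caller now holds the lock for instance_id.

    Assumes a table whose partition key is the string attribute
    "instance_id" (hypothetical schema, not part of the tool).
    """
    now = int(time.time())
    try:
        # The conditional write succeeds only if no lock item exists,
        # or if the existing one has already expired.
        dynamodb_client.put_item(
            TableName=table_name,
            Item={
                "instance_id": {"S": instance_id},
                "expires_at": {"N": str(now + ttl_seconds)},
            },
            ConditionExpression=(
                "attribute_not_exists(instance_id) OR expires_at < :now"
            ),
            ExpressionAttributeValues={":now": {"N": str(now)}},
        )
        return True
    except dynamodb_client.exceptions.ConditionalCheckFailedException:
        # Another execution already holds a live lock for this instance.
        return False


def release_lock(dynamodb_client, table_name, instance_id):
    """Delete the lock item once the conversion has finished."""
    dynamodb_client.delete_item(
        TableName=table_name,
        Key={"instance_id": {"S": instance_id}},
    )
```

A wrapper would call `try_acquire_lock(boto3.client("dynamodb"), ...)` before starting a conversion, skip the instance when it returns `False`, and call `release_lock` afterwards. The expiry is there so that a crashed execution does not block the instance forever.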
I'm wondering which kinds of trigger should be supported: SQS? Direct Lambda invocation? In your use case, what matters? Also, what is the motivation for using Lambda instead of the command line?
PS: Starting from v0.7.0, you can already write your own Lambda wrapper, as the main() function takes an array of arguments that you can provide yourself...
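A minimal sketch of such a wrapper, covering both a direct invocation and an SQS trigger, could look like the following. Several things here are assumptions: the event fields `instance_id` and `target`, the flag names in `build_argv` (placeholders to be replaced by whatever the command line actually accepts), and whether main() expects the program name as the first element of the array. The tool's main() is passed in as `tool_main` so that the sketch does not guess at the module name.

```python
import json


def requests_from_event(event):
    """Yield one request dict per conversion, for direct or SQS invocations."""
    if isinstance(event, dict) and "Records" in event:
        # SQS-triggered invocation: each record carries a JSON string in "body".
        for record in event["Records"]:
            yield json.loads(record["body"])
    else:
        # Direct invocation: the event itself is the request.
        yield event


def build_argv(request):
    """Translate a request into command-line style arguments.

    The flag names are placeholders, not confirmed options of the tool.
    """
    argv = ["--instance-id", request["instance_id"]]
    if "target" in request:
        argv += ["--target-billing-model", request["target"]]
    return argv


def make_handler(tool_main):
    """Return a Lambda handler that calls tool_main(argv) once per request."""
    def handler(event, context):
        results = []
        for request in requests_from_event(event):
            argv = build_argv(request)
            # Whatever main() returns is passed back unchanged.
            results.append({"argv": argv, "result": tool_main(argv)})
        return results
    return handler
```

In a deployment package, the handler module would import main() from the tool's Python source file and expose `handler = make_handler(main)` as the Lambda entry point.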
Thanks for the reply! For my particular use case, we don't have any standalone full-time EC2 instances in the same account and are looking to run some spot instances. Running it in a Lambda would save some cost there. It also opens the possibility of using triggers as you mentioned.
Could you talk more about the potential race condition? I'm willing to contribute if it'll help.
The race condition is just about locking. Currently, there is no lock mechanism, and if you launch two executions of the tool to convert the same EC2 instance, bad things will certainly happen. In command-line mode this situation is unlikely, but in a Lambda such events could occur more easily (as Lambda scales out in parallel by default). Today, one way to prevent this is to set the Lambda concurrency to 1, which enforces that only one Lambda invocation (and so one conversion) runs at a time. This limits the overall conversion rate but removes the race condition. Maybe this workaround can fit your needs in the short term with no change to the code base...
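As a sketch of the concurrency-of-1 workaround, reserved concurrency can be set with the boto3 Lambda client's `put_function_concurrency` call. The function name `spot-converter` in the usage comment is hypothetical, and the client is passed in as a parameter so the helper itself does not depend on credentials.

```python
def limit_to_single_execution(lambda_client, function_name):
    """Reserve exactly one concurrent execution for the given function.

    With a reserved concurrency of 1, Lambda never runs two invocations
    of this function at the same time; extra invocations are throttled.
    """
    return lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=1,
    )


# Usage (requires boto3 and AWS credentials; the function name is hypothetical):
#   import boto3
#   limit_to_single_execution(boto3.client("lambda"), "spot-converter")
```

The same limit can also be set from the Lambda console or in infrastructure templates. Note that it serializes all conversions, not only those targeting the same instance.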
For my understanding: as you do not yet have EC2 instances in your account, why not create Spot instances directly? (You should not need this tool, as you have no existing instances...)
That's a fair point, but I'm really interested in the "switching back and forth" between spot and on-demand. We might end up maintaining a schedule, or needing to scale up or down in response to load, for example.
This tool is really meant to deal with 'Pet' machines (that is, legacy VMs typically coming from a Lift & Shift migration). As you are starting "from scratch", I strongly advise you to consider a more Cloud Native way of dealing with resources. Did you look at 'Spot Fleet'? https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-fleet.html
Combined with an Auto Scaling group, it should give you more value than this tool. You will be able to adjust the ratio of On-Demand to Spot, which may give you the "back and forth" aspect you are looking for.
Well, even with a Spot Fleet, I would have to worry about transferring state. I was hoping that this project would give me a starting point for managing transience.
The use case I'm preparing for would have a fixed number of machines, of which some may alternate between being a Spot Instance and On-demand Instance. It's my understanding that fleets are designed for slightly more flexible use cases where managing state is left to other means.
Am I misunderstanding something?
Indeed, Spot Fleet has the "Cattle" model in mind, meaning state is not kept on the EC2 instances.
I do not know your use case, but relying on stateful EC2 instances now has a reduced set of valid use cases (e.g. transient data in HPC, Big Data, etc.).
If you think you are in such a case, then the tool can possibly help. Remember that the conversion time is significant (~5 minutes)!
The tool will continue to evolve in the short term toward more resilience and more complete conversion (more EC2 options covered). At this stage, I'm reluctant to make official a Lambda mode that could have a massive impact on customer infrastructure. I would like to gather more feedback from users about the current command-line usage before extending to other ways of invoking it. As I said previously, the Python source file has a main() function that can be called from a Lambda handler implemented in another Python source file that you provide. If you limit the concurrency to 1, you should be able to make it work quickly as a Lambda. Please let me know about your work, to help me gather input for future major evolutions of the tool!
I'm closing this issue, but I'm still interested in how your experiments with Lambda go.
As the title says, do you see any roadblocks to adopting this code for a Lambda function? I'm worried about potential timeouts and about refactoring the code so that it can be called from a Lambda event.
What are your thoughts?