lithops-cloud / lithops

A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀
http://lithops.cloud
Apache License 2.0
317 stars 105 forks

StandaloneExecutor reuse problem #1263

Closed (sergii-mamedov closed 8 months ago)

sergii-mamedov commented 8 months ago

Hello!

I ran into a problem when re-running code on an EC2 instance. Steps to reproduce the problem:

  1. I create a StandaloneExecutor (AWS EC2, consume mode).
  2. I invoke a function on this executor and get the result.
  3. After that, the EC2 instance shuts down automatically (consume mode).
  4. If I then try to invoke a function on the same executor again, it fails: it is impossible to connect via SSH. The instance starts, but its public_ip is now different.
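The steps above could be scripted roughly as follows. This is only a sketch: it assumes Lithops is installed and that the Lithops config file already selects the aws_ec2 standalone backend in consume mode (the thread does not show the configuration). The names `add_one`, `reproduce`, and `wait_seconds` are made up for illustration.

```python
import time


def add_one(x):
    """Trivial function to run on the remote VM."""
    return x + 1


def reproduce(wait_seconds=0):
    """Invoke a function twice on the same StandaloneExecutor.

    Assumption: the Lithops config file points at an EC2 instance and uses
    consume mode. `wait_seconds` is a hypothetical knob that gives the
    instance time to shut down between the two invocations.
    """
    import lithops  # imported lazily so this module loads without Lithops

    fexec = lithops.StandaloneExecutor(backend="aws_ec2")

    # First invocation: the VM is started, runs the function, then stops.
    fexec.call_async(add_one, 1)
    first = fexec.get_result()

    # Give the instance time to stop, so the next call has to start it again.
    time.sleep(wait_seconds)

    # Second invocation on the same executor: on affected versions this is
    # where the SSH connection was reported to fail.
    fexec.call_async(add_one, 2)
    second = fexec.get_result()
    return first, second
```

Calling `reproduce()` against a real account starts and stops the instance, so it incurs the usual EC2 costs.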

The problem is that the instance_data and public_ip variables were overwritten during the first run. After being stopped and started again, the EC2 instance always gets a different IP address, and the EC2Instance class does not anticipate this. I think for the EC2 executor in consume mode, after the function invoke is completed, it is unnecessary for the instance_data and public_ip variables to restore the default values.
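The stale-address behaviour described here can be illustrated with a small, self-contained model. It is not the Lithops code or the actual patch: `FakeCloud`, `CachedInstance`, and `reset_on_stop` are hypothetical names, and the thread does not say whether the real fix resets the cached values or refreshes them. The sketch only shows why a cached public_ip goes wrong once the instance is restarted with a new address.

```python
import itertools


class FakeCloud:
    """Stand-in for EC2: hands out a new public IP on every start."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.current_ip = None

    def start(self):
        # 203.0.113.0/24 is a documentation-only address range.
        self.current_ip = f"203.0.113.{next(self._counter)}"

    def stop(self):
        self.current_ip = None


class CachedInstance:
    """Hypothetical client that caches the public IP after the first lookup."""

    def __init__(self, cloud, reset_on_stop):
        self.cloud = cloud
        self.reset_on_stop = reset_on_stop
        self.public_ip = None  # default value: "not looked up yet"

    def get_public_ip(self):
        # Only ask the cloud if nothing is cached.
        if self.public_ip is None:
            self.public_ip = self.cloud.current_ip
        return self.public_ip

    def start(self):
        self.cloud.start()

    def stop(self):
        self.cloud.stop()
        if self.reset_on_stop:
            self.public_ip = None  # forget the address of the stopped VM


def ip_is_fresh_after_restart(reset_on_stop):
    """Return True if the client sees the new IP after a stop/start cycle."""
    cloud = FakeCloud()
    vm = CachedInstance(cloud, reset_on_stop)
    vm.start()
    vm.get_public_ip()  # first run caches the address
    vm.stop()
    vm.start()          # the cloud assigns a different address
    return vm.get_public_ip() == cloud.current_ip
```

In this model, `ip_is_fresh_after_restart(False)` reports a stale address, while `ip_is_fresh_after_restart(True)` picks up the new one.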

JosepSampe commented 8 months ago

I added a patch that should be enough to fix this.

> I think for the EC2 executor in consume mode, after the function invoke is completed, it is unnecessary for the instance_data and public_ip variables to restore the default values.

Currently, the VM in consume mode is stopped after each function invocation. Personally, I don't like this approach: if you invoke 3 maps in a row, the VM is started and stopped 3 times, which causes a lot of overhead. I would rather keep the VM running until soft_dismantle_timeout. Is this what you meant in your comment?
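To make the overhead argument concrete, here is a toy calculation of how many times the VM has to boot for a series of invocations, with and without an idle grace period. It is a simplified model, not Lithops behaviour: `count_vm_starts` and its parameters are hypothetical, with `idle_timeout` standing in for the role soft_dismantle_timeout would play.

```python
def count_vm_starts(invocation_times, run_seconds, idle_timeout):
    """Count how often the VM must be started for a series of invocations.

    invocation_times: sorted start times of the invocations, in seconds.
    run_seconds: how long each invocation keeps the VM busy.
    idle_timeout: seconds the VM stays up after it goes idle
                  (0 means "stop immediately after each invocation").
    """
    starts = 0
    stops_at = None  # time at which the VM will be stopped; None = never started
    for t in invocation_times:
        if stops_at is None or t >= stops_at:
            starts += 1  # the VM is stopped, so it has to boot again
        # The VM stays up while busy, plus the idle grace period.
        stops_at = t + run_seconds + idle_timeout
    return starts


# Three maps, one minute apart, each running for 30 seconds.
times = [0, 60, 120]
starts_without_timeout = count_vm_starts(times, run_seconds=30, idle_timeout=0)
starts_with_timeout = count_vm_starts(times, run_seconds=30, idle_timeout=300)
```

With these numbers the VM boots three times when it is stopped after every invocation and once when it is kept alive for five idle minutes.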

sergii-mamedov commented 8 months ago

> Is this what you meant in your comment?

Yep.

We only used consume mode for a very small number of datasets and only for one step of our pipeline. After we moved from IBM Code Engine to AWS Lambda, it turned out (due to the 10 GB RAM limit) that, very rarely, another pipeline step needed to be executed on an EC2 instance. That's how I found this problem. In the near future, I plan to test create mode, because spot instances are available only in that mode.

sergii-mamedov commented 8 months ago

@JosepSampe I tested these changes and they work well. Thanks.

JosepSampe commented 7 months ago

@sergii-mamedov FYI: I created lithops version 3.1.2