lithops-cloud / lithops

A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀
http://lithops.cloud
Apache License 2.0

AWS Lambda invoker's performance depends on the Python interpreter #1219

Open gfinol opened 9 months ago

gfinol commented 9 months ago

I've noticed an issue with the invocation performance of AWS Lambda functions: depending on the Python interpreter used, the time it takes to invoke the cloud functions changes.

For example, when using the system Python 3.10 interpreter of a VM in AWS EC2 with Ubuntu 22.04, the start of some AWS Lambda functions is delayed by 5 to 10 seconds, as can be seen in this plot:

*(timeline plot: `python31012-system1702566715_timeline`)*

But using the same Python version (3.10.12) from Conda on the same VM, with the same OS and the same AWS account, I obtained much better performance:

*(timeline plot: `python31012-conda1702567642_timeline`)*

Despite the performance improvement when using Conda, almost 50% of the functions still take 1 second longer to start, even in a warmed-up state (see the last two map stages in the previous plot). The behavior is the same for Python 3.8, 3.9, 3.10 and 3.11.

Click to see: Python 3.8 plot (using conda) ![python38-conda1702566938_timeline](https://github.com/lithops-cloud/lithops/assets/11145254/2cddf517-fb27-42a9-a446-4eba7417a1ea)
Python 3.9 plot (using conda) ![python39-conda1702567023_timeline](https://github.com/lithops-cloud/lithops/assets/11145254/605e7923-b220-4e1c-8519-190573a662e3)
Python 3.10 plot (using conda) ![python31013-conda1702567068_timeline](https://github.com/lithops-cloud/lithops/assets/11145254/4e2cae6c-856e-4444-96b3-72685e05c3d7)
Python 3.11 plot (using conda) ![python311-conda1702567151_timeline](https://github.com/lithops-cloud/lithops/assets/11145254/fa179056-cd45-4061-ae29-6486102fe0af)

But with Python 3.7 the performance is what one would expect (almost perfect):

*(timeline plot: `python37-conda1702566853_timeline`)*

All of the previous plots were generated by running 3 maps of 100 functions that each sleep for 5 seconds, executed from a t2.large VM with Ubuntu 22.04 in us-east-1, with all the Lithops default configurations except for invoke_pool_threads, which was set to 128. I have also run the same test on a VM with Amazon Linux 2023, and the results are similar to the previous ones obtained with the Conda interpreter (I can upload the plots if requested). I used the current master branch of Lithops for this test, but the issue can be reproduced with versions 3.0.0, 3.0.1, 2.9 and also 2.7.1.
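For reference, the invoke_pool_threads override can also be set from code instead of the config file. A minimal sketch, assuming the AWS Lambda settings live under an `aws_lambda` section as in the Lithops docs:

```python
# Hypothetical in-code Lithops configuration; the section and key names are
# assumed from the Lithops documentation, everything else uses defaults.
config = {
    'lithops': {'backend': 'aws_lambda', 'storage': 'aws_s3'},
    'aws_lambda': {'invoke_pool_threads': 128},  # size of the invoker thread pool
}

# It would then be passed to the executor:
# fexec = lithops.FunctionExecutor(config=config)
```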

Here is the code used:

```python
import time
import lithops


def count_cold_starts(futures):
    """Count how many invocations started on a cold vs. a warm worker."""
    cold = 0
    warm = 0
    for future in futures:
        if future.stats['worker_cold_start']:
            cold += 1
        else:
            warm += 1
    return cold, warm


num_fun = 100

def my_sleep(x):
    time.sleep(x)
    return x

futures = []
fexec = lithops.FunctionExecutor()
for _ in range(3):
    f = fexec.map(my_sleep, [5] * num_fun)
    fexec.get_result()  # wait for the whole map to finish
    futures.extend(f)

    cold, warm = count_cold_starts(f)
    print(f"cold: {cold}, warm: {warm}")

fexec.plot()
```
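To put a number on the delay instead of eyeballing the plots, one could diff the host-side submission timestamp against the worker start timestamp of each future. A sketch, assuming the `host_submit_tstamp` and `worker_start_tstamp` keys are present in `future.stats` (shown here on synthetic data standing in for `[f.stats for f in futures]`):

```python
def invocation_delays(stats_list):
    """Seconds between host-side submission and worker start, per invocation."""
    return [s['worker_start_tstamp'] - s['host_submit_tstamp'] for s in stats_list]

# Synthetic stand-in for [f.stats for f in futures]:
fake_stats = [
    {'host_submit_tstamp': 100.0, 'worker_start_tstamp': 100.4},
    {'host_submit_tstamp': 100.0, 'worker_start_tstamp': 105.9},  # a delayed start
]
delays = invocation_delays(fake_stats)
print(f"worst startup delay: {max(delays):.1f}s")
```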
aitorarjona commented 9 months ago

Hi @gfinol, just to make sure whether this is an issue with Lithops itself rather than with its dependencies, could you check the following?

Thanks

gfinol commented 9 months ago

Hi @aitorarjona, here are the results:

- Python 3.10 (Ubuntu 22.04)
- Python 3.7 (conda env)
- Python 3.8 (conda env)
- Python 3.9 (conda env)
- Python 3.10 (conda env)
- Python 3.11 (conda env)

gfinol commented 9 months ago

Also, notice that "Boto3 and Botocore ended support for Python 3.7 on December 13, 2023". So the best performance is achieved with a Python version that is no longer supported.

aitorarjona commented 9 months ago

Just to make sure, maybe you could create a 3.11 venv and run `pip install -U --no-cache-dir -r conda_py37.txt` so it has the same versions as the 3.7 venv. But it mostly seems that something regarding the Python threads that Lithops or boto3/botocore/urllib3 use changed from 3.8 onwards.

gfinol commented 9 months ago

@aitorarjona I tried what you suggested with a 3.11 env, but it failed due to some incompatibilities between the library versions and the Python version.

But I managed to get it working with 3.10. The results look like the previous ones:

*(timeline plot: `1702902863_timeline`)*

(Note that the certifi requirement in conda_py37.txt points to a local file; that line was removed to install the requirements in Python 3.10.)

I agree with you that, at first glance, this looks like a problem with the thread pool used. Not sure how that could be confirmed...
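One way to probe the thread-pool hypothesis outside Lithops might be to time many concurrent calls through a plain `ThreadPoolExecutor` under each interpreter. A sketch of such a harness (the HTTPS endpoint and pool sizes in the comment are arbitrary placeholders, not what Lithops actually uses):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_concurrent(fn, n, workers):
    """Run fn() n times across a thread pool; return per-call wall times."""
    def timed(_):
        t0 = time.time()
        fn()
        return time.time() - t0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(timed, range(n)))


# To probe the interpreter itself, fn could be an HTTPS call through a shared
# urllib3 pool (roughly what the invoker threads do), e.g.:
#   http = urllib3.PoolManager(maxsize=128)
#   fn = lambda: http.request('GET', 'https://sts.us-east-1.amazonaws.com/')
# A spread of several seconds between min and max on the system interpreter,
# but not on the conda one, would point below Lithops (ssl/threading) rather
# than at the invoker code. Here we use a sleep as a harmless stand-in:
latencies = timed_concurrent(lambda: time.sleep(0.01), n=100, workers=128)
print(f"min={min(latencies):.3f}s max={max(latencies):.3f}s")
```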

JosepSampe commented 9 months ago

I remember that some years ago I changed the invoke method of the Lambda backend to improve the invocation performance. It was working well then (I think I did it with Python 3.6), but maybe that solution is not working properly with newer versions of Python (or boto3).

In aws_lambda.py, can you try commenting out lines 630-653 and uncommenting lines 655-670? This way we will see how the boto3 lib performs when invoking functions, and whether this is the cause of the issue you are experiencing.
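For anyone following along, the plain boto3 path being compared against presumably boils down to an asynchronous `invoke` call. A sketch under that assumption (the client would be created elsewhere, and the function name is a placeholder):

```python
import json


def invoke_async(client, function_name, payload):
    """Fire-and-forget Lambda invocation via plain boto3 (client passed in)."""
    response = client.invoke(
        FunctionName=function_name,
        InvocationType='Event',   # Lambda replies 202 as soon as it queues the event
        Payload=json.dumps(payload, default=str),
    )
    return response['ResponseMetadata']['HTTPStatusCode']  # 202 on success

# Usage (sketch):
#   client = boto3.client('lambda', region_name='us-east-1')
#   invoke_async(client, 'my-lithops-runtime', {'x': 1})
```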

gfinol commented 8 months ago

@JosepSampe, I've run the tests that you suggested. Since the results were worse, I executed the tests twice to confirm. Here are the resulting plots:

With the system Python 3.10 of Ubuntu 22.04 from the official AMI in AWS EC2:

*(timeline plot: `pythons-3 10-sys-1704709941_timeline`)*

Using the Python 3.10 interpreter from conda:

*(timeline plot: `conda-3 10-1704710366_timeline`)*

And using Python 3.7 with conda:

*(timeline plot: `conda-3 7-1704710116_timeline`)*

In general, the performance is worse. For example, have a look at the invocations using Python 3.7: in this recent plot, the invocations in the second and third map are delayed by 1 to 1.5 seconds, while in the original plots the invocations were almost perfect.

I leave here the plots for the other Python versions with conda:

Python 3.8 conda ![conda-3 8-1704710307_timeline](https://github.com/lithops-cloud/lithops/assets/11145254/87945ca9-6615-4075-bc62-32270d54dd4d)
Python 3.9 conda ![conda-3 9-1704710215_timeline](https://github.com/lithops-cloud/lithops/assets/11145254/62578991-f037-4466-ad6d-ee533d16c06e)
Python 3.11 conda ![conda-3 11-1704710461_timeline](https://github.com/lithops-cloud/lithops/assets/11145254/e68ef5dd-8072-4a19-9b2e-2c3fc6eb48e5)
JosepSampe commented 8 months ago

So, in summary, is this something related to Lithops? Or is it more related to Python? Or to AWS Lambda?

gfinol commented 8 months ago

I think this is something related to Lithops. I guess it might be related to how Lithops uses the invoker thread pool or the connection pool. But I reviewed the code of the AWS Lambda backend and I didn't see anything suspicious...

ZikBurns commented 7 months ago

Python Interpreter

I'm currently using the Python 3.11 interpreter of a VM in AWS EC2 with Ubuntu 22.04, and I'm working on a modified runtime of aws_lambda. Lithops originally serializes the code, dependencies and parameters and uploads them to S3; the function then downloads them from S3 and deserializes them. I did some experiments to avoid the round trip through S3: my invoke just calls the function, passing the parameters directly as the payload. This is part of my aws_lambda.py:

```python
self.lambda_client = self.aws_session.client(
    'lambda', region_name=self.region_name,
    config=botocore.client.Config(
        max_pool_connections=5000,
        read_timeout=900,
        connect_timeout=900,
        user_agent_extra=self.user_agent
    )
)

...

def invoke(self, runtime_name, runtime_memory, payload):
    # function_name is resolved from runtime_name/runtime_memory
    # elsewhere in the class (omitted here)
    response = self.lambda_client.invoke(
        FunctionName=function_name,
        Payload=json.dumps(payload, default=str)
    )
    return json.loads(response['Payload'].read().decode('utf-8'))
```

And this is how I use invoke:

```python
def invocator(payload, number):
    start = time.time()
    result = self.compute_handler.invoke(payload)
    end = time.time()
    starttimes[number] = start
    endtimes[number] = end
    return result

def general_executor(payloads):
    # pair each payload with its index so times can be recorded per invocation
    payloads_with_numbers = list(zip(payloads, range(len(payloads))))
    with ThreadPoolExecutor(max_workers=len(payloads)) as executor:
        results = list(executor.map(lambda p: invocator(*p), payloads_with_numbers))
    return results
```

With this code, which works differently from the way Lithops originally does, I get the same problem described in this issue. This is why I think it is not related to Lithops.

I have a containerized runtime with many dependencies. For this experiment, each Lambda just returns the string "Hello World":

```python
return {
    'statusCode': 200,
    'body': "Hello World"
}
```

As you can see in the invocator code, I measure the start and end times of every invocation. I invoked 100 functions in both cold and warm state, and with those times I can build a plot.

*(plots: cold and warm invocation timelines, system Python 3.11)*

As you can see, there is barely any difference between cold and warm. This is because of the added delay described in this thread.
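Turning the recorded starttimes/endtimes into numbers makes the delay easier to compare across interpreters than reading plots. A sketch of summarizing per-invocation latencies (dict names match the invocator above; the data here is synthetic):

```python
def latency_summary(starttimes, endtimes):
    """Min/median/max invocation latency from the two {number: tstamp} dicts."""
    values = sorted(endtimes[n] - starttimes[n] for n in starttimes)
    return {
        'min': values[0],
        'median': values[len(values) // 2],
        'max': values[-1],
    }

# Synthetic warm run: most calls take ~0.3 s, a few show the ~1 s extra delay.
starts = {n: 0.0 for n in range(10)}
ends = {n: (0.3 if n < 8 else 1.3) for n in range(10)}
summary = latency_summary(starts, ends)
print(summary)
```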

Conda Python Interpreter

If I install Miniconda, create a Python 3.11 env in my AWS EC2 VM with Ubuntu 22.04 and execute the same code, I get:

*(plots: cold and warm invocation timelines, conda Python 3.11)*

The behavior using the conda environment looks more like what Lithops would do: warm functions take less than 1 second, and cold starts take half the time they used to.

I don't know why Conda solved the problem...