spcl / serverless-benchmarks

SeBS: serverless benchmarking suite for automatic performance analysis of FaaS platforms.
https://mcopik.github.io/projects/sebs/
BSD 3-Clause "New" or "Revised" License

Supporting Google Cloud Function V2 #158

Open nervermore2 opened 1 year ago

nervermore2 commented 1 year ago

Google Cloud published Google Cloud Functions V2 last year, which is built on Cloud Run and Eventarc. It would be interesting to see its performance under the SeBS benchmark set. It seems like the majority of APIs remain the same.

I'm wondering if we could support Google Cloud Functions V2 sometime in the future.

nervermore2 commented 1 year ago

Also, I'm wondering if it's possible to support Python 3.8+ on GCP. It seems like the image we are using only supports Python 3.7; however, if we choose other images and rebuild the image in the central repo, then we could test on Python 3.8 and 3.9 to have a fair comparison against Lambda and Azure. Thanks!

mcopik commented 1 year ago

@nervermore2

Google Cloud published Google Cloud Functions V2 last year, which is built on Cloud Run and Eventarc. It would be interesting to see its performance under the SeBS benchmark set. It seems like the majority of APIs remain the same. I'm wondering if we could support Google Cloud Functions V2 sometime in the future.

Yes, that would be really interesting. I don't currently have it planned for the upcoming release. If you want to try it, then I'm happy to support you and help in updating SeBS with the new backend.

mcopik commented 1 year ago

@nervermore2

Also, I'm wondering if it's possible to support Python 3.8+ on GCP. It seems like the image we are using only supports Python 3.7; however, if we choose other images and rebuild the image in the central repo, then we could test on Python 3.8 and 3.9 to have a fair comparison against Lambda and Azure. Thanks!

Yes, adding a new image should resolve the issue - functions should already be compatible with Python 3.8 and Python 3.9.

Here (in config/systems.json) you need to add the new images. You need to ensure that the new images use Python 3.8 and 3.9 - I'm not sure how their images handle that.

nervermore2 commented 1 year ago

I saw the code. However, I also see that while running the benchmark and doing the deployment, it's actually pulling the image from the Docker repository. Do I need to push the image there as well, or is it done by ./install.py?

In addition, I was reading more about the Docker image Python/JS versions as well, and I saw that Node.js versions 10, 12, and 14 all point to the same Docker image. I'm wondering what the reason is?

"nodejs": {
        "base_images": {
          "10": "gcr.io/google-appengine/nodejs",
          "12": "gcr.io/google-appengine/nodejs",
          "14": "gcr.io/google-appengine/nodejs"
        },
        "images": [
          "build"
        ],
        "username": "docker_user",
        "deployment": {
          "files": [
            "handler.js",
            "storage.js"
          ],
          "packages": {
            "@google-cloud/storage": "^4.0.0",
            "uuid": "3.4.0"
          }
        }
      }

Thanks

nervermore2 commented 1 year ago

I would be happy to support higher language versions. At least for AWS and Azure, we could point directly to the latest version of the image in the public repository, e.g.:

        "base_images": {
          "16.x": "amazon/aws-lambda-nodejs:16",
          "14.x": "amazon/aws-lambda-nodejs:14",
          "12.x": "amazon/aws-lambda-nodejs:12"
        },

However, it seems like adding the new image to base_images is not enough. It seems like we need to rebuild the image and upload it to the spcl central repository, or otherwise rebuild everything from scratch and put the image in our own repository.

I'm wondering if you have detailed documentation on how to do that (rebuilding images locally, testing them against a local repository, and then uploading them to your central repo)? That would be really helpful.

nervermore2 commented 1 year ago

I was able to support Node.js 16 on AWS and Azure with a trick: making the benchmark use the Node.js 14 image even though the runtime is Node.js 16 (since Node.js 14 and 16 use the same Docker image, I believe).

However, for Python on GCP, the runtime version is not defined through base_images, and in your GCP image (https://hub.docker.com/layers/spcleth/serverless-benchmarks/build.gcp.python.3.7/images/sha256-6d1c329deffa497bc24b944a1401ae0344eb35f0efe866e31ae59fe5d5b0cf7e?context=explore) I see the Python version hard-coded in many places, e.g. ENV PYTHON_VERSION=3.7. I'm not an expert on Docker, but I would like to give it a try if I have instructions on how to build the image from its source Dockerfile. I believe I need the original Dockerfile from your end and some instructions on where/how to reproduce the image (just with Python 3.8 installed instead of 3.7). Similarly, if I had the original Dockerfile, I could change VERSION=14 to VERSION=16 in the relevant build steps, rebuild the image, and test it locally.

Thanks

nervermore2 commented 1 year ago

Update: I'm trying out and reading tools/build_docker_images.py. It would be helpful to have some explanation of how to use it and how to modify it if we need to support more things in the image.

nervermore2 commented 1 year ago

After digging into the code a bit, I was mostly confused by the def install_dependencies(self, output_dir): function in sebs/benchmark.py. The reason I read into the code is that I need to know which part of the Dockerfile I need to change in order to make the image work as expected.

The part I'm confused about is: why do we need install_dependencies at all? We create the function from zipped code, which is stored in the benchmarks folder, and we create it on all providers by specifying runtimes. Why do we need additional runtimes in the container images? It seems like we don't even need to use GCR, MCR, or ECR, because we specify the function's built-in language runtime during deployment. We are not trying to deploy a container (I see we are deploying zipped code). So what specific tasks does install_dependencies handle, and why do we need those Docker images (the images in https://hub.docker.com/layers/spcleth/serverless-benchmarks and from public repositories)?

mcopik commented 1 year ago

@nervermore2 Thanks for the detailed information - yes, the building of Docker images is a deeply internal part that is not documented very well. Users are not expected to know that.

I will try to push new Node.js images today.

nervermore2 commented 1 year ago

Thanks, it would be really helpful if we could have both new Node.js images (16.x or even higher) and new Python images (3.8 or even higher) for GCP. For other providers, I was able to work around it by using the older versions of the images, which work as well.

By the way, could you briefly explain why we need to use Docker images (including a GCR/MCR/ACR image) to launch our functions? We deploy the code inside the benchmarks/ folder as a zip (and we install dependencies from requirements.txt or package.json, using the runtime managed by the cloud). So it confuses me why we need Docker images - it seems like we already have everything we need.

mcopik commented 1 year ago

@nervermore2

Thanks, it would be really helpful if we could have both new Node.js images (16.x or even higher) and new Python images (3.8 or even higher) for GCP. For other providers, I was able to work around it by using the older versions of the images, which work as well.

I'm working on this now - first, I'm verifying that everything works with existing versions. I was able to confirm that all Python benchmarks (3.7, 3.8, 3.9) and Node.js benchmarks (14) work on AWS Lambda. I will try GCP next.

Updating Node.js should be relatively easy. However, updating Python versions usually takes a bit longer, as there are many package incompatibilities, and I don't plan at the moment to go beyond 3.9. Python 3.10+ is planned for the next release (1.2), and for the current mid-release (1.1.5) I'm planning to add Node.js 16.

By the way, could you briefly explain why we need to use Docker images (including a GCR/MCR/ACR image) to launch our functions? We deploy the code inside the benchmarks/ folder as a zip (and we install dependencies from requirements.txt or package.json, using the runtime managed by the cloud). So it confuses me why we need Docker images - it seems like we already have everything we need.

We aim to build all dependencies in a container similar to the one used in the cloud when executing functions. The reasoning is that some of the packages use native dependencies, and installing them incorrectly might prevent functions from launching. We also have C++ benchmarks (planned for release 1.2) that require more care when building.

The build image is only needed for the step of installing dependencies. At the moment, SeBS assumes all images live in our DockerHub, but this can be easily changed by using your own repository - just change the repository name in config/systems.json. I'm planning to push newer images, but it can take a while, as I need to verify they all work correctly.
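
To make this concrete, below is a minimal sketch (not the actual SeBS code) of that dependency-installation step using the Docker SDK for Python; the image tag, mount path, and code directory are placeholders for illustration.

import docker

client = docker.from_env()

# Placeholder values - in SeBS the real image tag and paths come from
# config/systems.json and the benchmark's build directory.
build_image = "spcleth/serverless-benchmarks:build.aws.python.3.8"
code_dir = "/path/to/benchmark/build"  # contains handler.py and requirements.txt

# Run pip inside a provider-compatible build image with the benchmark code
# mounted, so that native dependencies are built against the same environment
# the cloud uses; the installed packages end up next to the handler and are
# shipped in the zip afterwards.
client.containers.run(
    image=build_image,
    command="pip install -r /mnt/function/requirements.txt -t /mnt/function",
    volumes={code_dir: {"bind": "/mnt/function", "mode": "rw"}},
    remove=True,
)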

nervermore2 commented 1 year ago

Updating Node.js should be relatively easy. However, updating Python versions usually takes a bit longer, as there are many package incompatibilities, and I don't plan at the moment to go beyond 3.9. Python 3.10+ is planned for the next release (1.2), and for the current mid-release (1.1.5) I'm planning to add Node.js 16.

Thanks. It would be a great help to have Node.js 16.x and Python 3.8 for now, since Node.js 14.x and Python 3.7 are pretty old and retired, I believe.

We aim to build all dependencies in a container similar to the one used in the cloud when executing functions. The reasoning is that some of the packages use native dependencies, and installing them incorrectly might prevent functions from launching. We also have C++ benchmarks (planned for release 1.2) that require more care when building.

So we install dependencies in Docker and ship them (maybe in a Docker image or zipped file) along with the code, right? That's why requirements.txt or package.json contains almost nothing (besides the Pillow we just added)?

mcopik commented 1 year ago

@nervermore2 On the dev branch and in DockerHub, you can find GCP images for Python 3.7, 3.8, and 3.9. This required some major changes to our images. Furthermore, the benchmark 411.image-recognition does not work on AWS with Python 3.8, and does not work on GCP with Python 3.8 and 3.9. On both platforms, the reason is the size limit of the code package.

mcopik commented 1 year ago

So we install dependencies in Docker and ship them (maybe in a Docker image or zipped file) along with the code, right? That's why requirements.txt or package.json contains almost nothing (besides the Pillow we just added)?

@nervermore2 Almost correct :) We install dependencies and ship the zipped file. The requirements.txt specifies the dependencies; Google Cloud supports automatic installation of dependencies from this file, whereas AWS does not. AWS only has automatic installation of dependencies when using the CDK - which also uses Lambda-compatible Docker images, as we do :)

Sometimes requirements.txt is empty in benchmarks because all dependencies have different versions depending on the Python version. We try to stick to a single version to enable reproducibility, but it's quite difficult.
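
As a rough illustration of that difference (not SeBS's packaging code; the directory names are hypothetical), the zipped package contents differ per platform:

import shutil

# For GCP, zipping the handler together with requirements.txt is enough:
# the platform installs the dependencies itself on deployment.
#   gcp_package/
#       handler.py
#       requirements.txt
shutil.make_archive("gcp-code", "zip", root_dir="gcp_package")

# For AWS Lambda, the dependencies have to be vendored into the package
# beforehand (inside a Lambda-compatible build image), so the zip already
# contains the installed packages next to the handler.
#   aws_package/
#       handler.py
#       requirements.txt
#       PIL/ ...   <- packages pre-installed in the build step
shutil.make_archive("aws-code", "zip", root_dir="aws_package")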

mcopik commented 1 year ago

@nervermore2 You can find support for Node 16 on dev branch and in our DockerHub.

Updating to Node.js 18 might be possible - we will explore later if the existing images still support this. If you want to work on this yourself, I will happily accept a PR and guide you - we welcome all contributions! You can test your local image by changing the repository's name in config/systems.json.

The main problem is that many of these images are quite old. While AWS produces many new images, the official Google images are abandoned - nobody responds to issues, and PRs are not merged. The images are very old and based on Ubuntu 16.04. While the current Node benchmarks still work there, Node 18 might not, and any new Node benchmark might require a newer OS. To solve this, one would have to start from the ubuntu:18.04 or ubuntu:22.04 image, as these are the runtime environments for GCP functions, and install Node in the given version - I did something very similar on the dev branch to replace the official Google images for Python (also abandoned) with custom, Ubuntu-based images.
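
As a purely illustrative sketch of that approach (the tag, Node version, and NodeSource setup script are assumptions, not the repository's actual build recipe), such an image could be built programmatically with the Docker SDK for Python and then used by pointing config/systems.json at your own repository:

import io

import docker

# An Ubuntu-based build image with Node.js installed from NodeSource; the
# Ubuntu release mirrors the OS documented for the GCP Functions runtime.
dockerfile = """
FROM ubuntu:22.04
RUN apt-get update \\
    && apt-get install -y --no-install-recommends curl ca-certificates \\
    && curl -fsSL https://deb.nodesource.com/setup_18.x | bash - \\
    && apt-get install -y --no-install-recommends nodejs \\
    && rm -rf /var/lib/apt/lists/*
"""

client = docker.from_env()
# The repository and tag below are placeholders for a locally built image.
image, logs = client.images.build(
    fileobj=io.BytesIO(dockerfile.encode("utf-8")),
    tag="my-user/serverless-benchmarks:build.gcp.nodejs.18",
    rm=True,
)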

nervermore2 commented 1 year ago

@nervermore2 Almost correct :) We install dependencies and ship the zipped file. The requirements.txt specifies the dependencies; Google Cloud supports automatic installation of dependencies from this file, whereas AWS does not. AWS only has automatic installation of dependencies when using the CDK - which also uses Lambda-compatible Docker images, as we do :)

Sometimes requirements.txt is empty in benchmarks because all dependencies have different versions depending on the Python version. We try to stick to a single version to enable reproducibility, but it's quite difficult.

Thanks! It seems like, since we are only importing libraries and then running benchmarks, adding entries to requirements.txt could be enough for the dependencies. But apparently you did more than that, by integrating packages you installed yourself in an Ubuntu image into the zipped Lambda/GCP/Azure function.

@nervermore2 You can find support for Node 16 on dev branch and in our DockerHub.

Thanks for your support on this. I will try it now.

nervermore2 commented 1 year ago

One last question regarding the difference between the BURST and COLD invocation types. I checked the code. It seems like in both types we are trying to enforce a cold start before each invocation of the benchmark. However, the COLD invocation type checks the environment and makes sure all invocations in the current run are cold; if they are not, it redoes the warm ones. On the other hand, the BURST invocation type does not make sure all invocations in the current run are cold after updating the environment variables. I'm just wondering what the use cases for the BURST invocation type would be, then? The invocations are mixed, with mostly cold and partially warm results (or maybe all cold and no warm). Without the BURST invocation type, we could still invoke burst traffic by specifying the concurrent invocation count in examples/config.json.

nervermore2 commented 1 year ago

One more question on your latest Ubuntu image for Python - please correct me if I'm wrong: previously you used the cloud provider's native Python/Node.js images and installed specific packages on them. Now it seems like you are using a plain Ubuntu image and adding the specific Python language/environment to it, am I right? So, if I'm going to contribute to the repository in the future to add new images for GCP, it seems like, if GCP does not provide a newer Node.js image, I would probably have to install Node.js on that plain Ubuntu image first, like you did, right?

mcopik commented 1 year ago

@nervermore2

Thanks! It seems like, since we are only importing libraries and then running benchmarks, adding entries to requirements.txt could be enough for the dependencies. But apparently you did more than that, by integrating packages you installed yourself in an Ubuntu image into the zipped Lambda/GCP/Azure function.

No, the requirements.txt will be processed automatically on some platforms (like GCP), but not on all of them. If you upload pure Python code and requirements.txt to AWS, the function will not automatically be expanded with the dependencies.

We don't add dependencies from the Docker image, but use it to install them :)

mcopik commented 1 year ago

@nervermore2

I'm just wondering what the use cases for the BURST invocation type would be, then? The invocations are mixed, with mostly cold and partially warm results (or maybe all cold and no warm). Without the BURST invocation type, we could still invoke burst traffic by specifying the concurrent invocation count in examples/config.json.

Azure Functions do not have the same cold/warm semantics because their function app can serve multiple invocations. If you launch 5 functions simultaneously on Lambda, you will likely get 5 cold results. If you do it on Azure Functions, you will get maybe 1-2 cold results and the rest will be warm runs. As you correctly noticed, burst invocations are unnecessary on AWS/GCP/Whisk.

Please read our Middleware paper (link is in the repo) - it explains differences in the methodology.
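
For reference, the environment-variable trick mentioned above amounts to something like the following on AWS Lambda (a minimal sketch, not SeBS's implementation; the function name and variable name are placeholders). Updating the configuration invalidates the existing execution environments, so the next invocations start cold - whereas on Azure the shared function app can still serve warm invocations, which is exactly why the burst mode exists.

import time

import boto3

lambda_client = boto3.client("lambda")
function_name = "my-benchmark-function"  # placeholder name

# Changing any environment variable updates the function configuration and
# forces Lambda to create fresh execution environments; a real implementation
# would merge this variable with the function's existing environment.
lambda_client.update_function_configuration(
    FunctionName=function_name,
    Environment={"Variables": {"COLD_START_MARKER": str(time.time())}},
)

# Wait until the configuration update finishes before invoking again.
waiter = lambda_client.get_waiter("function_updated")
waiter.wait(FunctionName=function_name)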

mcopik commented 1 year ago

@nervermore2

One more question on your latest Ubuntu image for Python - please correct me if I'm wrong: previously you used the cloud provider's native Python/Node.js images and installed specific packages on them. Now it seems like you are using a plain Ubuntu image and adding the specific Python language/environment to it, am I right?

Correct - I do it because GCP itself specifies that the Google Cloud Functions environment uses these Docker images.

So, if I'm going to contribute to the repository in the future to add new images for GCP, it seems like, if GCP does not provide a newer Node.js image, I would probably have to install Node.js on that plain Ubuntu image first, like you did, right?

Correct - it should be sufficient to use a Docker image similar to the Python one, but instead of installing Python from the APT repository, you will have to install Node.

mcopik commented 1 year ago

@nervermore2 I verified that Azure works well for Python 3.7-3.9 and Node 14. To add Node 16 & 18, we have to migrate to Azure Functions 4. I will see how much work this will require.

mcopik commented 11 months ago

For future reference: the new code needed to query the logs of GCP Functions v2:

from time import sleep

from google.api_core import exceptions
from google.cloud import logging as gcp_logging

logging_client = gcp_logging.Client()
# Functions v1 logs live under "cloudfunctions.googleapis.com%2Fcloud-functions";
# v2 functions run on Cloud Run, so their request logs use the Cloud Run log name.
logger = logging_client.logger("run.googleapis.com%2Frequests")


def wrapper(gen):
    # Retry pagination when the Cloud Logging API reports quota exhaustion
    # (inside SeBS, the message would go to self.logging instead of print).
    while True:
        try:
            yield next(gen)
        except StopIteration:
            break
        except exceptions.ResourceExhausted:
            print("Google Cloud resources exhausted, sleeping 30s")
            sleep(30)


function_name = "function-1"
# UTC timestamps bounding the query window; in SeBS they would be derived
# from the invocation start and end times.
timestamps = ["2023-12-01T21:41:00", "2023-12-01T21:45:19"]

# Match both the Cloud Run revision (v2) and the Cloud Function (v1) resource types.
invocations = logger.list_entries(
    filter_=(
        f'''
        (resource.type="cloud_run_revision" resource.labels.service_name="{function_name}")
        OR
        (resource.type="cloud_function" resource.labels.function_name="{function_name}")
        severity>=INFO
        timestamp >= "{timestamps[0]}"
        timestamp <= "{timestamps[1]}"
        '''
    ),
    page_size=1000,
)

if hasattr(invocations, "pages"):
    pages = list(wrapper(invocations.pages))
else:
    pages = [list(wrapper(invocations))]

invocations_processed = 0
for page in pages:
    for invoc in page:
        # Cloud Run reports request latency as a string such as "0.123s";
        # convert it to nanoseconds.
        time = float(invoc.http_request["latency"][0:-1]) * 1000 * 1000 * 1000
        invocations_processed += 1

mcopik commented 4 months ago

@nervermore2 PR #196 now supports Azure Functions Runtime v4 and works with Node.js 16, 18 & 20, as well as with Python 3.10 and 3.11.

The upcoming release will support Python up to 3.11 (3.10 on AWS) and Node.js up to 20 (up to 16 on AWS due to problems with the SDK v3).

nervermore2 commented 4 months ago

Thanks for supporting those language versions! They will be really helpful.