spcl / serverless-benchmarks

SeBS: serverless benchmarking suite for automatic performance analysis of FaaS platforms.
https://mcopik.github.io/projects/sebs/
BSD 3-Clause "New" or "Revised" License
150 stars 68 forks

220 Video Processing error out on AWS Lambda #110

Closed nervermore2 closed 1 year ago

nervermore2 commented 1 year ago

When I run the 220.video-processing benchmark, I get the following error. It seems like ffmpeg was not installed correctly. Does anyone have any thoughts? Output from the console with --verbose:

21:00:48,942 ERROR AWS.HTTPTrigger-e449: Output: {'message': 'Internal Server Error'}

Output from cloudwatch:

FileNotFoundError: [Errno 2] No such file or directory: '/var/task/function/ffmpeg/ffmpeg': '/var/task/function/ffmpeg/ffmpeg'
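
The path /var/task/function/ffmpeg/ffmpeg in the traceback is where AWS Lambda unpacks the code package, so the error means the static ffmpeg binary either never made it into the package or is not executable. A quick local sanity check before deploying could look like the sketch below (the package layout is inferred from the traceback, and `check_ffmpeg_in_package` is a hypothetical helper, not part of SeBS):

```python
import os
import stat


def check_ffmpeg_in_package(package_dir: str) -> bool:
    """Return True if the bundled ffmpeg binary exists and is executable.

    `package_dir` is the unpacked code package directory, e.g.
    220.video-processing_code/python/3.7 (path layout is an assumption
    based on the Lambda traceback).
    """
    ffmpeg = os.path.join(package_dir, "function", "ffmpeg", "ffmpeg")
    if not os.path.isfile(ffmpeg):
        # Binary was never copied during packaging.
        return False
    # Verify the user-executable bit survived packaging.
    return bool(os.stat(ffmpeg).st_mode & stat.S_IXUSR)
```

Running this against the generated `*_code` directory before invoking the function would distinguish a packaging failure from a runtime one.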

config.json:

    "perf-cost": {
      "benchmark": "220.video-processing",
      "experiments": [
        "cold",
        "warm"
      ],
      "input-size": "test",
      "repetitions": 50,
      "concurrent-invocations": 50,
      "memory-sizes": [
        128
      ]
    },

Command I ran: ./sebs.py experiment invoke perf-cost --config config/example.json --deployment aws --verbose

Thanks

mcopik commented 1 year ago

@nervermore2 Thanks for the bug report! This indeed looks strange.

Can you please include the relevant parts of your config/example.json? I would need to see the actual configuration - in particular, the language version.

mcopik commented 1 year ago

@nervermore2 Did this issue occur on all invocations, or only on a few? If the package is broken, all invocations should fail.

What do you see after executing ./sebs.py benchmark invoke 220.video-processing test --config config/example.json --deployment aws --verbose? This works for me.

nervermore2 commented 1 year ago

This issue occurs on all invocations on AWS, GCP, and Azure. This is the config/example.json:

{
  "experiments": {
    "deployment": "gcp",
    "update_code": false,
    "update_storage": false,
    "download_results": false,
    "runtime": {
      "language": "python",
      "version": "3.7"
    },
    "type": "invocation-overhead",
    "perf-cost": {
      "benchmark": "311.compression",
      "experiments": [
        "cold",
        "warm"
      ],
      "input-size": "test",
      "repetitions": 50,
      "concurrent-invocations": 1,
      "memory-sizes": [
        256
      ]
    },
    "network-ping-pong": {
      "invocations": 50,
      "repetitions": 1000,
      "threads": 1
    },
    "invocation-overhead": {
      "repetitions": 5,
      "N": 20,
      "type": "payload",
      "payload_begin": 1024,
      "payload_end": 6251000,
      "payload_points": 20,
      "code_begin": 1048576,
      "code_end": 261619712,
      "code_points": 20
    },
    "eviction-model": {
      "invocations": 1,
      "function_copy_idx": 0,
      "repetitions": 5,
      "sleep": 1
    }
  },
  "deployment": {
    "name": "aws",
    "aws": {
      "region": "us-east-1",
      "lambda-role": ""
    },
    "azure": {
      "region": "eastus1"
    },
    "gcp": {
      "region": "us-east1",
      "project_name": "**some-project**",
      "credentials": "**somevalidcredentials**"
    },
    "local": {
      "storage": {
        "address": "",
        "mapped_port": -1,
        "access_key": "",
        "secret_key": "",
        "instance_id": "",
        "input_buckets": [],
        "output_buckets": [],
        "type": "minio"
      }
    },
    "openwhisk": {
      "shutdownStorage": false,
      "removeCluster": false,
      "wskBypassSecurity": "true",
      "wskExec": "wsk",
      "experimentalManifest": false,
      "docker_registry": {
        "registry": "",
        "username": "",
        "password": ""
      },
      "storage": {
        "address": "",
        "mapped_port": -1,
        "access_key": "",
        "secret_key": "",
        "instance_id": "",
        "input_buckets": [],
        "output_buckets": [],
        "type": "minio"
      }
    }
  }
}
nervermore2 commented 1 year ago

The only parts I changed between benchmark runs are "experiments"["deployment"] and "experiments"["perf-cost"]["benchmark"].

mcopik commented 1 year ago

@nervermore2 Sorry once again for your troubles. This is a very unusual bug, and I cannot reproduce it. I can see a spurious error "chmod: cannot access 'ffmpeg/ffmpeg': No such file or directory" during the packaging process, but the cloud deployment seems to work correctly.

Can you do the following? It should help me understand the issue.

nervermore2 commented 1 year ago

Sure, will do very soon after working hours.

mcopik commented 1 year ago

@nervermore2 Any updates on this issue?

nervermore2 commented 1 year ago

Sorry, I was busy with some other work last month. I will reproduce this as soon as I can.

nervermore2 commented 1 year ago

Hi, I just had time to work on this. I tested this with GCP on a GCP compute instance. This is the full log:

[17:00:11.955122] GCPCredentials-3782 Using cached credentials for GCP
[17:00:11.955379] GCPResources-ffdf No cached resources for GCP found, using user configuration.
[17:00:11.955497] GCPConfig-7306 Loading cached config for GCP
[17:00:12.309161] Benchmark-1800 Building benchmark 220.video-processing. Reason: no cached code package.
[17:00:19.660524] Benchmark-1800 chmod: cannot access 'ffmpeg/ffmpeg': No such file or directory

[17:00:19.663218] Benchmark-1800 Docker build of benchmark dependencies in container of image spcleth/serverless-benchmarks:build.gcp.python.3.7
[17:00:19.663350] Benchmark-1800 Docker mount of benchmark code from path /home/mingtsun/serverless-benchmarks-masters/serverless-benchmarks-master/220.video-processing_code/python/3.7
[17:00:31.804473] Benchmark-1800 Created code package (source hash: 13837d7a3bf89c73e838c0c3b58cf7f2), for run on gcp with python:3.7
[17:00:31.823644] GCP-3a90 Creating new function! Reason: function function-220_video_processing_python_3_7 not found in cache.
[17:00:35.201241] GCP-3a90 Uploading function function-220_video_processing_python_3_7 code to 220_video-processing-0-input-5c433bda-feff-4a
[17:00:35.898569] GCP-3a90 Function function-220_video_processing_python_3_7 has been created!
[17:00:36.400832] GCP-3a90 Function function-220_video_processing_python_3_7 accepts now unauthenticated invocations!
[17:00:40.694525] GCP-3a90 Function function-220_video_processing_python_3_7 - waiting for deployment...
[17:02:32.604402] GCP-3a90 Function function-220_video_processing_python_3_7 - deployed!
[17:02:32.605117] SeBS-026d Beginning repetition 1/5
[17:02:32.605216] GCP.HTTPTrigger-106d Invoke function https://europe-west1-x7-winged-hue-z.cloudfunctions.net/function-220_video_processing_python_3_7
[17:02:45.206306] GCP.HTTPTrigger-106d Invoke of function was successful
[17:02:45.207067] SeBS-026d Beginning repetition 2/5
[17:02:45.207402] GCP.HTTPTrigger-106d Invoke function https://europe-west1-x7-winged-hue-z.cloudfunctions.net/function-220_video_processing_python_3_7
[17:02:56.615899] GCP.HTTPTrigger-106d Invoke of function was successful
[17:02:56.616713] SeBS-026d Beginning repetition 3/5
[17:02:56.617062] GCP.HTTPTrigger-106d Invoke function https://europe-west1-x7-winged-hue-z.cloudfunctions.net/function-220_video_processing_python_3_7
[17:03:08.927242] GCP.HTTPTrigger-106d Invoke of function was successful
[17:03:08.928088] SeBS-026d Beginning repetition 4/5
[17:03:08.928451] GCP.HTTPTrigger-106d Invoke function https://europe-west1-x7-winged-hue-z.cloudfunctions.net/function-220_video_processing_python_3_7
[17:03:19.567372] GCP.HTTPTrigger-106d Invoke of function was successful
[17:03:19.568213] SeBS-026d Beginning repetition 5/5
[17:03:19.568601] GCP.HTTPTrigger-106d Invoke function https://europe-west1-x7-winged-hue-z.cloudfunctions.net/function-220_video_processing_python_3_7
[17:03:30.661190] GCP.HTTPTrigger-106d Invoke of function was successful

This run was successful (at least that's what the log says). However, I noticed the line "[17:00:19.660524] Benchmark-1800 chmod: cannot access 'ffmpeg/ffmpeg': No such file or directory". I'm not sure whether that is related to the error I triggered before. Would you be able to explain why you chmod ffmpeg, and why the function can still be invoked even though this ffmpeg does not exist? Thanks!

I will continue to test azure on azure instance and aws on ec2 instance and see if I could reproduce the error.

nervermore2 commented 1 year ago

Azure is also working fine, but I still get the same type of "warning":

[22:35:29.976332] Benchmark-3d40 Building benchmark 220.video-processing. Reason: no cached code package.
[22:35:36.554402] Benchmark-3d40 chmod: cannot access 'ffmpeg/ffmpeg': No such file or directory

I will try AWS again. For now, I'm using the workload size "test"; I'm not sure whether the error would occur with a larger workload size.

mcopik commented 1 year ago

@nervermore2 This error happens on your local system and indicates that there might be a bug in our packaging code. I will investigate this.
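
For context, the "cannot access 'ffmpeg/ffmpeg'" message comes from a chmod-style packaging step that marks the bundled ffmpeg binary as executable; if the binary has not been staged yet when the command runs, bare chmod fails noisily. A defensive sketch of such a step is below (`make_executable_if_present` is a hypothetical helper illustrating the guard, not the actual SeBS fix):

```python
import os
import stat


def make_executable_if_present(path: str) -> bool:
    """Set the executable bits on `path` if the file exists.

    Mirrors what `chmod +x ffmpeg/ffmpeg` does during packaging, but
    returns False instead of failing with "chmod: cannot access ..."
    when the binary was not staged.
    """
    if not os.path.isfile(path):
        return False
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return True
```

Guarding on existence keeps the packaging log clean and makes a genuinely missing binary detectable from the return value rather than from a stray shell error.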

nervermore2 commented 1 year ago

Lastly, I tested AWS on an AWS EC2 instance: same warning, no error in the output:

[18:03:39.787482] Benchmark-c492 chmod: cannot access 'ffmpeg/ffmpeg': No such file or directory

mcopik commented 1 year ago

@nervermore2 Yeah, this was a minor issue in the packaging code; it has now been fixed on the dev branch.