serverless / serverless-python-requirements

⚡️🐍📦 Serverless plugin to bundle Python packages
MIT License

Serverless creating a "null" folder upon deploying #689

Open ToneVDB opened 2 years ago

ToneVDB commented 2 years ago

When deploying the service to AWS Lambda, serverless uploads a zip to S3, as expected. The size of the Lambda was unusually big, resulting in a failed deployment, so I decided to download the zip file. Error thrown by serverless: Resource handler returned message: "Unzipped size must be smaller than 262144000 bytes (Service: Lambda, Status Code: 400)"

Upon downloading and expanding the zip, I found that there is a folder called "null" that is 342 MB big (see attached screenshot).

This folder is quite big and contains yarn for some reason? Inside the 'serverless-python-requirements' folder there are 2 subfolders:

  1. a 51MB folder called "downloadCacheslspyc"
  2. a 150MB folder with my poetry dependencies

There is also a 60 MB .Cache folder that I don't know how got there. Because the zip is so big, the deployment keeps failing. I tried adding more ignores to the package patterns (along the lines shown below), but with no luck so far.
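
For illustration, the extra excludes looked something like this (not my exact attempts, the patterns varied):

    "package": {
      "patterns": [
        "!null/**",
        "!.cache/**"
      ]
    }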

Could this be due to this plugin, or should I report this to the main serverless repo? All guidance is appreciated.

My package.json:

{
  "name": "api",
  "license": "Unlicense",
  "description": "",
  "version": "0.1.0",
  "devDependencies": {
    "serverless": "^3.7.2",
    "serverless-domain-manager": "^6.0.2",
    "serverless-newrelic-lambda-layers": "^3.0.0",
    "serverless-plugin-warmup": "^7.1.0",
    "serverless-python-requirements": "^5.1.1"
  }
}

My serverless.json:

{
    "service": "api",
    "frameworkVersion": "3",
    "provider": {
      "name": "aws",
      "stage": "${opt:stage}",
      "runtime": "python3.9",
      "iam": {
        "role": {
          "statements": [
            {
              "Effect": "Allow",
              "Action": ["lambda:InvokeFunction", "dynamodb:*"],
              "Resource": "*"
            }
          ]
        }
      }
    },
    "functions": {
      "main": {
        "handler": "src/main.handler",
        "events": [
          {
            "http": "ANY /"
          },
          {
            "http": {
              "path": "/{proxy+}",
              "method": "any",
              "cors": true
            }
          }
        ],
        "timeout": 30,
        "warmup": {
          "default": {
            "enabled": true
          }
        }
      }
    },
    "package": {
      "patterns": [
        "!venv",
        "!*.Jenkinsfile",
        "!node_modules",
        "!src/tests",
        "!Dockerfile",
        "!terraform",
        "!config"
      ]
    },
    "plugins": [
      "serverless-domain-manager",
      "serverless-plugin-warmup",
      "serverless-python-requirements"
    ],
    "custom": {
     "pythonRequirements": {
        "usePoetry": true,
        "usePipenv": false,
        "slim": true
      },
      "stage": "${opt:stage}",
      "warmup": {
        "default": {
          "enabled": true
        }
      }
    }
  }
pgrzesik commented 2 years ago

Hello @ToneVDB - are you able to provide a small reproducible example that I could run on my side to check if the problem happens? I haven't encountered anything like this so far. Additionally, could you verify that you're using the latest version of the plugin?

ToneVDB commented 2 years ago

Hi @pgrzesik ,

We believe we have found the issue. Locally, packaging and deploying ran fine; the issue only appeared in a CI/CD pipeline on Jenkins using a specific Docker agent for that step.

We decided to run the deploy command with the --verbose flag, resulting in the following output:

Generating requirements.txt from "pyproject.toml"
Parsed requirements.txt from pyproject.toml in /tmp/jenkins/workspace/api_feature_h2o-795/.serverless/requirements.txt
Installing requirements from "null/.cache/serverless-python-requirements/a5abf46136cbf8a77486507148e5e30f4eb9af0c7db22e9a9131f88f64ea313b_x86_64_slspyc/requirements.txt"
Using download cache directory null/.cache/serverless-python-requirements/downloadCacheslspyc

We suspect that the null value comes from the appdirectory library not being able to cope with this environment.
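
One way to check that suspicion directly on the build agent is a quick Node snippet against appdirectory's documented API (the appName below is an assumption about what the plugin passes):

const AppDirectory = require('appdirectory');

// if the agent's environment is the culprit (e.g. no resolvable home
// directory), this should print a path starting with "null/"
const dirs = new AppDirectory({ appName: 'serverless-python-requirements' });
console.log(dirs.userCache());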

After setting useDownloadCache and useStaticCache to false, behaviour is back to normal. Verbose now outputs:

Generating requirements.txt from "pyproject.toml"
Parsed requirements.txt from pyproject.toml in /tmp/jenkins/workspace/api_feature_h2o-795/.serverless/requirements.txt
Installing requirements from "/tmp/jenkins/workspace/api_feature_h2o-795/.serverless/requirements/requirements.txt"

The cache statements are gone from the output.

The output zip file on S3 now looks as expected (see attached screenshot).

Without cacheLocation being set explicitly, the cache path seems to resolve to null on our build agent.
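
For reference, the workaround amounts to the following in the custom block of serverless.json (the same pythonRequirements section as above, with both caches disabled):

    "custom": {
      "pythonRequirements": {
        "usePoetry": true,
        "usePipenv": false,
        "slim": true,
        "useDownloadCache": false,
        "useStaticCache": false
      }
    }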

It seems like the appdirectory package is at fault here. Seeing as it is deprecated, a potential fix could be to replace it. A quick and dirty fix could be to catch the null value as it is returned from the package and refrain from using any cache at all at that point.

Looking forward to your feedback, let me know if you would like more detail somewhere!

pgrzesik commented 2 years ago

Thanks a lot for sharing the details @ToneVDB 🙇 As for the appdirectory package - did you find the root cause in that package by any chance, or is that just a suspicion due to the fact that this library has long been deprecated? I'm assuming the reproduction steps for this issue are not trivial, as it only happens in your Jenkins CI/CD pipeline? Locally it's working fine?

ToneVDB commented 2 years ago

No problem @pgrzesik :) Reproducing the issue is indeed a bit of a pain. I'll try to describe the details a bit more below, hoping it might make the issue easier to reproduce.

We had a Jenkinsfile that defined an agent per stage:

pipeline{
   agent none
   ...
   stages{
      stage('a'){
         agent {dockerfile}
         ....
      }
      stage('b'){
         agent {some-other-agent}
         ....
      }
      stage('c'){
         agent {dockerfile}
         ....
      }
   }
}

In stage a we do a yarn install of serverless and all its sub-dependencies. Stage b does some terraform work. Stage c runs the serverless deploy/package command.

Stage c still finds serverless as it is installed, but the agent got rebuilt/initialized in a different location, resulting in either a permissions problem on the cache folder or a cache location that no longer resolves.

This is what we think ultimately makes the appdirectory package output a null value, resulting in the issue described above. The quick fix here is to just turn off the 2 cache parameters in serverless.json as described above.

Another potential fix we found is on the Jenkins side: if you define a 'global' agent, the issue also disappears with the cache parameters set to true. This supports the theory that it is a permission or access issue, as the agent is initialized/created once and continues to run for the entire pipeline in one go. Example Jenkinsfile:

pipeline{
   agent {dockerfile}
   ...
   stages{
      stage('a'){
         ....
      }
      stage('b'){
         ....
      }
      stage('c'){
         ....
      }
   }
}

I haven't tested it locally, but I suppose that by either removing the existing cache location or changing the permissions on the cache folder, the issue should become reproducible. Let me know if my train of thought makes sense to you 🤔

pgrzesik commented 2 years ago

Hey @ToneVDB - sorry for the delay in response - the notification got lost in my email. I've tried to reproduce this issue but couldn't do so locally - I think you might be right that the appdirectory library is at fault here and doesn't deal well with a Jenkins setup like yours. Do you know of any alternatives to the appdirectory library, though? I was looking but didn't find anything reasonable.

ToneVDB commented 2 years ago

Hey @pgrzesik ! No problem, everyone is super busy lately as far as I can tell ;) I'm sorry, but I haven't found any replacement libraries either. It seems like, in this repo, it is only used in one place though, but I am unfamiliar with the code. A workaround for now could be to check if the path output by getUserCachePath starts with null/, or, even more specifically, if dirs.userCache() does. That would mean the result is invalid and should be discarded (disabling further use of the cache directory for that run). It feels a bit wrong to me to do it this way, but apart from rewriting the appdirectory package or finding a replacement library, this seems like the next best thing to me. I am not convinced, though...
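
Roughly what I have in mind (a sketch only - the function name matches the plugin's getUserCachePath helper, but the option shape and surrounding code are assumptions):

const AppDirectory = require('appdirectory');

function getUserCachePath(options) {
  // an explicitly configured cacheLocation always wins
  if (options && options.cacheLocation) {
    return options.cacheLocation;
  }
  const dirs = new AppDirectory({ appName: 'serverless-python-requirements' });
  const cachePath = dirs.userCache();
  // appdirectory can yield a path starting with "null" when it cannot
  // resolve a home directory (as on our Jenkins agent); treat that as
  // "no cache available" instead of writing into a literal null/ folder
  if (!cachePath || cachePath.startsWith('null')) {
    return null; // caller should then skip static/download caching for this run
  }
  return cachePath;
}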