meltwater / drone-cache

A Drone plugin for caching current workspace files between builds to reduce your build times
https://underthehood.meltwater.com/blog/2019/04/10/making-drone-builds-10-times-faster/
Apache License 2.0

Drone step hanging for a bit after cache restore #194

Open xvandish opened 2 years ago

xvandish commented 2 years ago

Describe the bug In Drone, when the plugin runs a cache restore using GCS, the restorer component logs that the restore has finished and how long it took. The Drone step for the restore, however, doesn't terminate for another ~30-50s. A screenshot is attached below.

In the screenshot you can see that the log line component=restorer msg="cache restored" took=11.174617156s is shown as 12s by Drone, but the pipeline step that does the restore still took ~54s.

The "cache restored" message is printed here and called by Exec here, and I don't see any work that occurs after it. Could it be a hanging GCS connection that isn't terminated immediately and that a worker waits on? I'll take a look on my own, but figured I'd file this just in case this is a known issue or I made a mistake somewhere.

Any questions let me know, thanks!

To Reproduce Steps to reproduce the behavior:

  1. Using ... config (in .jsonnet format)
    local RebuildOrRestoreCache(isRestore) = {
      name: "%s-cache" % (if isRestore then "restore" else "rebuild"),
      image: "meltwater/drone-cache:latest",
      environment: {
        GOOGLE_APPLICATION_CREDENTIALS: "./credentials.json",
        BACKEND_OPERATION_TIMEOUT: "12m"
      },
      pull: "if-not-exists",
      settings: {
        debug: true,
        backend: "gcs",
        restore: isRestore,
        rebuild: !isRestore,
        override: false,
        // checksum function provided by plugin - https://github.com/meltwater/drone-cache/blob/master/DOCS.md#using-cache-key-templates
        cache_key: '{{ checksum "yarn.lock" }}',
        archive_format: "gzip",
        bucket: "...",
        region: "...",
        mount: [
          "node_modules",
        ],
      },
    };
  2. While ... Running the restore-cache step created above
  3. See error: the pipeline step finishes much later than its final log output.


Screenshots Screen Shot 2021-12-14 at 4 38 38 PM

Desktop: Running in CI off of the official image.


jimsheldon commented 2 years ago

Thank you for reporting this issue, we will investigate as soon as possible.

bdebyl commented 2 years ago

@xvandish is this still an issue you are facing? If so, I can try to replicate it, though since I haven't used GCS and have no live test environment for it, this may prove challenging.

messense commented 1 year ago

@bdebyl I'm facing the same kind of issue with AWS S3.

rmannibucau commented 1 year ago

Hi, I have the exact same issue on-premises using a local volume. The logs state that the restore takes a few seconds (around 10s), but the step lasts between 1 and 2 minutes. I wondered whether it could be due to the updates Drone makes to the pod images (replacing placeholders). If so, that is inconvenient, and we should probably ask them to implement the pipeline differently on Kubernetes.

rmannibucau commented 11 months ago

Hi,

Any news on this? I get it on a single-node Kubernetes cluster with a local host volume for the cache, so it is quite odd. It is also a nuisance: for a pipeline that should be fast (~1 min), it adds another minute of overhead/latency.