vercel / turborepo

Build system optimized for JavaScript and TypeScript, written in Rust
https://turbo.build/repo/docs
MIT License

Cache keeps growing indefinitely #863

Open 01walid opened 2 years ago

01walid commented 2 years ago

What version of Turborepo are you using?

1.1.6

What package manager are you using / does the bug impact?

pnpm

What operating system are you using?

Linux

Describe the Bug

We use Turborepo in a relatively large and active monorepo. Saving and restoring Turborepo's cache in CI is quickly becoming time-consuming: the cache grows to several hundred MB within a few days or weeks.

Expected Behavior

Turborepo should clean up old cache objects, either automatically or via a flag (e.g. --refresh-cache), so that the cache doesn't grow unbounded.

To Reproduce

Keep using Turborepo in an active repo, saving and restoring the same cache over a few days, and notice how the cache size grows.

attila commented 2 years ago

We've started noticing this too: restoring and saving an updated cache of around 1.1 GB easily takes over a minute on GitHub Actions. Alternatively, a CLI command to evict items older than x would be desirable, to keep cache sizes manageable.
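To keep the cache under a fixed size rather than a fixed age, one option is to delete the oldest files until the total drops below a cap. The following is a minimal sketch, not a Turborepo feature: `prune_to_size` is a hypothetical helper name, the cap is given in bytes, and it relies on GNU `find` (`-printf`). It treats every file independently, so it may remove one file of a cache entry and leave a related file behind.

```bash
# Sketch: delete the oldest files in a cache directory until the total
# apparent size is at or below max_bytes.
# prune_to_size is a hypothetical helper, not part of turbo.
prune_to_size() {
  local cache_dir="$1" max_bytes="$2"
  # Nothing to do if the cache directory does not exist yet.
  [ -d "$cache_dir" ] || return 0

  # Sum the apparent size (in bytes) of all regular files.
  local total
  total=$(find "$cache_dir" -type f -printf '%s\n' \
    | awk '{ sum += $1 } END { printf "%.0f\n", sum }')

  # List files as "<mtime> <size> <path>", oldest first, and delete
  # one at a time until the running total is under the cap.
  find "$cache_dir" -type f -printf '%T@ %s %p\n' | sort -n |
    while read -r _mtime size path; do
      [ "$total" -le "$max_bytes" ] && break
      rm -f -- "$path"
      total=$(( total - size ))
    done
}
```

For example, `prune_to_size node_modules/.cache/turbo 1073741824` would trim the directory to roughly 1 GiB of file content before a CI step saves the cache.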

florianmatz commented 1 year ago

Any news here? Unfortunately, we are experiencing the same issue, and we can't use Vercel's caching options due to internal guidelines...

dtinth commented 1 year ago

If you are using GitHub Actions, I’m building this action to solve this problem.

https://github.com/dtinth/setup-github-actions-caching-for-turbo

Instead of reading/writing the cache from the filesystem and using a separate step (e.g. actions/cache) to save/restore this filesystem state, this action configures Turborepo to read from/write to the GitHub Actions Cache Service API. This allows for fine-grained caching and avoids the problem where the cache grows indefinitely.

If you are not using GitHub Actions, you can deploy an open-source solution to your infrastructure:

crubier commented 1 year ago

What about local development? We have people here ending up with 300 GB in their local turbo cache.

davecarlson commented 1 year ago

My current local folder (I only checked out the code fresh 2 weeks ago). This is madness!

15G ./node_modules/.cache/turbo

Kanary159357 commented 1 year ago

We have the same problem on CI. The cache we save and restore kept getting bigger, and CI runs were getting terribly slow. As a temporary workaround, I run a script that finds files older than one week and removes them:

find ./node_modules/.cache/turbo -mtime +7 -exec rm {} +
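A slightly more defensive variant of the `find` command above could look like the sketch below. `prune_by_age` is a hypothetical helper name, and `-delete` is a GNU/BSD `find` extension rather than POSIX. It only matches regular files (so the cache directory itself is never passed to `rm`), tolerates a missing directory, and prints what it removes. It assumes a file's mtime reflects when the artifact was written; whether Turborepo updates mtime on a cache hit is not established in this thread.

```bash
# Sketch: delete cache files last modified more than N days ago.
# prune_by_age is a hypothetical helper, not part of turbo.
prune_by_age() {
  local cache_dir="$1" max_age_days="${2:-7}"
  # Nothing to do if the cache directory does not exist yet.
  [ -d "$cache_dir" ] || return 0
  # -type f  : regular files only, never the directory itself
  # -mtime +N: find counts age in whole days (rounded down), so +7
  #            matches files that are at least 8 days old
  # -print   : log each path before it is removed
  find "$cache_dir" -type f -mtime "+$max_age_days" -print -delete
}
```

Running `prune_by_age node_modules/.cache/turbo 7` before the CI step that saves the cache would mirror the original command.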

RazeiXello commented 7 months ago

I've also had this concern while using Turbo's caching. My temporary solution is a modified Stack Overflow answer for deleting files based on creation time in JavaScript.

Command

node delete-old.mjs node_modules/.cache/turbo 604800000

604800000 = 7 days in ms

Script delete-old.mjs

// Modified from https://stackoverflow.com/a/23022459
import fs from 'fs';
import { fileURLToPath } from 'url';
import path from 'path';
import { rimraf } from 'rimraf';

const __filename = fileURLToPath(import.meta.url); // get the resolved path to the file
const __dirname = path.dirname(__filename); // get the name of the directory

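// e.g. node delete-old.mjs directory
// Note: the directory is resolved relative to the parent of this script's folder, not the current working directory.
const directory = path.join(__dirname, '..', process.argv[2]);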

// e.g. node delete-old.mjs directory 604800000
/**
 * Expiry time in milliseconds.
 */
const expiryTime = Number(process.argv[3]) || 604800000;

const dateFormatOptions = {
  weekday: 'long',
  year: 'numeric',
  month: 'long',
  day: 'numeric',
  hour: 'numeric',
  minute: 'numeric',
};

fs.readdir(directory, (err, files) => {
  console.log(`Checking for files older than ${getColouredText(expiryTime + ' ms', 31)}...\n`);

  if (err) {
    return console.error(err);
  }

  if (!files.length) {
    console.log(getColouredText(`Directory empty.`));
  }

  files.forEach((file, index) => {
    fs.stat(path.join(directory, file), (err, stat) => {
      let endTime, now;

      if (err) {
        return console.error(err);
      }

      now = new Date().getTime();
      // Note: stat.ctime is the inode change time; the creation time is stat.birthtime.
      endTime = new Date(stat.ctime).getTime() + expiryTime;

      if (now > endTime) {
        return rimraf(path.join(directory, file))
          .then(() => {
            console.log(`Successfully deleted expired file ${getColouredText(file, 32)}`);
            console.log(
              `Created Date: ${getColouredText(stat.ctime.toLocaleString('en-CA', dateFormatOptions), 33)}\n`
            );
          })
          .catch(err => {
            console.error(err);
          });
      }
    });
  });
});

/**
 * Available colours: https://en.wikipedia.org/wiki/ANSI_escape_code#Colors
 *
 * @param {*} text
 * @param {*} colourCode
 * @returns
 */
function getColouredText(text, colourCode = 33) {
  return `\x1b[${colourCode}m${text}\x1b[0m`;
}

mstuercke commented 3 months ago

If you want to keep an exact number of caches, you can use my example:

export CACHES_TO_KEEP=2
ls -At -1 -d "$PWD/node_modules/.cache/turbo/"* | tail -n "+$(($CACHES_TO_KEEP*2+1))" | xargs -r rm

Explanation:

I hope that this will be implemented soon, to avoid these workarounds.
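The one-liner above can be unpacked into the following sketch. It assumes, as the `*2` factor in the original suggests, that each cache entry is stored as two files in the cache directory. `keep_newest_caches` is a hypothetical name, and `xargs -r -d` are GNU extensions.

```bash
# Sketch: keep only the N most recently modified cache entries.
# keep_newest_caches is a hypothetical helper, not part of turbo.
keep_newest_caches() {
  local cache_dir="$1" caches_to_keep="${2:-2}"
  # Nothing to do if the cache directory does not exist yet.
  [ -d "$cache_dir" ] || return 0
  # Assumption: one cache entry = two files, so keep 2*N files.
  local files_to_keep=$(( caches_to_keep * 2 ))
  # ls -t        : sort by modification time, newest first
  # tail -n +K   : print from line K on, i.e. skip the newest K-1 files
  # xargs -r     : do not run rm at all if nothing is left to delete
  # xargs -d '\n': treat each line as one path, even with spaces
  ls -At -1 -d "$cache_dir"/* 2>/dev/null \
    | tail -n "+$(( files_to_keep + 1 ))" \
    | xargs -r -d '\n' rm -f --
}
```

Calling `keep_newest_caches "$PWD/node_modules/.cache/turbo" 2` corresponds to `CACHES_TO_KEEP=2` in the original command.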

GitHub action

```yml
# .github/workflows/actions/remove-outdated-turbo-cache/action.yml
name: Remove outdated turbo cache
description: "Removes outdated caches. This workaround can be removed, when this issue is resolved: https://github.com/vercel/turbo/issues/863"
inputs:
  caches-to-keep:
    description: "Keeps the defined amount of the most recent caches (default: 10). All other caches will be removed"
    default: "10"
runs:
  using: "composite"
  steps:
    - name: Remove old turbo cache
      shell: bash
      run: |
        outdated_files=$(ls -At -1 -d "$PWD/node_modules/.cache/turbo/"*)
        outdated_files_amount=$(echo "$outdated_files" | wc -l | awk '{$1=$1};1')
        outdated_files_size=$(du -ch $outdated_files | tail -1 | cut -f 1)
        echo "Removing $outdated_files_amount outdated cache files ($outdated_files_size):"
        echo "$outdated_files"
        echo "$outdated_files" | tail -n "+$((${{ inputs.caches-to-keep }}*2+1))" | xargs -r rm
```

```yml
# Usage as a workflow step:
# ...
- name: Remove outdated turbo cache
  uses: ./.github/workflows/actions/remove-outdated-turbo-cache
  with:
    caches-to-keep: 25
# ...
```

sysarcher commented 2 weeks ago

This is still a bug. (Turbo 2.0.14)