llvm / torch-mlir

The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.

excessive cache invalidation in ccache #1323

Open powderluv opened 2 years ago

powderluv commented 2 years ago

Investigate why we have ccache invalidation (especially when building from source) in the CI.

TODO: Test the local behaviour of ccache across a day's worth of PyTorch changes and check whether we see the same behaviour as on the CI.
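One way to run that local experiment is to zero ccache's counters, run the build, and then print the counters, so the hit rate reflects that one build only. This is a minimal sketch, assuming ccache is on PATH; `with_ccache_stats` is a made-up helper name, and the build command is whatever you normally run.

```shell
# Hypothetical helper: run a build command and report ccache statistics for
# just that command. Assumes ccache is installed and on PATH.
with_ccache_stats() {
  if ! command -v ccache >/dev/null 2>&1; then
    echo "ccache not found on PATH" >&2
    return 127
  fi
  ccache --zero-stats >/dev/null   # reset counters so earlier builds don't count
  "$@"                             # run the build command passed as arguments
  rc=$?
  ccache --show-stats              # counters now cover this run only
  return "$rc"
}

# Example (the build command is a placeholder):
#   with_ccache_stats cmake --build build
```

Running it once per PyTorch commit over a day's worth of commits should show whether the hit rate collapses locally the way it seems to on the CI.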

powderluv commented 2 years ago

some data points:

anush@MacBook-Pro torch-mlir % gh api \
  -H "Accept: application/vnd.github+json" \
  /repos/llvm/torch-mlir/actions/cache/usage

{
  "full_name": "llvm/torch-mlir",
  "active_caches_size_in_bytes": 12875838677,
  "active_caches_count": 38
}

and

anush@MacBook-Pro torch-mlir % gh api \
  -H "Accept: application/vnd.github+json" \
  /repos/llvm/torch-mlir/actions/caches

{
  "total_count": 39,
  "actions_caches": [
    {
      "id": 8561,
      "ref": "refs/heads/main",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-30T22:49:32.966Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:52:16.940000000Z",
      "created_at": "2022-08-30T22:49:41.193333300Z",
      "size_in_bytes": 260985062
    },
    {
      "id": 8608,
      "ref": "refs/pull/1320/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T20:51:58.992Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:52:04.766666700Z",
      "created_at": "2022-08-31T20:52:04.766666700Z",
      "size_in_bytes": 301616535
    },
    {
      "id": 8560,
      "ref": "refs/heads/main",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-30T22:33:13.536Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:50:42.793333300Z",
      "created_at": "2022-08-30T22:33:21.193333300Z",
      "size_in_bytes": 300679370
    },
    {
      "id": 8568,
      "ref": "refs/heads/main",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-30T23:42:34.745Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:50:30.930000000Z",
      "created_at": "2022-08-30T23:42:43.846666700Z",
      "size_in_bytes": 492695119
    },
    {
      "id": 8607,
      "ref": "refs/pull/1320/merge",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T20:47:32.908Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:47:35.636666700Z",
      "created_at": "2022-08-31T20:47:35.636666700Z",
      "size_in_bytes": 261870709
    },
    {
      "id": 8606,
      "ref": "refs/pull/1326/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T20:44:48.691Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:44:53.676666700Z",
      "created_at": "2022-08-31T20:44:53.676666700Z",
      "size_in_bytes": 301386313
    },
    {
      "id": 8605,
      "ref": "refs/pull/1326/merge",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T20:37:20.719Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:37:25.186666700Z",
      "created_at": "2022-08-31T20:37:25.186666700Z",
      "size_in_bytes": 261569159
    },
    {
      "id": 8580,
      "ref": "refs/pull/1320/merge",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T04:24:54.294Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:36:24.280000000Z",
      "created_at": "2022-08-31T04:24:59.120000000Z",
      "size_in_bytes": 260998212
    },
    {
      "id": 8579,
      "ref": "refs/pull/1320/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T04:24:51.998Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:34:28.890000000Z",
      "created_at": "2022-08-31T04:24:54.323333300Z",
      "size_in_bytes": 492648749
    },
    {
      "id": 8578,
      "ref": "refs/pull/1320/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T04:24:24.603Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:34:26.246666700Z",
      "created_at": "2022-08-31T04:24:29.016666700Z",
      "size_in_bytes": 300767619
    },
    {
      "id": 8604,
      "ref": "refs/tags/oneshot-20220831.50",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T20:31:21.884Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:31:25.486666700Z",
      "created_at": "2022-08-31T20:31:25.486666700Z",
      "size_in_bytes": 301398205
    },
    {
      "id": 8603,
      "ref": "refs/tags/oneshot-20220831.50",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T20:29:16.386Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:29:19.603333300Z",
      "created_at": "2022-08-31T20:29:19.603333300Z",
      "size_in_bytes": 261610659
    },
    {
      "id": 8602,
      "ref": "refs/pull/1325/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T20:06:33.851Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T20:06:39.460000000Z",
      "created_at": "2022-08-31T20:06:39.460000000Z",
      "size_in_bytes": 585486202
    },
    {
      "id": 8601,
      "ref": "refs/pull/1325/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T18:52:57.135Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T18:53:05.840000000Z",
      "created_at": "2022-08-31T18:53:05.840000000Z",
      "size_in_bytes": 301395382
    },
    {
      "id": 8600,
      "ref": "refs/pull/1325/merge",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T18:41:25.524Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T18:41:27.660000000Z",
      "created_at": "2022-08-31T18:41:27.660000000Z",
      "size_in_bytes": 261698014
    },
    {
      "id": 8599,
      "ref": "refs/pull/862/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T17:48:58.672Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T17:49:06.020000000Z",
      "created_at": "2022-08-31T17:49:06.020000000Z",
      "size_in_bytes": 586311251
    },
    {
      "id": 8598,
      "ref": "refs/pull/862/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T16:31:00.389Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T16:31:04.190000000Z",
      "created_at": "2022-08-31T16:31:04.190000000Z",
      "size_in_bytes": 301622135
    },
    {
      "id": 8597,
      "ref": "refs/pull/862/merge",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T16:28:25.358Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T16:28:27.226666700Z",
      "created_at": "2022-08-31T16:28:27.226666700Z",
      "size_in_bytes": 261969537
    },
    {
      "id": 8596,
      "ref": "refs/tags/snapshot-20220831.582",
      "key": "ccache-Linux-torch_mlir_build_assets--2022-08-31T16:14:52.629Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T16:14:53.050000000Z",
      "created_at": "2022-08-31T16:14:53.050000000Z",
      "size_in_bytes": 5980942
    },
    {
      "id": 8595,
      "ref": "refs/pull/1318/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T13:36:30.473Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T13:36:37.873333300Z",
      "created_at": "2022-08-31T13:36:37.873333300Z",
      "size_in_bytes": 590135705
    },
    {
      "id": 8594,
      "ref": "refs/pull/1318/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T13:35:46.871Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T13:35:50.500000000Z",
      "created_at": "2022-08-31T13:35:50.500000000Z",
      "size_in_bytes": 301374991
    },
    {
      "id": 8593,
      "ref": "refs/pull/1318/merge",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T13:31:50.375Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T13:31:55.260000000Z",
      "created_at": "2022-08-31T13:31:55.260000000Z",
      "size_in_bytes": 261689031
    },
    {
      "id": 8586,
      "ref": "refs/heads/ashay/mlir-python-bindings",
      "key": "ccache-Linux-torch_mlir_build_assets--2022-08-31T09:38:32.636Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T13:30:06.156666700Z",
      "created_at": "2022-08-31T09:38:33.676666700Z",
      "size_in_bytes": 116604009
    },
    {
      "id": 8574,
      "ref": "refs/pull/1318/merge",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T03:12:29.752Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T13:23:20.493333300Z",
      "created_at": "2022-08-31T03:12:32.986666700Z",
      "size_in_bytes": 261022042
    },
    {
      "id": 8575,
      "ref": "refs/pull/1318/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T03:17:24.269Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T13:21:49.630000000Z",
      "created_at": "2022-08-31T03:17:27.620000000Z",
      "size_in_bytes": 300635868
    },
    {
      "id": 8581,
      "ref": "refs/pull/1318/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T04:37:24.333Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T13:21:44.700000000Z",
      "created_at": "2022-08-31T04:37:30.106666700Z",
      "size_in_bytes": 585284840
    },
    {
      "id": 8592,
      "ref": "refs/tags/snapshot-20220831.582",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T13:12:38.632Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T13:12:52.636666700Z",
      "created_at": "2022-08-31T13:12:52.636666700Z",
      "size_in_bytes": 637815608
    },
    {
      "id": 8591,
      "ref": "refs/tags/snapshot-20220831.582",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-in-tree-ON-2022-08-31T11:25:27.423Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T11:25:33.623333300Z",
      "created_at": "2022-08-31T11:25:33.623333300Z",
      "size_in_bytes": 300637490
    },
    {
      "id": 8590,
      "ref": "refs/tags/snapshot-20220831.582",
      "key": "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T11:16:42.808Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T11:16:44.870000000Z",
      "created_at": "2022-08-31T11:16:44.870000000Z",
      "size_in_bytes": 261027059
    },
    {
      "id": 8589,
      "ref": "refs/pull/1321/merge",
      "key": "ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T10:06:34.866Z",
      "version": "15c80211763d03468c2c9070680654f9264282cf4daa2c6ceac80f2e3eaeb295",
      "last_accessed_at": "2022-08-31T10:06:37.880000000Z",
      "created_at": "2022-08-31T10:06:37.880000000Z",
      "size_in_bytes": 492510923
    }
  ]
}

powderluv commented 2 years ago

So it looks like we are restoring really old caches instead of the most recently uploaded one.

For example, https://github.com/llvm/torch-mlir/runs/8129353328?check_suite_focus=true restored ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-08-31T10:06:34.866Z, even though the immediately preceding run, https://github.com/llvm/torch-mlir/actions/runs/2968703997, had finished more than an hour earlier and uploaded ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-09-01T05:17:27.524Z.

So we need to debug and fix the cache restore, because pinning the build won't help if we load an old cache.
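To compare what a run restored against what it could have restored, it helps to pick out the newest key for a given configuration from the cache listing. The sketch below is an illustration only: `newest_cache_key` is a made-up helper, and it relies on the keys ending in an ISO 8601 UTC timestamp, as the ones listed above do.

```shell
# Hypothetical helper: read cache keys on stdin (one per line) and print the
# newest key that starts with the given prefix.
newest_cache_key() {
  # Keys end in an ISO 8601 UTC timestamp, so among keys sharing a prefix a
  # byte-wise sort is also a chronological sort.
  awk -v p="$1" 'index($0, p) == 1' | LC_ALL=C sort | tail -n 1
}

prefix="ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-"
printf '%s\n' \
  "${prefix}2022-08-31T10:06:34.866Z" \
  "${prefix}2022-09-01T05:17:27.524Z" \
  "ccache-macOS-torch_mlir_build_assets-macos-arm64-in-tree-ON-2022-08-31T20:47:32.908Z" |
  newest_cache_key "$prefix"
# prints ccache-Linux-torch_mlir_build_assets-ubuntu-x86_64-out-of-tree-OFF-2022-09-01T05:17:27.524Z
```

The key list itself can be produced from the same endpoint used earlier, for example with `gh api /repos/llvm/torch-mlir/actions/caches --jq '.actions_caches[].key'`.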

ashay commented 1 year ago

Building on top of the previous findings, I realized that PyTorch uses precompiled headers, which, to work with ccache, require build flags that we might have to upstream to PyTorch.

However, we can perhaps work around these limitations by leveraging the fact that we don't clear the VM disk between consecutive CI runs, although we do remove the PyTorch build files. More precisely, we could change this snippet (in the package_pytorch() function of build_libtorch.sh):

  # Copy over all of the cmake files
  mv build/lib*/torch/share     libtorch/
  mv build/lib*/torch/include   libtorch/
  mv build/lib*/torch/lib       libtorch/
  # Copy over all lib files
  mv build/lib/*                libtorch/lib/
  # Copy over all include files
  mv build/include/*            libtorch/include/

to use cp -r instead of mv. Perhaps then the build system would pick up the fact that the object files are newer than the source files, thus avoiding a full rebuild. An additional small change is needed so that we run git fetch only if the requested commit hash differs from the existing commit hash (so as not to change the mtime of the source files), but hopefully the broad idea makes sense. Let me know if you spot any flaws. Thanks!
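The copy-based packaging step could look roughly like the sketch below. It is not the actual script: `package_by_copy` and its two arguments are illustrative names, and the directory layout is taken from the snippet quoted above.

```shell
# Sketch of the packaging step using cp -r, so the build tree stays intact.
# package_by_copy and its arguments are illustrative names.
package_by_copy() {
  build="$1"   # PyTorch build directory (build/ in the snippet above)
  out="$2"     # destination directory (libtorch/ in the snippet above)
  mkdir -p "$out"
  # Copy the torch share/, include/ and lib/ trees
  cp -r "$build"/lib*/torch/share   "$out"/
  cp -r "$build"/lib*/torch/include "$out"/
  cp -r "$build"/lib*/torch/lib     "$out"/
  # Copy the remaining lib and include files
  cp -r "$build"/lib/*     "$out"/lib/
  cp -r "$build"/include/* "$out"/include/
}

# Example: package_by_copy build libtorch
```

Plain cp -r gives the copies fresh modification times. That should not matter for this idea, since it is the files left behind in the build directory that the build system compares, but cp -rp would preserve the times if it turned out to be needed.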

powderluv commented 1 year ago

I am OK with changing mv to cp -r to see if it helps. I actually made that change from the original PyTorch script, switching to mv to avoid the copy and gain speed. So let's try that.

However, I am not sure we should assume that artifacts are not cleared between VM invocations in the CI. I thought it was supposed to be a clean run. Maybe it was a transient bug?

ashay commented 1 year ago

However I am not sure we should assume we don't clear artifacts between VM invocations in the CI.

Lucky for us, when you and Maksim wrote the build_libtorch.sh script, y'all added code paths to handle both cases: one where the PyTorch source is already checked out and one where it isn't.

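checkout_pytorch() {
  if [[ ! -d "$PYTORCH_ROOT" ]]; then
    # No existing checkout (body elided in this excerpt)
    ...
  else
    # Existing checkout: reuse it and move it to the requested branch
    cd "${PYTORCH_ROOT}"
    git fetch --depth=1 origin "${TORCH_MLIR_SRC_PYTORCH_BRANCH}"
    git reset --hard FETCH_HEAD
  fi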

Combined with the fact that we don't pass clean: true during the checkout phase, we might be able to safely make use of the existing files. And if they don't exist or are out of date, then the script can likely perform a fresh checkout of PyTorch.
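The "fetch only when the commit differs" check mentioned earlier could be a small guard in front of the else branch. This is a sketch under assumptions, not the script's code: `already_at_commit` is a made-up helper, and it assumes the requested ref is a full commit hash rather than a branch name.

```shell
# Hypothetical helper: exit 0 when the checkout in $1 is already at commit $2,
# so the caller can skip `git fetch` and `git reset --hard`.
# Assumes $2 is a full commit hash, not a branch name.
already_at_commit() {
  current="$(git -C "$1" rev-parse HEAD 2>/dev/null)" || return 1
  [ "$current" = "$2" ]
}

# Possible use inside the else branch (variable names from the script):
#   if ! already_at_commit "${PYTORCH_ROOT}" "${TORCH_MLIR_SRC_PYTORCH_BRANCH}"; then
#     git fetch --depth=1 origin "${TORCH_MLIR_SRC_PYTORCH_BRANCH}"
#     git reset --hard FETCH_HEAD
#   fi
```

If the directory is missing or is not a git checkout, the helper returns non-zero, so the caller falls back to fetching as it does today.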

powderluv commented 1 year ago

Ahh, we added that path to support local customer forks of PyTorch source builds.