tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

Explore using GitHub cache for build artifacts to save builder time, especially for things like incremental builds #11442

Open tt-rkim opened 2 months ago

tt-rkim commented 2 months ago

Inspired by cache usage, @TT-billteng, and #10878

Some ideas:

tt-rkim commented 2 months ago

@TT-billteng Looks like the GitHub cache is limited to 10 GB per repository: https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy

That's a problem if we want to continue using GitHub cache. I'm seeing over 1 GB on my local machine for all the object files, which leaves us with only a handful of possible entries before eviction starts.

I'm wondering if we should limit cache entries to binaries from the main branch in that case. That would hopefully speed up branch builds anyway, since branches work off a base of main. Even so, unless we find ways to further reduce what we need to cache for incremental builds, only a couple of commits on main will actually have their artifacts cached at any given time. I'm wondering how much that actually matters and whether we should just go for it anyway.
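One way to get the "save on main only, restore everywhere" behavior is the split `actions/cache/restore` + `actions/cache/save` actions. A sketch (the `build/` path and key names are assumptions, not this repo's actual workflow):

```yaml
# Sketch: branch builds restore the newest main-branch entry via
# restore-keys; only builds on main save a new entry.
- uses: actions/cache/restore@v4
  with:
    path: build/
    key: build-main-${{ github.sha }}
    restore-keys: |
      build-main-

# ... build steps ...

- uses: actions/cache/save@v4
  if: github.ref == 'refs/heads/main'
  with:
    path: build/
    key: build-main-${{ github.sha }}
```

With only main-branch entries being saved, the 10 GB budget is spent on a few recent main commits instead of being churned by every branch.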

tt-rkim commented 2 months ago

Another limitation: we also use the cache for python_env! Oh no!

We're quite limited on space here. We may have to use our own storage for a caching solution.
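If we move to our own storage, ccache's remote storage backend is one option. A minimal config sketch (the server URL and size are hypothetical, and this assumes ccache >= 4.7, where the setting is named `remote_storage`):

```
# ~/.config/ccache/ccache.conf (sketch; URL is a placeholder)
remote_storage = http://ccache.internal.example:8080|read-only=false
max_size = 20G
```

This would sidestep the 10 GB GitHub cache limit entirely, at the cost of running a cache server ourselves.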

TT-billteng commented 2 months ago

> @TT-billteng Looks like the GitHub cache is limited to 10 GB per repository: https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy
>
> That's a problem if we want to continue using GitHub cache. I'm seeing over 1 GB on my local machine for all the object files, which leaves us with only a handful of possible entries before eviction starts.
>
> I'm wondering if we should limit cache entries to binaries from the main branch in that case. That would hopefully speed up branch builds anyway, since branches work off a base of main. Even so, unless we find ways to further reduce what we need to cache for incremental builds, only a couple of commits on main will actually have their artifacts cached at any given time. I'm wondering how much that actually matters and whether we should just go for it anyway.

If we have space for just one entry (latest main), that would already be a great start. Maybe one entry per build variant (Release, Debug, RelWithDebInfo).

tt-rkim commented 2 months ago

The current issue is the mtime of precompiled headers: https://github.com/tenstorrent/tt-metal/actions/runs/10426638021/job/28880945967

Will try just removing them for now and see where we get.

TT-billteng commented 2 months ago

Ran into this problem too when trying ccache, ugh.
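For the ccache case specifically, ccache's `sloppiness` setting has knobs for exactly this class of precompiled-header miss. A sketch (whether these values are safe for this codebase is untested here; see the ccache docs):

```
# ccache.conf sketch: relax hashing of PCH defines and file timestamps
# so mtime differences on restored trees don't force cache misses.
sloppiness = pch_defines,time_macros,include_file_mtime,include_file_ctime
```

Note that GCC additionally needs `-fpch-preprocess` in the compile flags for ccache to handle precompiled headers at all.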

tt-rkim commented 2 months ago

Trying this now:

- spinning off a CI run with the PCHs and CMakeCache.txt deleted
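The cleanup step described above might look like the following sketch (`BUILD_DIR` and the `*.gch`/`*.pch` patterns are assumptions, not the repo's actual layout):

```shell
#!/bin/sh
# Sketch: remove stale precompiled headers and the CMake cache from a
# restored build tree so CMake reconfigures cleanly on the CI runner.
BUILD_DIR="${BUILD_DIR:-build}"
# GCC writes *.gch, Clang writes *.pch; drop both if present.
find "$BUILD_DIR" \( -name '*.gch' -o -name '*.pch' \) -delete
rm -f "$BUILD_DIR/CMakeCache.txt"
```

Deleting CMakeCache.txt forces a fresh configure, which avoids the restored cache pinning stale paths from the runner that produced it.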

tt-rkim commented 2 months ago

First attempt at this with GitHub Cache didn't work too well, so I closed my initial PR.

We can always revisit if needed.