Open tonistiigi opened 5 months ago
I think this aligns with some refactoring I previously considered - the issue is that the internals of `CacheExporterTarget` at the moment are almost entirely non-apparent to the caller (which, while "neat", makes the implementation very difficult to understand).
Removing the complex normalization is a big improvement IMO - we've previously encountered horrible issues where, with enough links, the complexity of the loop removal grows exponentially.
```go
type CacheExporterTarget interface {
	Add(dgst digest.Digest, deps [][]CacheLink, results []CacheResult) (CacheExporterRecord, bool, error)
}
```
I like the proposed interface, but I think having the 2d slice here is confusing. I assume the idea is that the 0th index here is the vertex input index? If so, could we potentially group these into a wrapper struct, something like `type Input struct { Links []CacheLink; Result CacheResult }`, and then have `Add(dgst digest.Digest, inputs []Input)`?
There shouldn't be a need to normalize, deduplicate or check for loops in the implementation like there is now.
So `normalize` previously did essentially this recombining process, right? Is the reason it isn't needed anymore that we don't decompose links in the same way, so we can preserve them and are guaranteed to get a valid graph in the first place?
If this functionality is just used to recover the missing data, I think that works. Loops can't appear in a reasonable DAG export, and there shouldn't be duplicates, so no deduplication should be needed either.
Unclear how it affects https://github.com/moby/buildkit/pull/4468
I think this would have the clear effect of never including dependent records of `random:` prefixes - which is really what we want (and somehow not what we seem to be getting). The main problem there is that it's totally non-apparent why we see the issue - the code is dense, and it's difficult to understand why this isn't getting normalized away.
This shouldn't affect us in dagger; we worked around the base issue by changing certain digests to just not have the `random:` prefix.
implement some debug tooling for comparing cache chains first
Agreed, do you have an idea of what that would look like?
One potential suggestion/improvement to your proposal, if we're in this kind of area - I'd like to see the entire removal of the magical `session:`/`random:` prefixes. I think these properties should be part of the `CacheMap`, probably the `CacheOpts` map, so they can sit next to `no-cache`, etc. Maybe this already exists today with `llb.WithoutExportCache`, and just needs to be fixed/tested/etc.?
I'd potentially be interested in taking this bit - I think it's related, but it should be mostly orthogonal to the main proposal.
something like `type Input struct { Links []CacheLink; Result CacheResult }`
Should the result be per input? Can't it be for the vertex itself, so that the index for `CacheResult` is the output index, not the input index?
Is the reason that this isn't needed anymore that we don't decompose links in the same way, so we can preserve them, so we're guaranteed to get a valid graph in the first place?
Yes, there is still some minimal amount of "normalization", but it should be possible to do it directly in the implementation of `Add`, and there shouldn't be a need to rescan and analyze all of the added vertexes in a separate follow-up step.
One potential suggestion/improvement to your proposal if we're in this kind of area - I'd like to see the entire removal of the magical session:/random: prefixes. I think these properties should be part of the CacheMap, probably the CacheOpts map, so they can sit next to no-cache, etc.
The logic behind not putting this in `CacheMap` was that decisions about which cache chains should be exported should be made by the cache export backend implementation, not the LLB op implementation.
Agreed, do you have an idea of what that would look like?
I think we need to: 1) save the plaintext for the cache checksum in some debugging area so it can be looked up by digest; 2) add the ability to print the cache graph with these plaintexts in debuggable form - maybe this should be a simple web interface where you get the graph and can click on nodes to get the full plaintext, the list of results, and buttons to move to the parents or children of the step; 3) optionally, it would be helpful to compare two such graphs (or maybe the whole local boltdb of them) to see where they diverge and what the reason was behind a different cache checksum.
When looking into https://github.com/moby/buildkit/issues/4942 and https://github.com/moby/buildkit/pull/4917#issuecomment-2109644009 , I think it would be better to refactor the cache export interface.
Currently, the cache export target has the following interface:
This has the advantage of being very generic and separating the scope of different components. One can call `Add` any time they have some cache checksum and create new chains of cache records.

The limitations of this interface are that keys with a `random:` prefix are taken out before exporting, but because the dependencies of the key are walked after the main key, they can only be removed in the normalization step of the cache manifest. This means we can walk a big cache branch and, once we reach the root, find that it had a `random:` prefix, so that whole branch is later removed because of it. Unclear how it affects https://github.com/moby/buildkit/pull/4468 .

Instead I propose a more specific interface:
The main difference is that the `ExportTo()` function needs to call `Add` for all deps first and, based on that, put together a `[][]CacheLink` array before it can call `Add` for the record itself. Every `CacheLink` needs to contain the `CacheExporterRecord` from the previous call, which is otherwise an opaque object known only to the target implementation. This way it can see that if it doesn't have a complete `CacheLink` array, it can skip the whole export for that record. `Add` would only be called if the caller has at least one valid `CacheLink` for each dependency. With the bool return value, the target implementation can signify that it does not wish to cache such a key, and no `CacheExporterRecord` is made for it.

The resulting `CacheExporterRecord` can now also be cached with the `CacheKey` and reused without walking the full graph again. There does need to be a way to detect that no new keys were added anywhere in the subgraph by another concurrent build request, but at least if nothing changed, there is no need to check backlinks again.

The implementation can just return the previous record it had already created if `Add` is called with the same `dgst` for a `CacheLink.Src` it already knows about. There shouldn't be a need to normalize, deduplicate or check for loops in the implementation like there is now.

Because this is quite a big refactor, it might make sense to implement some debug tooling for comparing cache chains first, so that once the implementation is done we can compare the results and make sure there are no unexpected regressions.
@jedevc @sipsma