moby / buildkit

concurrent, cache-efficient, and Dockerfile-agnostic builder toolkit
https://github.com/moby/moby/issues/34227
Apache License 2.0
8.22k stars 1.16k forks source link

Improving shared sessions #1432

Open hinshun opened 4 years ago

hinshun commented 4 years ago

Currently, management of the session is highly coupled with the (*client.Client).solve function. The solve function does two things:

  1. Creates a session if one is not set in the SolveOpt
  2. Initializes the session if SessionPreInitialized == false, which is false by default.

When trying to share a session, you may be running multiple solves with the same session in order to not repeat work like transferring local files. However, since the solve function does the work of initialization it is unnatural to run a solve first to initialize the session before handing off to possibly-parallel solves.

Initialization does the following things:

  1. Create attachables which are additional GRPC servers registering on the client side. When buildkitd wants to, for example, transfer local files, then it invokes the RPC on the FileSync service to read files from the client system. You can think of them as a capability granted by the client.
  2. Run the session in a goroutine. It hijacks the underlying connection of the buildkit client and serves the GRPC server. By default, it only has the healthcheck RPCs regsitered, but may have additional RPCs registered via attachables.

Sharing sessions is really only important for transferring local files because of duplicate work. For attachables that transfer secret files or forward SSH sockets, we want only want builds that require it to have it available in the session. As a result, builds should be able to support having multiple sessions; some sessions for shared attachables, some for attachables specific to this build.

The current solve interface sends a single session ID. On the buildkitd side, the session ID is added to the context.Context, and then the relevant subsystem pulls the ID from the context.Context and fetches the session from the session.Manager by the ID.

We could send a map of attachable to session IDs like: {"local:context:"session123", "local:dockerfile":"session345", "exporter": "session567", "default": "session890"}.

Client-side

On the client side, we'll need to create helper functions to make the default use case (no advanced usage of shared sessions) easy, but also decouple the sessions from the solve function so that its lifecycle is handled separately.

Server

Do we want to store the mapping in the context.Context? It looks like it already delegates how to consume the session to the relevant subsystem, so for example the localExporter will know what key to look for in the mapping in order to retrieve the correct session ID.

tonistiigi commented 4 years ago

Current LLB to session interactions happen in 2 ways. First, in every solve request client sends the session ID https://github.com/moby/buildkit/blob/c44cb42a6937ff3fd27e089e72958ba73a62c100/client/solve.go#L207 . Now every time LLB is loaded, every vertex gets associated with that session ID through the build job. As vertexes can be shared by multiple builds, that means that a vertex actually has multiple sessions. When the solver calls the Op methods for the vertex, it attaches the session to the calling context. This actually has another issue that would use refactoring as vertex could have multiple sessions it should really pass all of them instead of a random one https://github.com/moby/buildkit/blob/c114e438f522ce6abe6aa8c713a3e83f4f6b2d69/solver/jobs.go#L71-L76 and every code that uses sessions should do a retry to another one if one session fails.

The second approach is that session ID can be attached directly to LLB with llb.SessionID method https://github.com/moby/buildkit/blob/c114e438f522ce6abe6aa8c713a3e83f4f6b2d69/client/llb/source.go#L291-L295 . If sessionID is set, then the local source implementation will not use the session in context and always use the one in LLB. When frontend is invoked, the sessionID in the request is passed to it as well via buildopts, so that when it constructs LLB it can call llb.SessionID() https://github.com/moby/buildkit/blob/c44cb42a6937ff3fd27e089e72958ba73a62c100/frontend/dockerfile/dockerfile2llb/convert.go#L372.

Assuming (not sure if valid) shared sessions only make sense for transferring local sources as long as we use session IDs in LLB everything should be quite maintainable. We could improve the library so that Solve takes llb.State instead of llb.Definition so that constraints can be used on marshaling to set session IDs automatically from locals. Then we could expose this mapping to frontends. With a build request, we would still send a single session that is used with context, together with all possible sessions that can be used for validation (to make sure frontend doesn't use session of another build) and the mapping that can be used by the frontend to match a session for a specific local.

What doesn't really work atm is sending a map of sessions to a frontend. There is no way to know if the frontend understands that convention or not. Atm. the existing frontend can only get one sessionID, so if, for example, dockerfile and context locals are on different sessions, there is no way for existing frontends to know/learn about it.

hinshun commented 4 years ago

This actually has another issue that would use refactoring as vertex could have multiple sessions it should really pass all of them instead of a random one and every code that uses sessions should do a retry to another one if one session fails.

I have trouble understanding this part at a conceptual level. Vertices may have multiple sessions when the edge digest is the same right? When vertices actually require attachables from the sessions (i.e. via SSH forwarding, secrets, file syncing), do they ever have multiple sessions? I suspect that since attachables are unique, vertices that have multiple sessions can pick a session at random because it won't actually utilize the attachables at all? Please let me know if I have the correct understanding.

We could improve the library so that Solve takes llb.State instead of llb.Definition so that constraints can be used on marshaling to set session IDs automatically from locals.

Can you expand on this more? What do you mean by setting session IDs from locals?

Then we could expose this mapping to frontends. With a build request, we would still send a single session that is used with context, together with all possible sessions that can be used for validation (to make sure frontend doesn't use session of another build) and the mapping that can be used by the frontend to match a session for a specific local.

Can you give an example of a session that is used for validation? How is validation performed?

What doesn't really work atm is sending a map of sessions to a frontend. There is no way to know if the frontend understands that convention or not. Atm. the existing frontend can only get one sessionID, so if, for example, dockerfile and context locals are on different sessions, there is no way for existing frontends to know/learn about it.

This topic is on backward compatibility right? Perhaps we can make an attachable on a session so that you can lookup other session IDs. Basically expose a map on the single session. Maybe it also does session proxying for older frontends... not sure if these requests go through the gateway or not (if we have an opportunity to test capabilities of the frontend).

tonistiigi commented 4 years ago

I have trouble understanding this part at a conceptual level.

Eg. look at llb.Image("company/privateimage"). Two parallel requests using the same image will have the same llb digest and pull only happens once. Now the question is what session is used for credentials of this shared pull and the answer is a random one. Should the request that shares the session cancel its build right at the moment when the session is used for asking credentials it would fail the other build as well.

Can you expand on this more? What do you mean by setting session IDs from locals?

We could have a Solve(locals, state) that internally calls state.Marshal(llb.SessionMap(locals.ToMap())) so that the locals in the marshaled LLB get the correct session IDs based on the local configuration used by build.

Can you give an example of a session that is used for validation? How is validation performed?

There is no session validation atm. But basically it could be similar to entitlements. So when build request is made it sets a list of "allowed-sessionids" and then on LLB load, every local source that uses a session ID is checked against this list.

This topic is on backward compatibility right?

Yes. We shouldn't break using a new client against already built frontend.

Perhaps we can make an attachable on a session so that you can lookup other session IDs. Basically expose a map on the single session. Maybe it also does session proxying for older frontends..

Yes, there might be some hacky things that can be done.

(if we have an opportunity to test capabilities of the frontend)

We have but the problem atm is that these frontends have already been built. So there is no way to add new caps checks to the existing ones.

hinshun commented 4 years ago

We have but the problem atm is that these frontends have already been built. So there is no way to add new caps checks to the existing ones.

I mean that if the session was a RPC on the gateway, it can check for a new field in the request, in order to detect if it's an older or newer frontend. I haven't read the code on how frontends interact with sessions (and their attachables) yet so just throwing out an idea here.