moby / buildkit

concurrent, cache-efficient, and Dockerfile-agnostic builder toolkit
https://github.com/moby/moby/issues/34227
Apache License 2.0

Unable to use Buildkit with Windows containers #616

Open tofflos opened 5 years ago

tofflos commented 5 years ago

I'm using the Buildkit version that comes bundled with Docker for Windows 18.06.1 and am experiencing some trouble running it with Windows containers. In the log below you can see a build succeed for a very simple build running without Buildkit and then fail once I enable it. The localized error message "Det går inte att hitta filen" roughly translates to "Unable to find the file". I've had success running Buildkit on the same system when running Linux containers. A minimal project that reproduces the error can be found here: test.zip.

PS C:\test> docker version
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:21:34 2018
 OS/Arch:           windows/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.24)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:36:40 2018
  OS/Arch:          windows/amd64
  Experimental:     true
PS C:\test> ls

    Directory: C:\test

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
-a----       2018-09-11     15:38             74 Dockerfile
-a----       2018-09-11     15:39             23 test.txt

PS C:\test> type .\Dockerfile
FROM microsoft/nanoserver:1803
COPY test.txt /test.txt
RUN type test.txt

PS C:\test> $Env:DOCKER_BUILDKIT=0
PS C:\test> docker build -t test .
Sending build context to Docker daemon  3.072kB
Step 1/3 : FROM microsoft/nanoserver:1803
 ---> 693ff1719e39
Step 2/3 : COPY test.txt /test.txt
 ---> 3cb8bc9e5e2e
Step 3/3 : RUN type test.txt
 ---> Running in 376f873629fd
This is a test message!Removing intermediate container 376f873629fd
 ---> 0cce47564a2d
Successfully built 0cce47564a2d
Successfully tagged test:latest

PS C:\test> $Env:DOCKER_BUILDKIT=1
PS C:\test> docker build -t test .
[+] Building 0.2s (2/2) FINISHED
 => local://dockerfile (Dockerfile)          0.1s
 => => transferring dockerfile: 31B          0.0s
 => local://context (.dockerignore)          0.1s
 => => transferring context: 2B              0.0s
failed to read dockerfile: open C:\ProgramData\Docker\tmp\buildkit-mount977689469\Dockerfile: Det går inte att hitta filen.

tonistiigi commented 5 years ago

Buildkit is not supported for Windows containers in docker 18.06/18.09

gerich-home commented 5 years ago

Any plans to support it?

quangkieu commented 5 years ago

If there is no Windows container support yet, I think the error message needs to be updated to set expectations.

olljanat commented 5 years ago

@quangkieu it appears to be described in the documentation: https://docs.docker.com/build/buildkit/#getting-started "Only supported for building Linux containers"

quangkieu commented 5 years ago

@olljanat I meant about the error message from the built process.

Barsonax commented 4 years ago

When is buildkit support coming for windows?

TBBle commented 4 years ago

Maybe a better question is what needs to be done/what are the outstanding dependencies?

Iristyle commented 4 years ago

Has anyone tried using buildctl on Windows via instructions at https://github.com/moby/buildkit#exploring-dockerfiles with buildkit daemon running in a container? Looks like that might be an alternative until docker build works properly on Windows?

olljanat commented 4 years ago

@Iristyle if you read that doc more carefully it also says

the buildkitd daemon is only available for Linux currently.

@Barsonax I'm a bit worried that we will never see Windows containers support, because there are no Microsoft people contributing to this project. Hopefully I'm wrong.

Iristyle commented 4 years ago

@olljanat well, I'm using LCOW, which hosts a real Linux kernel, so it's a bit of a grey area (and one that a lot of the Docker folks don't seem to know much about in practical terms). I played around a little and was getting closer to having rootless mode running per the instructions at https://github.com/moby/buildkit/blob/master/docs/rootless.md#about---oci-worker-no-process-sandbox, noting that --privileged is not supported on Windows at all.

I'll update if I'm able to get it going or hit a dead end.

olljanat commented 4 years ago

@Iristyle that is probably possible, but this issue is about real Windows containers, so let's try to keep on topic.

TBBle commented 4 years ago

Since last time I looked into this, containerd gained support for Windows 10 1809/Windows Server 2019, so it's possible no MS involvement in buildkit is needed, if it can get everything it needs for the low-level part via its containerd backend.

Edit: A quick look at the build system for buildkit suggests that you need running buildkit (either locally, or running inside Docker) to build buildkit. I'm somewhat flummoxed by this.

olljanat commented 4 years ago

@TBBle hmm. Yea here is some info about containerd support on https://docs.microsoft.com/en-us/virtualization/windowscontainers/deploy-containers/containerd so maybe it can be possible.

Then someone can probably try building buildkitd.exe for Windows to see where it fails. I also guess that the latest Docker binaries with containerd support are needed (more info about that at https://github.com/moby/moby/pull/38541).

TBBle commented 4 years ago

Ah, thank you. moby/moby#38541 is the PR reference I was looking for earlier.

Poking through, containerd doesn't seem to publish Windows binaries in their releases despite having the new Windows V2 runtime in their 1.3.0 release, and their AppVeyor build pipeline doesn't capture artifacts.

The required hcsshim project does publish artifacts from their AppVeyor pipeline, even though they don't include them in their releases.

Both have recent-enough releases to meet the criteria laid out in moby/moby#38541 but they both also have active work on master which might make a difference.

containerd currently vendors a specific commit of hcsshim (Microsoft/hcsshim@d2849cbdb9dfe5f513292a9610ca2eb734cdd1e7), binaries for which can be fetched from AppVeyor. For containerd 1.3.2 (Microsoft/hcsshim@9e921883ac929bbe515b39793ece99ce3a9d7706) the binaries are also on AppVeyor but will expire in late February. Both of these vendored versions are older than the current hcsshim release, 0.8.7, whose artifacts are also on AppVeyor.

In the end, it's not clear to me if this ecosystem is yet in a state to start trying to get BuildKit working, and containerd/containerd#1920 (which has not been updated since the switch to the Windows V2 API) gives me a reasonable level of doubt.

TBBle commented 4 years ago

Quick correction: Containerd does have nightly builds for Windows, they're at https://github.com/containerd/containerd/actions?query=workflow%3ANightly

TBBle commented 4 years ago

So with a bit of hacking I got containerd working on my Windows 10 Desktop (mostly blocked by a bug recently introduced into containerd master Edit: Fix pending in containerd/containerd#3929).

I then did a bunch more hacking on BuildKit, including fixing a couple of bugs, and commenting out a lot of stuff.

Buildkitd ran, and tried to build me a package, but failed because it didn't copy the Dockerfile over.

PS C:\Users\paulh\Documents\BuildKit\simpleDocker> buildctl.exe --debug build --frontend=dockerfile.v0 --local context=. --local dockerfile=.
[+] Building 0.0s (0/0)
time="2020-01-05T07:47:33+11:00" level=debug msg="serving grpc connection"
[+] Building 0.1s (2/2) FINISHED
 => [internal] load build definition from Dockerfile      0.1s
 => => transferring dockerfile: 983B                      0.0s
 => [internal] load .dockerignore                         0.1s
 => => transferring context: 2B                           0.0s
error: rpc error: code = Unknown desc = failed to solve with frontend dockerfile.v0: failed to read dockerfile: open C:\Users\paulh\AppData\Local\Temp\buildkit-mount017874163\Dockerfile: The system cannot find the file specified.
failed to solve
github.com/moby/buildkit/client.(*Client).solve.func2
        C:/Users/paulh/go/src/github.com/moby/buildkit/client/solve.go:203
github.com/moby/buildkit/vendor/golang.org/x/sync/errgroup.(*Group).Go.func1
        C:/Users/paulh/go/src/github.com/moby/buildkit/vendor/golang.org/x/sync/errgroup/errgroup.go:57
runtime.goexit
        c:/go/src/runtime/asm_amd64.s:1357

I assume this is because I commented out too much, and somehow excluded the code that actually copies things into the snapshots, as both created snapshots were empty despite reporting having transferred stuff. The Dockerfile itself did no transfers from the host OS; it's MS's trivial Python example (https://github.com/MicrosoftDocs/Virtualization-Documentation/blob/master/windows-container-samples/python/Dockerfile).

PS C:\Users\paulh\Documents\BuildKit\simpleDocker> buildctl.exe --debug du
ID                                                                      RECLAIMABLE     SIZE    LAST ACCESSED
x86vuhy70whikjae56p5wsfmo*                                              true            0B
m733jropkh4azwwgoknhowicq*                                              true            0B
Reclaimable:    0B
Total:          0B
PS C:\Users\paulh\Documents\BuildKit\simpleDocker> buildctl.exe --debug prune
ID                                                                      RECLAIMABLE     SIZE    LAST ACCESSED
m733jropkh4azwwgoknhowicq*                                              true            0B
x86vuhy70whikjae56p5wsfmo*                                              true            0B
Total:  0B

TBBle commented 4 years ago

With #1314, and some more hacking on things, I've gotten to the point where my next failure is coming from inside containerd, or the connection to it.

PS C:\Users\paulh\Documents\BuildKit\supersimpleDocker> buildctl --debug build --frontend=dockerfile.v0 --local context=. --local dockerfile=.
time="2020-01-06T08:03:16+11:00" level=debug msg="serving grpc connection"
[+] Building 4.7s (4/5)
[+] Building 4.7s (5/5) FINISHED
 => [internal] load build definition from Dockerfile                                                                      0.0s
 => => transferring dockerfile: 588B                                                                                      0.0s
 => [internal] load .dockerignore                                                                                         0.0s
 => => transferring context: 2B                                                                                           0.0s
 => [internal] load metadata for mcr.microsoft.com/windows/servercore:1909                                                0.2s
 => CACHED [1/2] FROM mcr.microsoft.com/windows/servercore:1909@sha256:12327ccba5d74921479cc95b56e9422278ac3565740c2a46   0.0s
 => => resolve mcr.microsoft.com/windows/servercore:1909@sha256:12327ccba5d74921479cc95b56e9422278ac3565740c2a46359bf0a   0.0s
 => ERROR [2/2] RUN echo Write-Host -ForegroundColor Red Hello > wr.ps1                                                   4.4s
------
 > [2/2] RUN echo Write-Host -ForegroundColor Red Hello > wr.ps1:
------
error: rpc error: code = Unknown desc = failed to solve with frontend dockerfile.v0: failed to build LLB: executor failed running [powershell -command echo Write-Host -ForegroundColor Red Hello > wr.ps1]: failure waiting for process: rpc error: code = Unknown desc = ttrpc: closed: unknown
failed to solve
github.com/moby/buildkit/client.(*Client).solve.func2
        C:/Users/paulh/go/src/github.com/moby/buildkit/client/solve.go:203
github.com/moby/buildkit/vendor/golang.org/x/sync/errgroup.(*Group).Go.func1
        C:/Users/paulh/go/src/github.com/moby/buildkit/vendor/golang.org/x/sync/errgroup/errgroup.go:57
runtime.goexit
        c:/go/src/runtime/asm_amd64.s:1357

I've pushed one commit that needs more work (breaks the auto tests) plus my hacks onto https://github.com/TBBle/buildkit/tree/hacks_ahoy, in case anyone else wants to play with this.

For reference, I was working with source from containerd/containerd#3929, to fix a blocking bug, and Microsoft/hcsshim#749, to let me build without gcc. For hcsshim, had I not been instrumenting the source, I could have used the nightly binary build of the containerd shim, and I'm planning to suggest/submit that their releases include pushing a container image for containerd's managed /opt feature, which would avoid hunting down binaries and adding them to the $PATH. (Edit: Microsoft/hcsshim#750)

TBBle commented 4 years ago

The failure I hit in my previous run turned out to be a bug in hcsshim, for which I have posted a fix at microsoft/hcsshim#752.

So now I am able to build a trivial Dockerfile. So trivial it's pointless, except that it worked.

FROM mcr.microsoft.com/windows/servercore:1909
LABEL Description="Built with BuildKit!"
SHELL ["powershell", "-command"]
RUN echo Write-Host -ForegroundColor Red Hello > wr.ps1
CMD ["powershell", ".\\wr.ps1"]

I don't know yet whether my containers lack working networking because of my Buildkit spec-generation hacks, or because of some other aspect of my setup unrelated to Buildkit.

As well as networking issues, filesystem commands do not function on Windows due to an assertion about idmapping support.

I was worried about API issues, so I had vendored containerd master into buildkit, and hcsshim master into containerd. However, I suspect that this wasn't necessary, and I'll back those out next time I look at this.

I've rebased https://github.com/TBBle/buildkit/tree/hacks_ahoy to the current version of #1314, so it should be relatively easy for anyone who wants to try this out, and perhaps try and turn some of my hacks into further valuable commits.

guillaume86 commented 4 years ago

@TBBle cool to see someone tackling this. Does your fork handle the alternative <pathOfDockerfile>.dockerignore path for .dockerignore files? That is pretty much the only thing I miss for the moment.
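The lookup being asked about can be sketched as a priority list: an ignore file named after the Dockerfile wins, otherwise the context-root .dockerignore is used. This is a minimal illustration, not BuildKit's code; `dockerignoreCandidates`, `pickDockerignore` and the injected `exists` callback are hypothetical names, and paths are assumed to be slash-separated and relative to the build context.

```go
package main

import (
	"fmt"
	"path"
)

// dockerignoreCandidates returns, in priority order, the ignore files a
// builder might consult for a given Dockerfile (hypothetical helper):
// "<Dockerfile path>.dockerignore" first, then the context root's file.
func dockerignoreCandidates(dockerfilePath string) []string {
	return []string{
		dockerfilePath + ".dockerignore",
		".dockerignore",
	}
}

// pickDockerignore returns the first candidate that exists, or "" if none do.
// exists is injected so the sketch runs without touching a real filesystem.
func pickDockerignore(dockerfilePath string, exists func(string) bool) string {
	for _, c := range dockerignoreCandidates(path.Clean(dockerfilePath)) {
		if exists(c) {
			return c
		}
	}
	return ""
}

func main() {
	files := map[string]bool{
		"docker/app.Dockerfile.dockerignore": true,
		".dockerignore":                      true,
	}
	exists := func(p string) bool { return files[p] }
	fmt.Println(pickDockerignore("docker/app.Dockerfile", exists))
	fmt.Println(pickDockerignore("Dockerfile", exists))
}
```

Injecting `exists` keeps the precedence rule separate from filesystem access, which is the part that would differ between Linux and Windows hosts.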

TBBle commented 4 years ago

It probably doesn't, but only because all the file-copy APIs in BuildKit fail an assertion on Windows related to permissions support.

I really should get back to this; it got jammed up behind questions about containerd 1.2 support, and then other stuff came up.

jorgearteiro commented 4 years ago

There is an issue logged on Microsoft Windows Containers repo https://github.com/microsoft/Windows-Containers/issues/34

TBBle commented 4 years ago

Now I'm looking at this again, I realise I previously only tested building into the buildkit cache.

Outputting also does work:

TBBle commented 4 years ago

I got image, oci, and docker outputs working in containerd in https://github.com/containerd/containerd/pull/4399, so I can now run the (trivial) images I build. So then back to working out how to do non-trivial things in the build script, next week. With a bit of luck I'm now free of any further containerd issues or unimplemented features.

FROM mcr.microsoft.com/windows/servercore:2004
LABEL Description="Built with BuildKit!"
SHELL ["powershell", "-command"]
ENTRYPOINT ["powershell"]
RUN echo "Write-Host -ForegroundColor DarkGreen Hello World" > C:/wr.ps1
CMD ["-command", "C:/wr.ps1"]
buildctl build --frontend dockerfile.v0 --local context=. --local dockerfile=. --output type=image,name=supersimpledocker,oci-mediatypes=true
ctr --namespace buildkit run --rm --tty supersimpledocker tm1

TBBle commented 4 years ago

Small progress report. I now have networking functional for the containerd worker under Windows. It's a minor hassle to set up using BuildKit and containerd directly (as you have to source and configure a CNI plugin yourself, and the Windows CNI landscape is... rough), but Docker provides its own managed network stack to use with BuildKit, so once someone implements the Docker side of the Buildkit integration, it won't be any more hassle than networking under any other setup.

No containerd changes this time, as containerd happily uses whatever CNI setup you pass it.

I now have the below functioning, see #1585 for details.

FROM mcr.microsoft.com/windows/servercore:2004

LABEL Description="Python" Vendor="Python Software Foundation" Version="3.7.3"

RUN powershell.exe -Command \
    $ErrorActionPreference = 'Stop'; \
    [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12; \
    wget https://www.python.org/ftp/python/3.7.3/python-3.7.3.exe -OutFile c:\python-3.7.3.exe ; \
    Start-Process c:\python-3.7.3.exe -ArgumentList '/quiet InstallAllUsers=1 PrependPath=1' -Wait ; \
    Remove-Item c:\python-3.7.3.exe -Force

TBBle commented 4 years ago

It occurs to me... I'm only testing with the containerd backend. Is there any interest in the runc executor working (using runhcs)? I feel like there's a movement away from using runhcs, and I'm not totally sure that this would avoid the use of containerd anyway, as things like the layer differ go through it; I haven't looked at what the runc executor does in this case.

tonistiigi commented 4 years ago

@TBBle Ideally both would work like in Linux but one is not a requirement for the other. It seems to me that worker that doesn't depend on containerd would be even simpler to get working. We should still reuse as much containerd code as possible and avoid duplication. For the differ, this is what Linux side does as well - it still uses the containerd differ, just it uses the library directly that is vendored into buildkit instead of the grpc API to containerd daemon.

tonistiigi commented 4 years ago

@TBBle we should also probably prioritize getting some CI running. It is quite hard for all of the current maintainers to actually test any of these changes. It is fine if the current test suite almost doesn't pass. We can start with some basics like the example you had above. I'm not quite sure how well the CI workers support WCOW. Eventually, we probably want to switch from Travis to GitHub Actions, but we have some build-cache logic that can't be very easily transferred, so it will take time. If GitHub Actions supports what is needed for this, we could initially do something special there for Windows only.

TBBle commented 4 years ago

The main blocker (my last remaining hack) for bringing this up in CI is refactoring GenerateSpec to not add any Linux elements to the spec, as that triggers LCOW mode.

That's my next task anyway, since that's the last change in my "hacks_ahoy" branch. Once that's in-place, I plan to start trying out the various tests on CI and see which pass. There's still an unmeasured pile of work to make the in-build filesystem support work (I know it currently fails due to rejecting attempts to set permissions), but hopefully I can identify a subset of the tests that can pass.

TBBle commented 4 years ago

A problem for using the vendored containerd for client-side diffing in the runc executor is that the vendored containerd is 1.3, which doesn't support diffing windows-layers, as that code is only in a PR I have open against containerd master, and I'm hoping it'll land in time for containerd 1.4 to be branched, although the beta series has already started and I don't know how much risk containerd will wear between betas.

I see BuildKit has a filesystem-only differ for windows-layers used on non-Windows platforms; I'm not sure whether it is a viable alternative to the hcs-based tar streaming used on Windows in the meantime, as I haven't looked closely at what differences it might have, c.f. https://github.com/containerd/containerd/pull/4399#issuecomment-660283335

tonistiigi commented 4 years ago

@TBBle The vendored containerd does not need to be a stable release. We mostly vendor master to get the latest fixes. For the differ, I doubt the current windows-layers thing is usable. It is just for handling the different tar format (Windows layers have parent Hives/ and Files/ directories). I opened an issue to support it natively in https://github.com/containerd/containerd/issues/2469 as well, so we don't need a hack. It would be nice if we could do the opposite as well (build Linux layers on Windows), but that is not a priority at the moment, of course.

TBBle commented 4 years ago

Whoops, turns out I'm not as close as I hoped. I just bounced off another unticked checkbox on https://github.com/containerd/containerd/issues/1920: "Commit uses archive diff to turn writable layer into a read only parent layer"

This is the reason I cannot have more than one RUN command in my Dockerfile, as Windows cannot mount the scratch layer as a parent layer. I see that Docker Engine always does this in its own layer, so I'll see about implementing this in containerd too.

TBBle commented 4 years ago

I've submitted a PR for the last containerd blocker (although that's what I thought yesterday), but it's a pretty tall stack for review, so I don't expect them to land soon.

So given my hacks_ahoy branch of buildkit, and the CNI setup I mentioned earlier, the following two Dockerfiles both work:

FROM mcr.microsoft.com/windows/servercore:2004

LABEL Description="Built with BuildKit!"

#WORKDIR C:/

SHELL ["powershell", "-command"]

ENTRYPOINT ["powershell"]

RUN echo "Write-Host -ForegroundColor DarkGreen Hello World" > C:/wr.ps1

RUN echo "Write-Host -ForegroundColor DarkBlue Hello World" > C:/wrblue.ps1

CMD ["-command", "C:/wr.ps1"]

# https://github.com/MicrosoftDocs/Virtualization-Documentation/blob/master/windows-container-samples/python/Dockerfile

# This dockerfile utilizes components licensed by their respective owners/authors.
# Prior to utilizing this file or resulting images please review the respective licenses at: https://docs.python.org/3/license.html

FROM mcr.microsoft.com/windows/servercore:2004

LABEL Description="Python" Vendor="Python Software Foundation" Version="3.8.5"

#WORKDIR C:/

SHELL ["powershell", "-command"]

RUN powershell.exe -Command \
    $ErrorActionPreference = 'Stop'; \
    [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12; \
    wget https://www.python.org/ftp/python/3.8.5/python-3.8.5.exe -OutFile c:\python-3.8.5.exe ; \
    Start-Process c:\python-3.8.5.exe -ArgumentList '/quiet InstallAllUsers=1 PrependPath=1' -Wait ; \
    Remove-Item c:\python-3.8.5.exe -Force

RUN New-Item -Path C:/hello.py -ItemType File -Value "print('Hello World!')"

CMD ["python", "c:/hello.py"]

Unless I hit more issues, that means the remaining work lies only in the solver, which should be much faster to iterate on.

TBBle commented 4 years ago

One thing I noticed is that the change to make mounting be done in-place on Windows (in contrast to non-Windows) is incorrect for actually mounting windows layers (which is why the 'tar' and 'local' output formats are wrong), but happens to work when the target is not a windows layer, but some random folder (which is what happens for --local context and --local dockerfile).

Resolving mounting windows layers is a containerd issue which I'm hoping to avoid working on.

I suspect the random folder mounting works by accident, because we're passing 'windows-layer' for the mount type, but we really want a bind-mount (junction point) or symlink. I'm not sure if that's a BuildKit issue or containerd issue at this time.

Edit: Dagnabbit. I should have realised... COPY, WORKDIR, etc all require local mounts of windows-layers so they can copy things in. So it's back to containerd after all. At least the solver had some low-hanging fruit for me (#1588).

TBBle commented 4 years ago

So, it seems the framework is now in a pretty-good state, if-and-when my PR series lands in containerd.

Now I'm hitting Windows-path-related issues in the Docker and LLB layers, and I really need some advice.

As a practical example,

COPY python-3.8.5.exe python-3.8.5.exe

ends up failing with

error: failed to solve: rpc error: code = Unknown desc = CreateFile C:\Users\paulh\AppData\Local\Temp\buildkit-mount019156354\C:: The filename, directory name, or volume label syntax is incorrect.

where C:\Users\paulh\AppData\Local\Temp\buildkit-mount019156354 is the mount-point at which the WIP layer is mounted.

Adding debugging, I can see in solver/llbsolver/file/backend.go docopy a call being made (the second copy.Copy call)

if err := copy.Copy(ctx, src, s, dest, destPath, opt...)

with

It is 100% correct that I want to copy C:\ProgramData\containerd\root\io.containerd.snapshotter.v1.windows\snapshots\903\python-3.8.5.exe to C:\Users\paulh\AppData\Local\Temp\buildkit-mount019156354\python-3.8.5.exe at that moment.

To get this far, I had already hacked on mkDir in the same file, because I'd been seeing weird results from fs.RootPath when it's given backslash-separated paths. I'd also hacked on frontend/dockerfile/instructions/parse.go to pass copy/add destinations and workdir paths through system.CheckSystemDriveAndRemoveDriveLetter and filepath.ToSlash. That's why the s and destPath values above don't start with C:.

So my overall question here, is at what point does the system expect to see only /-separated paths, and how deep should it be able to deal with either \-separated paths, or paths with drive-letters.

I don't know much about the solver, so I am probably not the best person to advance this, and I'm also nearly out of the time I have available to work deeply on this. This also seems like something that we could probably test with the testing framework, without needing a live system behind it.

So I plan to move next to cleaning up my other hacks (OCI schema generation, and undo my faulty mount-point hackery from much earlier) so that BuildKit's containerd worker is in a good state for whenever my PRs (or alternative implementations) land in containerd.

TBBle commented 4 years ago

I had a quick look at the Travis CI Windows documentation. It's Windows Server 2019, with Admin privileges, and people have been running Docker on it, with mixed results.

On the other hand, I'm not really sure how to bootstrap this, since it seems all the build pipelines rely on having a functioning BuildKit already in place. I guess I can cross-compile everything on Linux, and then just run the tests on Windows, but it's not obvious how I would get the cross-compile results out of the Linux builder into a Windows builder.

I also know nothing about GitHub Actions, so I'm not much help there. -_- Given the depth of my containerd PR queue, I'm not expecting to have this in a state where BuildKit itself is the main blocker for a while, either.

tonistiigi commented 4 years ago

I guess the best place to look for CI sample code is https://github.com/containerd/containerd/blob/master/.github/workflows/ci.yml#L253 . Building on Linux and testing on Windows would be nice but I think the ci providers don't have a very good way to do synchronization of different workers.

tonistiigi commented 4 years ago

So my overall question here, is at what point does the system expect to see only /-separated paths, and how deep should it be able to deal with either \-separated paths, or paths with drive-letters.

Ideally everything in LLB/context send/checksums uses only unix paths. Note that these are not absolute (for host). Once we actually start to call syscalls with these files and access them on disk then the conversion can happen for the local OS. I don't remember from memory what kind of paths are in tarballs on windows layers but I'd also put only unix paths there.
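A minimal sketch of that rule at the Dockerfile-parsing boundary: strip the system drive letter and convert separators once, so everything downstream only sees slash-separated paths. `toLLBPath` is a hypothetical stand-in (it is not Docker's `system.CheckSystemDriveAndRemoveDriveLetter`, whose exact behaviour this sketch does not try to reproduce). Assumptions: the system drive is C:, a bare "C:" is rejected as ambiguous, and backslashes are replaced explicitly because `filepath.ToSlash` is a no-op on a Linux host.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// toLLBPath converts a Windows-style Dockerfile path into a slash-separated
// LLB-style path (hypothetical helper, assumptions as described above).
func toLLBPath(p string) (string, error) {
	if len(p) >= 2 && p[1] == ':' {
		if !strings.EqualFold(p[:1], "C") {
			return "", fmt.Errorf("%q is not on the system drive", p)
		}
		if len(p) == 2 {
			return "", fmt.Errorf("%q has no path after the drive letter", p)
		}
		p = p[2:]
	}
	p = strings.ReplaceAll(p, `\`, "/")
	// path.Clean (not filepath.Clean) so the result is identical on any host.
	return path.Clean(p), nil
}

func main() {
	for _, in := range []string{`C:\python-3.8.5.exe`, `c:/wr.ps1`, `python-3.8.5.exe`, `D:\data`} {
		out, err := toLLBPath(in)
		fmt.Printf("%-22s -> %q (err: %v)\n", in, out, err)
	}
}
```

Relative paths stay relative; only the host-specific pieces (drive letter, backslashes) are removed, leaving conversion back to a native path for the point where files are actually touched on disk.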

TBBle commented 4 years ago

Okay, looking at that GitHub workflow file from containerd, I can definitely hook up something like that, but it wouldn't be running in containers or using the master BuildKit Dockerfile as the rest of the flows are, since we still have the bootstrapping problem of the master Dockerfile requiring BuildKit.

tonistiigi commented 4 years ago

@TBBle Yes, this is expected. Hopefully, over time we can move more Windows things into containers and combine with Linux ci logic.

TBBle commented 4 years ago

Windows layer tarballs all use normal tar paths (/), but have a leading Files/ due to the way Windows layers are stored on disk. That part is taken care of by containerd anyway, so what we see in a mounted filesystem is a normal Windows filesystem, just mounted at a directory instead of rooted on a drive.

It sounds like there are two places that need fixing for the path issues:

I have a suspicion, which I haven't confirmed, that part of the problem is the use of filepath.Join("/", child) in the input to fs.RootPath, as I suspect it's tripping over the "Don't create UNC paths" logic in filepath.Join, and hence not doing what we'd expect on Windows. I'm not sure why that is being done, unless it's to turn "" into "/".

It's possible that what we want is root paths in filepath.ToSlash format, and then right before using them with a filesystem API, call filepath.FromSlash. We could also just keep them in ToSlash format for the filesystem APIs, since Windows will handle that, but the error messages become less-natural.

Anyway, for this part of the issue, I'm hoping to work through it with unit tests rather than continuing to hack away at the code. Or that someone else with a clearer idea of how this part works will pick it up before I do. ^_^

TBBle commented 4 years ago

Future TODO: Once https://github.com/microsoft/Windows-Containers/issues/31 is resolved, we can override C:\windows\system32\drivers\etc\hosts on Windows like we do on Linux.

blackliner commented 3 years ago

What is the TL;DR on Buildkit with Windows containers?

EDIT: nvm, I guess the current state is "its being worked on": https://github.com/microsoft/Windows-Containers/issues/34

TBBle commented 3 years ago

More accurately (per my comment on that ticket) it's awaiting containerd maintainers for the low-level parts, and also more time spent on the BuildKit code itself to fix local filesystem access issues, see my comment above and #1621.

I don't think anyone is actively working on this right now. I haven't really made the time to come back to this since July, and I'm unaware of anyone else trying to advance the work on the BuildKit code.

blackliner commented 3 years ago

How about making it a gradual migration, start with the parallel download of layers, and hand over to "legacy" docker build after that?

TBBle commented 3 years ago

I'm not sure what you're suggesting here. The "legacy docker build" part is the only thing BuildKit does.

blackliner commented 3 years ago

Ah OK, in my mind the parallel downloading of layers from a repo was its own "software component". I am not into the detailed design of BuildKit, and if it's all woven together, then yes, it might not be possible to split things up.

Another idea, that first part, the downloading of the layers, is there a way to call only this part of buildkit explicitly? For example like this: DOCKER_BUILDKIT=1 docker pull my:image

TBBle commented 3 years ago

To the best of my knowledge, BuildKit under Docker doesn't do any pulling of layers. In the case of DOCKER_BUILDKIT=1 docker pull my:image, it's still Docker doing the pulling, identically to DOCKER_BUILDKIT=0 docker pull my:image.

BuildKit is only involved in docker build or docker buildx, as far as I know. That in turn might require pulling for FROM or COPY --from stages, but I suspect that either when run from Docker, or when run with the containerd-worker backend via buildctl, it delegates pulling to Docker or containerd respectively anyway.

It's possible I'm wrong and BuildKit uses its own layer-pull implementation even when running under Docker, but that wouldn't make sense as it would have to duplicate a lot of infrastructure to integrate neatly into Docker or containerd.

Docker already does parallel downloading of layers from a registry, anyway. So I'm not sure why you consider it a feature of BuildKit?

blackliner commented 3 years ago

Docker already does parallel downloading of layers from a registry, anyway. So I'm not sure why you consider it a feature of BuildKit?

Downloading yes, but also extraction and checksum?

TBBle commented 3 years ago

I don't think you can parallel-extract layers in an image, because each layer is a set of changes against the layer below, and for example, if a higher layer processes a 'delete' for a particular file or directory, the extractors of the lower level must then track that this happened and skip those directories.

At least for Windows Container Layers. It's possible on Linux that the layer storage is done by some other mechanism that would let you extract layers in parallel, but I'm not that familiar with the Linux container storage implementation. (I don't see such a code-path in the downloader though.)

I'm not sure if we do checksumming during extraction? Checksumming should happen when the layer is pulled and written to the content store, and that would also be in parallel as the data being downloaded will be piped through the checksummer in parallel to being written to disk, I expect.

Edit: I checked this; the digest is of the uncompressed data, so that's done during extraction after all. So to parallelise checksumming, we'd have to separately decompress-and-checksum each layer so it could be done in parallel, which seems like a waste of CPU and IO bandwidth when right now we get checksumming for free as part of extracting the layer anyway.

blackliner commented 3 years ago

The reason I am asking about this is that I was using Linux containers for 2 years and got used to the snappy performance, awesome startup times, etc. A few days ago I had to port our Windows CI to a container, and hell, that was a bad experience! Everything takes about 2-10 times longer than with Linux containers. Yes, you can tune things, but even if you do everything you can find online (deactivate AV :-/, use isolation=process), it is still a worse experience by comparison.