Closed mch2 closed 1 year ago
Hi @mch2, could you let me know how you triggered gradle_check in this case? If a PR triggered it, could you send me the PR link? Thanks.
CC @peterzhuamazon
I will take care of this as I have talked to @mch2 offline. Thanks.
Able to get docker running on Windows with hyperv.
Administrator@<> MINGW64 ~
$ docker version
Client:
Version: 23.0.6
API version: 1.42
Go version: go1.19.9
Git commit: ef23cbc
Built: Fri May 5 21:18:35 2023
OS/Arch: windows/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 23.0.6
API version: 1.42 (minimum version 1.24)
Go version: go1.19.9
Git commit: 9dbdbd4
Built: Fri May 5 21:17:32 2023
OS/Arch: windows/amd64
Experimental: false
Administrator@<> MINGW64 ~
$ docker pull mcr.microsoft.com/windows/nanoserver:ltsc2019
ltsc2019: Pulling from windows/nanoserver
aaaa081173ae: Pulling fs layer
aaaa081173ae: Verifying Checksum
aaaa081173ae: Download complete
aaaa081173ae: Pull complete
Digest: sha256:fb78bd84ac937f6b1453e19015ccce41636bbeca5fe5bc6dc5c7d55adb4a2bc5
Status: Downloaded newer image for mcr.microsoft.com/windows/nanoserver:ltsc2019
mcr.microsoft.com/windows/nanoserver:ltsc2019
Need @mch2 to confirm exactly which images the Windows docker host is running.
On Windows, if you use hyperv, the Windows host can only run Windows containers. If we need the Windows host to run Linux containers, we would need to enable wsl2 later on, and that might cause issues.
Please let me know about this. Thanks.
Also, this can be a good start on these two issues to bring Windows integTest to docker hosts and containers, and even to build the artifacts in Windows docker containers.
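The host/container OS restriction described above can be checked before a job tries to start a container. This is only a minimal sketch: it assumes the docker CLI is on PATH and relies on `docker info --format "{{.OSType}}"`; the helper names `daemon_os_type` and `can_run` are made up for illustration and are not part of any build repo.

```python
import subprocess


def daemon_os_type(run=subprocess.run):
    """Return the OS type reported by the Docker daemon, e.g. 'windows' or 'linux'."""
    result = run(
        ["docker", "info", "--format", "{{.OSType}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip().lower()


def can_run(image_os, daemon_os):
    """Under the constraint above, a daemon only runs containers matching its own OS type."""
    return image_os.strip().lower() == daemon_os.strip().lower()
```

On a hyperv-based Windows host, `can_run("linux", daemon_os_type())` would be expected to return False, which a test suite could use to skip Linux-container tests.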
Here is a chart showcasing the comparison between different offers of containers on Windows:
Here's a chart comparing some of the key differences between Windows Server with Server Core installation and Windows Nano Server:
Feature | Windows Server with Server Core | Windows Nano Server |
---|---|---|
Installation size | Larger (several GBs) | Smaller (a few hundred MBs) |
Attack surface | Larger | Smaller |
Support for GUI | Yes (minimal) | No |
Support for 32-bit applications | Yes | No |
Support for Windows Services | Yes | Limited |
Support for .NET Framework | Yes | Limited |
Support for Containers | Yes | Yes |
Licensing | Standard, Datacenter | Standard, Datacenter |
Available editions | All Windows Server editions | Standard and Datacenter only |
Will try to see if we can bring nanoserver in to make Windows lightweight for build, test, and check.
Thanks.
I eventually got a docker container running nanoserver on Windows:
PS C:\Users\Administrator> docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
mcr.microsoft.com/windows/nanoserver ltsc2019 82ef3885248c 2 weeks ago 252MB
PS C:\Users\Administrator> docker run 82ef3885248c
Microsoft Windows [Version 10.0.17763.4645]
(c) 2018 Microsoft Corporation. All rights reserved.
C:\>
PS C:\Users\Administrator> docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4aced3bb72dd 82ef3885248c "c:\\windows\\system32…" About a minute ago Exited (0) About a minute ago blissful_liskov
PS C:\Users\Administrator> docker rm 4aced3bb72dd
4aced3bb72dd
PRs:
https://github.com/opensearch-project/opensearch-build/pull/3859
https://github.com/opensearch-project/opensearch-ci/pull/334
https://github.com/opensearch-project/opensearch-build/pull/3903
https://github.com/opensearch-project/opensearch-build/pull/3905
https://github.com/opensearch-project/opensearch-build/pull/3910
https://github.com/opensearch-project/opensearch-ci/pull/337
https://github.com/opensearch-project/opensearch-ci/pull/338
https://github.com/opensearch-project/opensearch-ci/pull/339
https://github.com/opensearch-project/opensearch-build/pull/3939
https://github.com/opensearch-project/opensearch-build/pull/3941
https://github.com/opensearch-project/opensearch-build/pull/3942
https://github.com/opensearch-project/opensearch-build/pull/3943
https://github.com/opensearch-project/opensearch-build/pull/3944
https://github.com/opensearch-project/opensearch-build/pull/3946
https://github.com/opensearch-project/opensearch-build/pull/3947
https://github.com/opensearch-project/opensearch-build/pull/3951
https://github.com/opensearch-project/opensearch-ci/pull/343
https://github.com/opensearch-project/opensearch-ci/pull/344
https://github.com/opensearch-project/opensearch-build/pull/3976
https://github.com/opensearch-project/opensearch-ci/pull/345
https://github.com/opensearch-project/opensearch-build-libraries/pull/308
https://github.com/opensearch-project/opensearch-build-libraries/pull/314
https://github.com/opensearch-project/opensearch-build/pull/4023
https://github.com/opensearch-project/opensearch-build/pull/4027
[x] Updating
We will be better off with the servercore option rather than nanoserver, as the latter lacks several core components, while servercore is just a headless version of the normal Windows server base.
An issue in the Windows docker container that we are currently not able to solve to make it behave the same as the AMI: Move-Item : Access to the path is denied.
https://github.com/moby/moby/issues/38256 https://github.com/microsoft/Windows-Containers/issues/147
Just confirmed that I am using --isolation=process, not --isolation=hyperv.
Able to resolve the move issue by using mingw and forcing the mv to happen through bash.exe.
bash.exe -c "mv -v 'C:\\Windows\\System32\\find.exe' 'C:\\Windows\\System32\\find_windows.exe'"
renamed 'C:\Windows\System32\find.exe' -> 'C:\Windows\System32\find_windows.exe'
Seems like an issue with volta 1.1.1: https://github.com/volta-cli/volta/issues/1435
Will revert to the older 1.0.8 or 1.1.0 for now.
Thanks.
Able to invoke bash.exe directly in the Windows container and run the test workflow:
ContainerAdministrator@44082dfc4844 MINGW64 /c
$ whoami
ContainerAdministrator
New issues:
windows [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate
Tried many methods, including installing pip-system-certs, scoop install cacerts, installing certifi, manually adding the Mozilla CA certs to the certifi certs, exporting REQUESTS_CA_BUNDLE, etc.
Right now the only method that seems to work is using curl to pull the zip once, such as curl https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.9.0/8184/windows/x64/zip/dist/opensearch/opensearch-2.9.0-windows-x64.zip -o test.sh
so that the cloudfront public cert is added once to the certifi certs or the system CA cert bundle; after that, the python requests package within the Windows docker container is able to do SSL verification correctly.
Very weird and probably I missed something here. Thanks.
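For reference, the "push extra certs into the bundle and export REQUESTS_CA_BUNDLE" approach mentioned above can be sketched roughly as below. This is only an illustration using the standard library: the helper names and file paths are hypothetical, and as noted in this comment the approach did not reliably fix the failure in the Windows container, so treat it as a way to reproduce the attempt rather than a confirmed fix.

```python
import os
from pathlib import Path

PEM_BEGIN = "-----BEGIN CERTIFICATE-----"
PEM_END = "-----END CERTIFICATE-----"


def extract_pem_blocks(text):
    """Return every PEM certificate block found in text, in order."""
    blocks = []
    start = text.find(PEM_BEGIN)
    while start != -1:
        end = text.find(PEM_END, start)
        if end == -1:
            break  # truncated block, ignore it
        end += len(PEM_END)
        blocks.append(text[start:end])
        start = text.find(PEM_BEGIN, end)
    return blocks


def merge_ca_bundle(base_bundle, extra_pem, out_path):
    """Write base_bundle plus the certificates from extra_pem that it lacks.

    Returns the number of certificates that were appended.
    """
    base_text = Path(base_bundle).read_text()
    existing = set(extract_pem_blocks(base_text))
    extra_blocks = extract_pem_blocks(Path(extra_pem).read_text())
    new_blocks = [b for b in extra_blocks if b not in existing]
    merged = base_text.rstrip("\n") + "\n" + "".join(b + "\n" for b in new_blocks)
    Path(out_path).write_text(merged)
    return len(new_blocks)


def use_bundle(out_path, env=os.environ):
    """Point the requests package at the merged bundle via REQUESTS_CA_BUNDLE."""
    env["REQUESTS_CA_BUNDLE"] = str(out_path)
```

The base bundle would typically be the file certifi ships, and the extra PEM whatever certificate was exported for ci.opensearch.org.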
This new way is supposed to work correctly but still does not, so just curl ci.opensearch.org for now as that is stable:
ContainerAdministrator@0062c7841faa MINGW64 ~/opensearch-build-peterzhuamazon (windows-docker-setups-2)
$ openssl s_client -connect ci.opensearch.org:443 </dev/null | openssl x509 -outform PEM > certificate2.crt
depth=2 C = US, O = Amazon, CN = Amazon Root CA 1
verify return:1
depth=1 C = US, O = Amazon, CN = Amazon RSA 2048 M01
verify return:1
depth=0 CN = ci.opensearch.org
verify return:1
DONE
ContainerAdministrator@0062c7841faa MINGW64 ~/opensearch-build-peterzhuamazon (windows-docker-setups-2)
$ vi certificate2.crt
ContainerAdministrator@0062c7841faa MINGW64 ~/opensearch-build-peterzhuamazon (windows-docker-setups-2)
$ certutil -addstore CA certificate2.crt
CA "Intermediate Certification Authorities"
Certificate "ci.opensearch.org" added to store.
CertUtil: -addstore command completed successfully.
Seeing issues on the windows integTest with Zelin fix now suddenly: win-integtest-issues20230807.txt
Reproduced on new windows ec2 server: linux-windows-knn.log
Trying to push the image of windows docker to dockerhub:
Administrator@EC2AMAZ-6B9Q6PN MINGW64 ~/opensearch-build ((2.9.0))
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
opensearchstaging/ci-runner testwindowsagain2 86fe8dda8a9f 2 days ago 8.9GB
opensearchstaging/ci-runner windows2019-servercore-test1 86fe8dda8a9f 2 days ago 8.9GB
opensearchstaging/ci-runner testwindows2019-user 244f0de9c472 12 days ago 10.5GB
opensearchstaging/ci-runner testwindows2019 3c134658ffbb 13 days ago 10.6GB
mcr.microsoft.com/windows/servercore ltsc2019 67667e0b9c95 4 weeks ago 4.38GB
mcr.microsoft.com/windows/nanoserver ltsc2019 82ef3885248c 4 weeks ago 252MB
Administrator@EC2AMAZ-6B9Q6PN MINGW64 ~/opensearch-build ((2.9.0))
$ docker push opensearchstaging/ci-runner:windows2019-servercore-test1
The push refers to repository [docker.io/opensearchstaging/ci-runner]
10d36872fef9: Pushed
c7c5acd32d49: Pushed
33b0605bff63: Pushed
17a1ee0cab1d: Pushed
701fc89ba113: Pushed
d9b90de3477f: Pushing [=======> ] 720MB/4.51GB
bd22b31d5d10: Pushed
325c8c82006f: Pushed
84079ad09eb0: Pushed
da2d874340bd: Pushing [===================> ] 1.438GB/3.666GB
I have noticed that the Windows permission errors are related to running through the build repo code, not to running directly within the k-NN repo.
Somehow the new Windows AMI is creating two user folders: Administrator vs Administrator.EC2AMAZ<>.
The second user used to be available only when logging in through ssm or rdp, and did not affect the actual Administrator user content.
However, the installation is now suddenly split across both accounts, and even if you log in as Administrator on RDP it will default you to the Administrator.EC2AMAZ<> user.
It could be caused either by the new AMI provided by EC2 or by git bash(?), not sure. Testing building the old code now to confirm this behavior.
Seems like it is caused by either the docker pkg, hyperv, or the bcdedit setups. Testing them one by one by building a new image for each now. Thanks.
The above issues are all caused by this command:
dockerd --register-service
Seems like this does no harm if I log in as Administrator on rdp. But if I run it through a powershell script, it splits the docker part off to a secondary owner, then moves everything else to the new owner and keeps only dockerd itself under Administrator.
This can still be resolved by just running the registration during the startup time of the runner.
Resolved by using an init script and avoiding embedding the service registration in the packer scripts.
echo %USERNAME% && START /MIN dockerd && timeout 5 && docker ps
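The init command above waits a fixed 5 seconds before calling docker ps. As an alternative sketch (not what the init script actually does), the wait could poll until the daemon answers. The helper names `docker_ready` and `wait_until` are hypothetical, and the attempt count and delay are arbitrary assumptions.

```python
import subprocess
import time


def docker_ready(run=subprocess.run):
    """True when `docker ps` exits with status 0, i.e. the daemon answers."""
    try:
        return run(["docker", "ps"], capture_output=True).returncode == 0
    except FileNotFoundError:
        # docker CLI is not installed or not on PATH
        return False


def wait_until(check, attempts=10, delay=1.0, sleep=time.sleep):
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Something like `wait_until(docker_ready, attempts=30, delay=2.0)` would then give up after roughly a minute instead of assuming the daemon is up after 5 seconds.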
The docker host and build of Windows Runner is up and running on staging Jenkins now:
Executing init script
C:\Users\Administrator>echo Administrator && dockerd --register-service && net start docker && echo "started docker deamon" && docker ps
Administrator
The Docker Engine service is starting.
The Docker Engine service was started successfully.
"started docker deamon"
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
init script ran successfully
remoting.jar sent remotely. Bootstrapping it
Launching via WinRM:java -jar C:\Windows\Temp\remoting.jar -workDir C:/Users/Administrator/jenkins
<===[JENKINS REMOTING CAPACITY]===>Remoting version: 3107.v665000b_51092
Launcher: EC2WindowsLauncher
Communication Protocol: Standard in/out
This is a Windows agent
I have confirmed through docker copy that the Linux runner is able to copy Windows docker images across registries: https://build.ci.opensearch.org/job/docker-copy/664/console
Able to build windows container within windows container just like linux docker in docker:
"//./pipe/docker_engine://./pipe/docker_engine"
PS C:\Users\ContainerAdministrator\opensearch-build\docker\ci> bash
ContainerAdministrator@0d2f4f1f80e9 MINGW64 ~/opensearch-build/docker/ci (main)
$ ./build-image-single-arch.sh -r ci-runner -v windows2019-servercore-test2 -f dockerfiles/current/build.windows2019.servercore.x64.dockerfile
windows2019-servercore-test2 dockerfiles/current/build.windows2019.servercore.x64.dockerfile
Sending build context to Docker daemon 116.7kB
Step 1/10 : ARG ServerCoreRepo=mcr.microsoft.com/windows/servercore
Step 2/10 : FROM ${ServerCoreRepo}:ltsc2019
---> 67667e0b9c95
Step 3/10 : USER ContainerAdministrator
---> Using cache
---> 24b3e060da38
Step 4/10 : COPY config/windows-servercore-setup.ps1 ./
---> 8ded082f13d6
Step 5/10 : RUN powershell ./windows-servercore-setup.ps1
---> Running in 31535c4f165e
Initializing...
Downloading ...
Extracting...
Creating shim...
Adding ~\scoop\shims to your path.
Scoop was installed successfully!
Startup time of the windows agent is now reduced from 15-17 min or so to 5-7 min, see new vs old in the log:
So Docker build is now compatible with windows docker build: https://hub.docker.com/layers/opensearchstaging/ci-runner/ci-runner-windows2019-servercore-opensearch-build-v1.1/images/sha256-b6ba005996340062f68137fe7cf3e17cd3d61bdb9a5df944f276905df795dd0e?context=explore
Runs well with docker build support on production now: https://build.ci.opensearch.org/job/docker-build/3629/console
Windows image extraction is very slow; we need to use pigz to increase the pull speed.
Scoop does not have a pigz package for Windows, so the binary needs to be installed directly.
Seems like MOBY_DISABLE_PIGZ is used to disable the unpigz behavior, so unpigz should be enabled by default.
So pigz seems to run only if you put it in the root of C: in a dir like C:\pigz and add it to the machine env vars.
It seems like pigz only saves time while the extraction is happening, but most of the time is spent after the extraction.
It does not seem that pigz will improve the time that much:
Without pigz:
$ time docker pull opensearchstaging/ci-runner:ci-runner-windows2019-servercore-opensearch-build-v1
ci-runner-windows2019-servercore-opensearch-build-v1: Pulling from opensearchstaging/ci-runner
c9226d61d3bd: Already exists
b95f433aa7d9: Pull complete
00e36bb1af6a: Pull complete
96b3ca42606a: Pull complete
eba42434ce94: Pull complete
69c589335db3: Pull complete
0ec633f2f60c: Pull complete
21200ab93e1b: Pull complete
bc161862b081: Pull complete
c65a5ac1ea31: Pull complete
Digest: sha256:b6ba005996340062f68137fe7cf3e17cd3d61bdb9a5df944f276905df795dd0e
Status: Downloaded newer image for opensearchstaging/ci-runner:ci-runner-windows2019-servercore-opensearch-build-v1
docker.io/opensearchstaging/ci-runner:ci-runner-windows2019-servercore-opensearch-build-v1
real 12m39.993s
user 0m0.000s
sys 0m0.015s
With pigz:
$ time docker pull opensearchstaging/ci-runner:ci-runner-windows2019-servercore-opensearch-build-v1
ci-runner-windows2019-servercore-opensearch-build-v1: Pulling from opensearchstaging/ci-runner
c9226d61d3bd: Already exists
b95f433aa7d9: Pull complete
00e36bb1af6a: Pull complete
96b3ca42606a: Pull complete
eba42434ce94: Pull complete
69c589335db3: Pull complete
0ec633f2f60c: Pull complete
21200ab93e1b: Pull complete
bc161862b081: Pull complete
c65a5ac1ea31: Pull complete
Digest: sha256:b6ba005996340062f68137fe7cf3e17cd3d61bdb9a5df944f276905df795dd0e
Status: Downloaded newer image for opensearchstaging/ci-runner:ci-runner-windows2019-servercore-opensearch-build-v1
docker.io/opensearchstaging/ci-runner:ci-runner-windows2019-servercore-opensearch-build-v1
real 12m29.576s
user 0m0.015s
sys 0m0.000s
Saved 10 seconds.
Some more tests:
Without pigz:
$ time docker pull opensearchstaging/ci-runner:ci-runner-windows2019-servercore-opensearch-build-v1
ci-runner-windows2019-servercore-opensearch-build-v1: Pulling from opensearchstaging/ci-runner
c9226d61d3bd: Already exists
b95f433aa7d9: Pull complete
00e36bb1af6a: Pull complete
96b3ca42606a: Pull complete
eba42434ce94: Pull complete
69c589335db3: Pull complete
0ec633f2f60c: Pull complete
21200ab93e1b: Pull complete
bc161862b081: Pull complete
c65a5ac1ea31: Pull complete
Digest: sha256:b6ba005996340062f68137fe7cf3e17cd3d61bdb9a5df944f276905df795dd0e
Status: Downloaded newer image for opensearchstaging/ci-runner:ci-runner-windows2019-servercore-opensearch-build-v1
docker.io/opensearchstaging/ci-runner:ci-runner-windows2019-servercore-opensearch-build-v1
real 5m34.866s
user 0m0.000s
sys 0m0.015s
With pigz:
$ time docker pull opensearchstaging/ci-runner:ci-runner-windows2019-servercore-opensearch-build-v1
ci-runner-windows2019-servercore-opensearch-build-v1: Pulling from opensearchstaging/ci-runner
c9226d61d3bd: Already exists
b95f433aa7d9: Pull complete
00e36bb1af6a: Pull complete
96b3ca42606a: Pull complete
eba42434ce94: Pull complete
69c589335db3: Pull complete
0ec633f2f60c: Pull complete
21200ab93e1b: Pull complete
bc161862b081: Pull complete
c65a5ac1ea31: Pull complete
Digest: sha256:b6ba005996340062f68137fe7cf3e17cd3d61bdb9a5df944f276905df795dd0e
Status: Downloaded newer image for opensearchstaging/ci-runner:ci-runner-windows2019-servercore-opensearch-build-v1
docker.io/opensearchstaging/ci-runner:ci-runner-windows2019-servercore-opensearch-build-v1
real 4m45.069s
user 0m0.000s
sys 0m0.016s
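To put the two comparisons above side by side, the "real" values can be converted to seconds and subtracted. A small sketch (the `parse_real` helper is only for illustration; the four timings are copied from the runs above):

```python
import re


def parse_real(value):
    """Convert a bash `time` value such as '12m39.993s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# (without pigz, with pigz) pairs copied from the two runs above.
RUNS = [("12m39.993s", "12m29.576s"), ("5m34.866s", "4m45.069s")]

for without, with_pigz in RUNS:
    before, after = parse_real(without), parse_real(with_pigz)
    saved = before - after
    print(f"{without} -> {with_pigz}: saved {saved:.1f}s ({saved / before:.1%})")
```

That works out to about 10.4s (1.4%) for the first pair and about 49.8s (14.9%) for the second. The baseline itself moved from over 12 minutes to under 6 minutes between the two pairs, so the difference between runs is larger than the difference pigz makes.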
git clone of the build repo on the Windows host is now instant.
There is a bug right now: every time we pull the image fresh, the run always fails once on the sh stage. I suspect we need to pre-load the image on the runner beforehand. It succeeds soon after on the second run:
ERROR: script returned exit code 127
Add a docker image initialization step on Windows Docker Host to resolve above issues.
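The actual initialization step is not shown in this thread, so the following is only a rough sketch of the idea of pre-loading the image on the docker host before any job uses it. It assumes the docker CLI is on PATH and uses `docker image inspect` and `docker pull`; the helper names are hypothetical.

```python
import subprocess


def image_present(image, run=subprocess.run):
    """True when `docker image inspect` finds the image locally."""
    result = run(["docker", "image", "inspect", image], capture_output=True)
    return result.returncode == 0


def ensure_image(image, run=subprocess.run):
    """Pull the image only if it is missing; return True if a pull happened."""
    if image_present(image, run=run):
        return False
    run(["docker", "pull", image], check=True)
    return True
```

For example, an init step could call `ensure_image("opensearchstaging/ci-runner:ci-runner-windows2019-servercore-opensearch-build-v1")` so the multi-minute pull shown above happens once at agent startup rather than inside the first sh stage.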
Add new integTest support with Windows container now.
Per opensearch-project/opensearch-build#3816 we have fixed the docker command issues on Windows, but it only supports running Windows containers on a Windows host through docker with hyperv.
Per discussion with @mch2, the core team needs to disable the Linux-container-related tests on Windows.
Thanks.
Describe the bug
Windows CI builds are failing, example: https://build.ci.opensearch.org/job/gradle-check/14914/console
To reproduce
N/A
Expected behavior
Builds should pass and docker tests should run.
Host / Environment
No response
Additional context
No response
Relevant log output
No response