microsoft / navcontainerhelper

Official Microsoft repository for BcContainerHelper, a PowerShell module, which makes it easier to work with Business Central Containers on Docker.
MIT License

Containerized CI/CD Agents with BCContainerHelper #2908

Open jefkeers opened 1 year ago

jefkeers commented 1 year ago

Based on this webinar: https://www.youtube.com/watch?v=0INOdEhXd38&t=3010s I started to create a containerized build agent.
Through this agent a new build container is created correctly, but at the end it fails to connect to that container and we get an error about WinRM. When I try to connect manually from my agent container to the container that has just been built, it seems to work. I can also find some code in BcContainerHelper ("IsInsideContainer"), so I presume it should work.

What I've done: created an image for the agent:

    docker build -t agentimage -f Dockerfile.bcagent --build-arg BASE=ltsc2019 --build-arg AZP_URL=https://dev.azure.com/xxxx --build-arg AZP_TOKEN=xxx .

And then started this container based on the image created above:

    docker run -d --name BC15Agent3 --restart always -v "C:\bcartifacts.cache:C:\bcartifacts.cache" -v \\.\pipe\docker_engine:\\.\pipe\docker_engine -v c:\azp\agent_work:c:\azp\agent_work -v C:\ProgramData\BcContainerHelper\:C:\ProgramData\BcContainerHelper\ --env AZP_AGENT_NAME=BC15Agent3 --env AZP_POOL=BC15 --env AZP_URL=https://dev.azure.com/xxxx --env AZP_TOKEN=xxx agentimage

The Dockerfile.bcagent comes from Tobias Fenster's GitHub repository: https://github.com/cosmoconsult/azdevops-build-agent-image

The weird thing is that the first time after my VM has been started, when the build agent container is also fresh, it can connect and gets through the new-container step (first attachment): FirstSucces.txt

If I retry with the same build agent and the same container name, the connection does not succeed (second attachment): SecondFail.txt

If you need any more info or logs, I'll be happy to deliver them.

Thanks in advance, Jef

freddydk commented 1 year ago

Sorry, but I have no insight into the webinar or the mechanisms used here. Maybe @tfenster can see what you have done right/wrong?

The failing part looks like it cannot create a session to the container.

tfenster commented 1 year ago

Very far in the back of my mind, I vaguely remember something about weird issues if the container name was too long, and I see "WARNING: Container name should not exceed 15 characters" in your log. Can you try with a shorter name? Maybe this https://github.com/microsoft/navcontainerhelper/blob/master/BC.HelperFunctions.ps1#L133-L134 doesn't work as expected in this case.
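A quick way to check a name against that limit is sketched below. It assumes the 15-character limit in the warning corresponds to the Windows NetBIOS computer-name limit and that a longer name is simply cut off and upper-cased. `Test-ContainerNameLength` is a hypothetical helper, not part of BcContainerHelper, and `bc15agent3-labellov-ci` is only a guess at what the original container name might have been.

```powershell
# Hypothetical helper (not part of BcContainerHelper): reports whether a
# container name fits the 15-character limit and what a truncated,
# upper-cased host name would look like (assumed behavior).
function Test-ContainerNameLength {
    param(
        [Parameter(Mandatory = $true)]
        [string] $Name,
        [int] $MaxLength = 15
    )
    $kept = [Math]::Min($Name.Length, $MaxLength)
    [pscustomobject]@{
        Name      = $Name
        Length    = $Name.Length
        FitsLimit = ($Name.Length -le $MaxLength)
        Truncated = $Name.Substring(0, $kept).ToUpperInvariant()
    }
}

# The first name is a guess at the original name, the second is a shorter one.
Test-ContainerNameLength -Name 'bc15agent3-labellov-ci'
Test-ContainerNameLength -Name 'bc15agent3-l-ci'
```

If the truncated value differs from the name used to connect, that mismatch would be one place to look.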

jefkeers commented 1 year ago

Hi, I tried with a shorter name ("Creating Container bc15agent3-l-ci"), but no success. It seems that after a restart the container gets another IP address. Could that be the problem?

tfenster commented 1 year ago

Good thought. I can also see this

2023-02-09T07:22:03.5240289Z Adding BC15AGENT3-LABE to hosts file

in your log file. How does that look for the shorter name? Does it look correct?
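One way to check whether a stale hosts entry is involved is to compare what the hosts file says with the address Docker currently reports. The sketch below only covers the hosts-file side. `Get-HostsEntry` is a hypothetical helper, the sample addresses are made up, and which hosts file matters (the host's or the agent container's) is an assumption to verify.

```powershell
# Hypothetical helper: returns the IP addresses that the given hosts-file
# lines map to a host name (comparison is case-insensitive).
function Get-HostsEntry {
    param(
        [Parameter(Mandatory = $true)]
        [string] $HostName,
        [Parameter(Mandatory = $true)]
        [AllowEmptyString()]
        [string[]] $HostsLines
    )
    foreach ($line in $HostsLines) {
        $text = ($line -split '#')[0].Trim()   # drop comments and blank lines
        if (-not $text) { continue }
        $parts = $text -split '\s+'
        if ($parts.Count -ge 2 -and ($parts[1..($parts.Count - 1)] -contains $HostName)) {
            $parts[0]
        }
    }
}

# Sample hosts content; the addresses are invented for illustration.
$sample = @(
    '# sample hosts content'
    '172.20.0.5   bc15agent3-l-ci'
    ''
    '172.20.0.9   other-container'
)
Get-HostsEntry -HostName 'BC15AGENT3-L-CI' -HostsLines $sample
```

Against a real setup, the lines would come from `Get-Content "$env:SystemRoot\System32\drivers\etc\hosts"`, and the result could be compared with `docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' <containername>`.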

jefkeers commented 1 year ago

Hi, I can't find that specific line in the log, so I will attach the whole log. Maybe you can find something fishy :-)

FailBCAgent.txt

tfenster commented 1 year ago

@jefkeers Now it becomes truly interesting, because I can't see the entry in your latest log either. And I don't even understand where it came from initially, as I can't find the "Adding ..." code anywhere in bccontainerhelper. But I'll try to tackle it from a different angle: Can you share what exactly is running in your pipeline? From the logs, it looks like you are running something like

DevOps-Pipeline.ps1 -version "ci" -appBuild 2147483647 -appRevision 0 -Repo Labellov

Can you share the content of that script? With that, I could try to repro

jefkeers commented 1 year ago

Hi @tfenster, I sent you this info by mail. Thanks.

tfenster commented 1 year ago

@jefkeers Maybe others can benefit as well, so I'll continue here, if you don't mind. Given your scripts, I tried to narrow it down to the smallest possible repro that I could imagine while keeping it fairly close to your scripts. Here is what I did:

  1. Build the image with

    git clone https://github.com/cosmoconsult/azdevops-build-agent-image
    cd azdevops-build-agent-image
    # changed the bcch_version to 4.0.14, more on this below
    docker build -t agentimage -f Dockerfile.bcagent --build-arg BASE=ltsc2019 --isolation hyperv --build-arg AZP_URL=https://dev.azure.com/repro-cicd-issue/ --build-arg AZP_TOKEN=... .

    Note that I used Hyper-V isolation as I am running on Windows Server 2022

  2. Create a repo with "AL: Go!" and push to https://dev.azure.com/repro-cicd-issue/_git/project

  3. Run the agent container with

    docker run -e AZP_URL=https://dev.azure.com/repro-cicd-issue/ -e AZP_TOKEN=... --isolation hyperv -v \\.\pipe\docker_engine:\\.\pipe\docker_engine -v C:\ProgramData\BcContainerHelper:C:\ProgramData\BcContainerHelper\ -v c:\repro:c:\repro -e AZP_AGENT_NAME=repro -e AZP_POOL=Default agentimage

    Note that I am again using Hyper-V isolation. Also note that I am sharing c:\repro, but not c:\azp, which I guess might be an issue; see my list of ideas at the end

  4. Get a new session in the container to trigger the pipeline script, but not from within a pipeline. I didn't want to go through the whole setup of generating the pipeline, so I did the next best thing by setting up a couple of things and then just triggering the script:

    
    choco install -y vim git
    cd ..\repro
    remove-item -Force -Recurse *
    vim settings.json
    & 'C:\Program Files\Git\git-cmd.exe'
    git clone https://repro-cicd-issue@dev.azure.com/repro-cicd-issue/project/_git/project
    exit

    # https://gist.github.com/sheldonhull/dbbc8356028264047fd742b56c5ee27e
    $json = Get-Content -Raw -Path ([io.path]::Combine($Path,"settings.json")) -force | ConvertFrom-Json
    [string[]]$variables = ($json | get-member -Name * -MemberType NoteProperty).Name
    foreach ($v in $variables) {
        Set-Variable -name $v -value ($json.$v) -Force -Verbose
    }

    $parameters = @{
        "accept_eula" = $true
        "containerName" = $containerName
        "imageName" = $imageName
        "artifactUrl" = $artifactUrl
        "credential" = $credential
        "auth" = 'Windows'
        "isolation" = "hyperv"
        "updateHosts" = $true
        "licenseFile" = $licenseFile
        "enableTaskScheduler" = $enableTaskScheduler
        "memoryLimit" = "10G"
        "SendExtendedTelemetryToMicrosoft" = $false
        "includeTestToolkit" = $true
        "doNotCheckHealth" = $true
    }

    $params = @{}
    Import-Module BcContainerHelper
    $artifact = "/OnPrem/21.0/be"
    $baseFolder = "c:\repro"
    $containername = "container"
    $bcContainerHelperConfig.UsePsSession = $false

    Run-AlPipeline @params `
        -NewBcContainer {
            param([hashtable]$parameters)
            New-BcContainer @parameters -isolation hyperv
            Invoke-Command -ScriptBlock {get-psdrive}
        } `
        -pipelinename $pipelineName `
        -containerName $containerName `
        -imageName $imageName `
        -bcAuthContext $authContext `
        -environment $environmentName `
        -artifact $artifact.replace('{INSIDERSASTOKEN}',$insiderSasToken) `
        -memoryLimit $memoryLimit `
        -baseFolder $baseFolder `
        -licenseFile $LicenseFile `
        -installApps $installApps `
        -installTestApps $installTestApps `
        -previousApps $previousApps `
        -appFolders $appFolders `
        -testFolders $testFolders `
        -doNotRunTests:$doNotRunTests `
        -testResultsFile $testResultsFile `
        -testResultsFormat 'JUnit' `
        -installTestRunner:$installTestRunner `
        -installTestFramework:$installTestFramework `
        -installTestLibraries:$installTestLibraries `
        -installPerformanceToolkit:$installPerformanceToolkit `
        -enableCodeCop:$enableCodeCop `
        -enableAppSourceCop:$enableAppSourceCop `
        -enablePerTenantExtensionCop:$enablePerTenantExtensionCop `
        -enableUICop:$enableUICop `
        -azureDevOps:($environment -eq 'AzureDevOps') `
        -gitLab:($environment -eq 'GitLab') `
        -gitHubActions:($environment -eq 'GitHubActions') `
        -failOn 'error' `
        -AppSourceCopMandatoryAffixes $appSourceCopMandatoryAffixes `
        -AppSourceCopSupportedCountries $appSourceCopSupportedCountries `
        -additionalCountries $additionalCountries `
        -buildArtifactFolder $buildArtifactFolder `
        -CreateRuntimePackages:$CreateRuntimePackages `
        -appBuild $appBuild `
        -appRevision $appRevision


The settings.json looks like this, almost identical to yours

    {
      "name": "L",
      "memoryLimit": "10G",
      "installApps": "",
      "installDLL": "",
      "installTestApps": "",
      "previousApps": "",
      "appFolders": "project",
      "testFolders": "",
      "installTestRunner": false,
      "installTestFramework": false,
      "installTestLibraries": false,
      "installPerformanceToolkit": false,
      "doNotSignApps": true,
      "enableCodeCop": true,
      "enableAppSourceCop": false,
      "enablePerTenantExtensionCop": false,
      "enableUICop": true,
      "bcContainerHelperVersion": "preview",
      "additionalCountries": "",
      "vaultNameForLocal": "BuildVariables",
      "versions": [
        { "version": "ci", "artifact": "/OnPrem/21.0/be", "cacheImage": true, "CreateRuntimePackages": true },
        { "version": "current", "artifact": "///us/Current", "CreateRuntimePackages": true },
        { "version": "cloud", "artifact": "///us/Current" },
        { "version": "nextmajor", "artifact": "///be/NextMajor/{INSIDERSASTOKEN}" },
        { "version": "nextminor", "artifact": "///be/NextMinor/{INSIDERSASTOKEN}" }
      ]
    }
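For readers unfamiliar with this layout, here is a minimal sketch of how a script might pick one entry from `versions` based on the `-version` argument. The actual DevOps-Pipeline.ps1 was not posted in this thread, so the selection logic and the variable names `$selected` and `$useImage` are assumptions; the JSON is a cut-down copy of the settings above.

```powershell
# Cut-down copy of the settings.json shown above.
$settingsJson = @'
{
  "appFolders": "project",
  "versions": [
    { "version": "ci", "artifact": "/OnPrem/21.0/be", "cacheImage": true },
    { "version": "current", "artifact": "///us/Current" }
  ]
}
'@

$settings = $settingsJson | ConvertFrom-Json
$version  = 'ci'   # corresponds to: DevOps-Pipeline.ps1 -version "ci"

# Assumed selection logic: take the entry whose "version" matches.
$selected = $settings.versions | Where-Object { $_.version -eq $version }
if (-not $selected) { throw "No entry for version '$version' in settings.json" }

$artifact = $selected.artifact
$useImage = [bool] $selected.cacheImage   # a missing property counts as $false

"artifact=$artifact cacheImage=$useImage"
```

Under this reading, only the "ci" entry asks for a cached image, which would be the case where an existing image gets reused on later runs.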


With that in place, I could run the script, not fully successfully, but it got past the point where your setup failed: [1st repro.txt](https://github.com/microsoft/navcontainerhelper/files/10834839/1st.repro.txt) and [2nd repro.txt](https://github.com/microsoft/navcontainerhelper/files/10834841/2nd.repro.txt). Then I realized that the `imageName` param wasn't set properly, so it always created a new image. I fixed this with

    $imageName = "myimage"

and ran the script again. The third try now built the image as expected and the "pipeline run" had the same result ([3rd repro.txt](https://github.com/microsoft/navcontainerhelper/files/10834873/3rd.repro.txt)). Then I ran it a fourth time; now the already existing image was used and I ran into the exact same error as you ([4th repro.txt](https://github.com/microsoft/navcontainerhelper/files/10834888/4th.repro.txt)). So it seems to me that it works when generating the image, but not when just creating the container. Extremely weird, I have to do some more digging... Maybe you could, as a test, remove the image name to validate that it indeed works if the image is generated during the run?
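A minimal sketch of such a test, assuming the pipeline arguments are collected in a hashtable and splatted: `$pipelineParams` and the `$skipCachedImage` toggle are hypothetical names, and only `-containerName`, `-imageName` and `-artifact` from the script above are used.

```powershell
# Hypothetical parameter set; values taken from the repro above.
$pipelineParams = @{
    containerName = 'container'
    imageName     = 'myimage'
    artifact      = '/OnPrem/21.0/be'
}

# Hypothetical toggle: leave out imageName for one run so that no
# previously built image is referenced.
$skipCachedImage = $true
if ($skipCachedImage) {
    $pipelineParams.Remove('imageName')
}

# Show what would be passed on.
$pipelineParams.Keys | Sort-Object

# Run-AlPipeline @pipelineParams   # needs Docker and BcContainerHelper
```

Comparing a run with the toggle on against one with it off should show whether the WinRM error only appears when an existing image is reused.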

Looking at your scripts and the log output a bit more, I also found the following things. They don't seem to be the problem we are trying to find, but they may be useful for the future:
1. As I noted, I didn't share c:\azp from the host to the container because a) c:\azp should be set up from the install within the container and b) the content of this directory might be changed by another pipeline running on the host (or in another container sharing the same folder), which could cause issues.
2. You seem to do a bcch update in your pipeline. That isn't the idea of the container image. You should instead change the `BCCH_VERSION` here https://github.com/cosmoconsult/azdevops-build-agent-image/blob/master/Dockerfile.bcagent#L5 as I did by setting it to 4.0.14
3. You are using a quite old Docker version, could be worth updating as well. Please note however that updating Docker on Windows Server 2019 has become quite a bit better than on 2016, but still sometimes has "surprising" results, so please do that on a machine where you can live with a break
4. I did all this on Server 2022 which is a lot better with Docker, so that could also be worth a try

I'll try to find some time for more investigation, but most likely I won't be able to do that next week.

@freddydk Short version: Run-AlPipeline seems to fail if the DevOps agent itself is in a container and the corresponding BC image already exists. If it is generated as part of Run-AlPipeline, I can successfully run it multiple times, but if it already exists, the pipeline run fails with a WinRM error. Would that trigger any thoughts what might cause this? If you want a repro without having to set up an Azure DevOps pipeline, see above

jefkeers commented 1 year ago

Hi @tfenster, thank you for investigating! I tried setting my bcch version to 4.0.14, but that didn't change anything. Indeed, the first time, when it has to create the image, the procedure gets a bit further than the second time. But in both cases it fails when trying to connect to the container (in the script, or is it a health check in the script?).

I will set up a newer Windows Server with a new Docker engine and try that, hopefully this week. I will let you know the results.

tfenster commented 1 year ago

@jefkeers I think the run that you shared in "FirstSuccess" fails because the LS Retail extension is missing.

2023-02-09T07:19:19.8825288Z Processing dependency LS Retail_LS Central_15.0.0.0 (5ecfc871-5d82-43f1-9c54-59685e82318d)
2023-02-09T07:19:19.8840136Z Downloading symbols: LS Retail_LS Central_15.0.0.0.app
2023-02-09T07:19:19.8853368Z Url : http://192.168.165.205:7049/BC/dev/packages?publisher=LS%20Retail&appName=LS%20Central&versionText=15.0.0.0&tenant=default
2023-02-09T07:19:19.8874890Z Using WebClient
2023-02-09T07:19:20.0476197Z ERROR Exception calling "DownloadFile" with "2" argument(s): "The remote server returned an error: (404) Not Found."
...
2023-02-09T07:19:24.5535705Z ##[error]Exception calling "DownloadFile" with "2" argument(s): "The remote server returned an error: (404) Not Found."
Not Found
No published package matches the provided arguments.
At C:\Program Files\WindowsPowerShell\Modules\BcContainerHelper\4.0.14\AppHandling\Compile-AppInNavContainer.ps1:466 
char:25
+                         throw (GetExtendedErrorMessage $_)
+                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationStopped: (Exception calli...ided arguments.:String) [], RuntimeException
    + FullyQualifiedErrorId : Exception calling "DownloadFile" with "2" argument(s): "The remote server returned an er 
   ror: (404) Not Found."
Not Found
No published package matches the provided arguments.
2023-02-09T07:19:24.5580163Z ##[error]PowerShell exited with code '1'.

If I am right, that probably wouldn't be related to any container issue

jefkeers commented 1 year ago

Yes, the DownloadFile error is not the issue; I just wanted to show that the first time the pipeline could get further than the second time, which I think is really strange. But your remark about the image may well explain that.

freddydk commented 1 year ago

Short version: Run-AlPipeline seems to fail if the DevOps agent itself is in a container and the corresponding BC image already exists. If it is generated as part of Run-AlPipeline, I can successfully run it multiple times, but if it already exists, the pipeline run fails with a WinRM error. Would that trigger any thoughts what might cause this? If you want a repro without having to set up an Azure DevOps pipeline, see above

I will try to repro this, sounds like a bug