kine closed this issue 3 years ago
Additionally, it fails even when I use our own certificate with this SetupCertificate.ps1 script:
$certPfxFile = Join-Path $PSScriptRoot (Split-Path $env:CertFile -Leaf)
$certPfxPassword = $env:CertPwd
$dnsidentity = $env:DnsIdentity
$publicDnsname = "${hostname}.${dnsidentity}"
$cert = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2 -ArgumentList $certPfxFile,$certPfxPassword
$certificateThumbprint = $cert.Thumbprint
Write-Host "Certificate File Thumbprint $certificateThumbprint"
if (!(Get-Item "Cert:\LocalMachine\my\$certificateThumbprint" -ErrorAction SilentlyContinue)) {
    Write-Host "Import Certificate to LocalMachine\my"
    $certPfxSecurePassword = ConvertTo-SecureString -String $certPfxPassword -AsPlainText -Force
    Import-PfxCertificate -FilePath $certPfxFile -CertStoreLocation "Cert:\LocalMachine\my" -Password $certPfxSecurePassword | Out-Null
}
I narrowed it down to this line:
$cert = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2 -ArgumentList 'certpath.pfx','pwd'
If you enter the container, do you have an internet connection?
You should also try to force process isolation when creating the container.
Internet connection is working inside the container (using transparent lan).
Process isolation - no change.
Is it related to gMSA or to SSL? Have you tried with these?
It fails only when both are used: gMSA AND SSL. If gMSA is not used, all is OK. If SSL is not used, all is OK.
Because the issue is the line
$cert = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2 -ArgumentList 'C:\run\my\my.pfx','mypass'
and only in combination with gMSA, it seems like a Windows issue. But could someone confirm or reproduce this?
Windows Server 2019, v1809 (build 17763.1790), Docker 19.03.14 (I first tried some older 19.03.xx, same issue).
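To make the failure of that single line easier to report, the constructor call can be wrapped so that the underlying exception type and message are captured instead of only the New-Object wrapper error. This is a minimal sketch, not part of the BC scripts: the function name Test-PfxLoad and the shape of the returned object are assumptions made for illustration.

```powershell
# Hypothetical helper (not part of BcContainerHelper): tries to load a PFX file
# and reports either the thumbprint or the innermost exception.
function Test-PfxLoad {
    param(
        [Parameter(Mandatory)][string] $Path,
        [Parameter(Mandatory)][string] $Password
    )
    try {
        # Same constructor call as the failing line in SetupCertificate.ps1
        $cert = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2 -ArgumentList $Path, $Password
        [pscustomobject]@{ Success = $true; Thumbprint = $cert.Thumbprint; Error = $null }
    }
    catch {
        # New-Object wraps constructor failures; the base exception carries the real cause
        $inner = $_.Exception.GetBaseException()
        [pscustomobject]@{
            Success    = $false
            Thumbprint = $null
            Error      = "$($inner.GetType().FullName): $($inner.Message)"
        }
    }
}
```

Inside the container it could be called as Test-PfxLoad -Path 'C:\run\my\my.pfx' -Password 'mypass' and the Error property pasted into the issue.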
In the eventlog from the container I can see only one error:
Faulting application name: lsass.exe, version: 10.0.17763.1, time stamp: 0xf1beaffa
Faulting module name: netlogon.DLL, version: 10.0.17763.1757, time stamp: 0x5a274a51
Exception code: 0xc0000005
Fault offset: 0x0000000000002690
Faulting process id: 0x4fc04
Faulting application start time: 0x01d70a84858c875d
Faulting application path: C:\Windows\system32\lsass.exe
Faulting module path: C:\Windows\system32\netlogon.DLL
Report Id: dac08119-0595-4a57-8881-f44d1602eb82
Faulting package full name:
Faulting package-relative application ID:
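Crash records like the one above can be picked out of the Application log by their message text. The following is a rough sketch under assumptions: the function name Select-FaultingProcessEvent is made up for illustration, and the commented usage assumes the crash is logged by the 'Application Error' provider with event ID 1000, which is the usual source for "Faulting application name" entries.

```powershell
# Hypothetical filter: passes through only events whose message names the given
# faulting application (lsass.exe by default).
function Select-FaultingProcessEvent {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [psobject] $InputObject,
        [string] $ProcessName = 'lsass.exe'
    )
    begin {
        # Escape the name so the dot in "lsass.exe" is matched literally
        $pattern = 'Faulting application name:\s*' + [regex]::Escape($ProcessName)
    }
    process {
        if ($InputObject.Message -match $pattern) { $InputObject }
    }
}

# Usage inside the container (Windows only, assumption as described above):
# Get-WinEvent -FilterHashtable @{ LogName = 'Application'; ProviderName = 'Application Error'; Id = 1000 } |
#     Select-FaultingProcessEvent |
#     Select-Object TimeCreated, Message
```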
This is 3 seconds after the BC service was stopped and before MSSQL$SQLEXPRESS is started.
@tfenster do you have any idea? I am not using gMSA, I have never seen this
When gMSA is not used, this is in the same "spot" in the event log (but the certificate works and everything is OK):
Updated script including creating the gMSA:
$containerName = 'SSLGMSATEST'
$password = 'P@ssw0rd'
$securePassword = ConvertTo-SecureString -String $password -AsPlainText -Force
$credential = New-Object pscredential 'admin', $securePassword
$auth = 'NavUserPassword'
$artifactUrl = Get-BcArtifactUrl -type 'OnPrem' -country 'w1' -select 'Latest' -version 17.3.20469.20605
$domain = 'mydomain.local' #Enter your domain name to use
$allowedGroup = 'gMSAHosts' #existing security group in the domain, where the docker host is member of.
$Path = 'C:\ProgramData\Docker\credentialspecs'
$JsonPath = Join-Path $Path "$($containerName)_CredSpec.json"
if (-not (Test-Path $JsonPath)) {
    Write-Host "Creating gMSA"
    New-ADServiceAccount -Name "$containerName" -DnsHostName "$($containerName).$($domain)" -ServicePrincipalNames "host/$containerName", "host/$($containerName).$($domain)" -PrincipalsAllowedToRetrieveManagedPassword "$AllowedGroup"
    Write-Host "Generate CredentialSpec file"
    New-CredentialSpec -AccountName $containerName -Path $JsonPath -Domain $Domain -NoClobber
}
$Params = @("--security-opt credentialspec=file://$($containerName)_CredSpec.json")
New-BcContainer `
    -accept_eula `
    -containerName $containerName `
    -credential $credential `
    -auth $auth `
    -artifactUrl $artifactUrl `
    -multitenant:$false `
    -additionalParameters $Params `
    -useSSL `
    -isolation process `
    -updateHosts
The RSAT ActiveDirectory feature needs to be installed, e.g. with:
Add-WindowsFeature -Name "RSAT-AD-PowerShell"
Hope that it is enough to reproduce...
I just tried the "same" (gMSA and using the $cert = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2 -ArgumentList 'C:\run\my\my.pfx','mypass') on the 'mcr.microsoft.com/windows/servercore:ltsc2019' image, and it worked. It seems that it must be something in the BC generic image or in the installation. How can I prevent the automatic BC installation inside the container?
Running the same on a clean Windows Server, version 2004 (Host OS Version 10.0.19041.804 (2004)), gives the same issue.
Sorry, but my problem is that I have never set up gMSA, I am not a domain admin on Microsoft's AD, and there is little chance that I ever will be. This means that I have no chance of supporting gMSA questions or topics. I know that some people are using this, and I guess the reason why they don't run into the same issue is that gMSA is (as I understand it) for local networks, and there is less reason to use SSL on local networks. In general I don't understand why people would use gMSA.
I was kind of hoping that some of the people using gMSA could chime in here - I am afraid that throwing my efforts on this would cost me days:-(
As for why gMSA: to use domain accounts to authenticate the developers and consultants accessing the container (this is the main reason now; using sandbox containers will change that later). To test this you will need to create test infrastructure (some test domain server).
I understand that this is really time consuming.
What I have found just now:
Image mcr.microsoft.com/businesscentral:10.0.19041.508 works.
Image mcr.microsoft.com/businesscentral:10.0.19041.804 fails.
The difference is KB4601319 (for Windows Server 2019 I think it is KB4601345). After I installed it on my Windows Server, version 2004, the containers with gMSA and SSL started to fail during creation. Maybe this is why others have not hit it: the KB is from last month, and many servers may not be updated to it yet.
Next step: confirm that it is a Docker vs. Windows problem and not a BC generic image problem...
Tried to build my own generic image and found the layer where it starts to fail.
This is the history of my image:
IMAGE          CREATED          CREATED BY                                      SIZE      COMMENT
1c1615493af1   9 minutes ago    powershell -Command $ErrorActionPreference =…   41kB
4b4a0a70991b   9 minutes ago    powershell -Command $ErrorActionPreference =…   41kB
123c63f9230e   9 minutes ago    powershell -Command $ErrorActionPreference =…   41kB
225f67e3da99   9 minutes ago    powershell -Command $ErrorActionPreference =…   41kB
5c83df6fed69   9 minutes ago    |3 created=202103091828 osversion=10.0.19041…   750MB
cb89d2ac85b3   12 minutes ago   powershell -Command $ErrorActionPreference =…   571kB
754d0f31d15b   12 minutes ago   |3 created=202103091828 osversion=10.0.19041…   1.55GB
ad63025f5638   22 minutes ago   powershell -Command $ErrorActionPreference =…   41kB
15c43b363c80   22 minutes ago   cmd /S /C #(nop) ARG osversion                  41kB
c82ffdf57f7a   22 minutes ago   cmd /S /C #(nop) ARG tag                        41kB
3a41bfd8a24a   22 minutes ago   cmd /S /C #(nop) ARG created                    41kB
c201d591fded   4 weeks ago      cmd /S /C curl -fSLo patch.msu http://downlo…   1.03GB
<missing>      4 weeks ago      cmd /S /C #(nop) ENV DOTNET_RUNNING_IN_CONT…    41kB
<missing>      4 weeks ago      Install update 2004-amd64                       1.94GB
<missing>      15 months ago    Apply image 2004-RTM-amd64                      2.65GB
Image 754d0f31d15b is failing, image ad63025f5638 is working. Will dig more...
754d is the big RUN statement. You can split it up into multiple RUN statements in the Dockerfile to get individual layers for every line.
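Once every step has its own layer, each intermediate image ID from docker history can be probed in turn with the gMSA credential spec and the failing certificate line. This is only a sketch under stated assumptions: the function name Get-LayerProbeArguments, the host folder C:\temp\certtest, the in-container path C:\certtest and the file name my.pfx are hypothetical, and it assumes powershell can be started directly in each intermediate image.

```powershell
# Hypothetical helper: builds the docker.exe argument list for probing one layer.
function Get-LayerProbeArguments {
    param(
        [Parameter(Mandatory)][string] $ImageId,
        # File name below C:\ProgramData\Docker\credentialspecs
        [Parameter(Mandatory)][string] $CredentialSpecFile,
        # Assumed host folder that contains my.pfx
        [string] $HostCertFolder = 'C:\temp\certtest',
        [string] $PfxPassword = 'mypass'
    )
    # The same constructor call that fails in SetupCertificate.ps1
    $probe = "New-Object System.Security.Cryptography.X509Certificates.X509Certificate2 " +
             "-ArgumentList 'C:\certtest\my.pfx','$PfxPassword' | Out-Null; 'cert ok'"
    @(
        'run', '--rm',
        '--isolation', 'process',
        '--security-opt', "credentialspec=file://$CredentialSpecFile",
        '--volume', "${HostCertFolder}:C:\certtest",
        $ImageId,
        'powershell', '-Command', $probe
    )
}

# Usage on the Docker host (assumption: docker is on the PATH):
# foreach ($id in 'ad63025f5638', '754d0f31d15b') {
#     $dockerArgs = Get-LayerProbeArguments -ImageId $id -CredentialSpecFile 'SSLGMSATEST_CredSpec.json'
#     Write-Host "Probing $id"
#     & docker @dockerArgs
# }
```

Building the argument list separately keeps the probe easy to inspect before anything is started.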
Sometimes the build itself fails with this error:
returned a non-zero code: 4294967295: failed to shutdown container: container 834dde9e8842f3544068f810e6e94ace19c80b73f6d78e57852fdcc2293fbae9 encountered an error during hcsshim::System::Shutdown: failure in a Windows system call: The remote procedure call failed and did not execute. (0x6bf): subsequent terminate failed container 834dde9e8842f3544068f810e6e94ace19c80b73f6d78e57852fdcc2293fbae9 encountered an error during hcsshim::System::Terminate: failure in a Windows system call: The remote procedure call failed and did not execute. (0x6bf)
I got something similar when I tried to do the steps manually inside the container. I get this error when trying to download the SQL installation:
OK, it starts to fail when SQL is installed. This step makes it fail:
RUN .\setup\setup.exe /q /ACTION=Install /INSTANCENAME=SQLEXPRESS /FEATURES=SQLEngine /UPDATEENABLED=0 /SQLSVCACCOUNT='NT AUTHORITY\System' /SQLSYSADMINACCOUNTS='BUILTIN\ADMINISTRATORS' /TCPENABLED=1 /NPENABLED=0 /IACCEPTSQLSERVERLICENSETERMS ; \
While (!(get-service 'MSSQL$SQLEXPRESS' -ErrorAction SilentlyContinue)) { Start-Sleep -Seconds 5 } ; \
Stop-Service 'MSSQL$SQLEXPRESS' ;
Trying to do the same manually inside the running container...
Running the same in the container finished correctly. The cert is working. But I didn't do any "restart" of the container after the installation.
Today the Windows cumulative update 2021-03 is ready to be installed. I will try again after the installation...
I will create new generic images for the new CU tomorrow (if the .NET Framework image has shipped).
OK, combination:
Container OS Version: 10.0.19041.804 (2004)
Host OS Version: 10.0.19041.867 (2004)
No change, still failing.
I tried to generate the image with the new .NET base image, and it fails on the SQL install step:
---> 86bcedcd2d22
Step 11/18 : RUN .\setup\setup.exe /q /ACTION=Install /INSTANCENAME=SQLEXPRESS /FEATURES=SQLEngine /UPDATEENABLED=0 /SQLSVCACCOUNT='NT AUTHORITY\System' /SQLSYSADMINACCOUNTS='BUILTIN\ADMINISTRATORS' /TCPENABLED=1 /NPENABLED=0 /IACCEPTSQLSERVERLICENSETERMS ; While (!(get-service 'MSSQL$SQLEXPRESS' -ErrorAction SilentlyContinue)) { Start-Sleep -Seconds 5 } ; Stop-Service 'MSSQL$SQLEXPRESS' ;
---> Running in 5762a17510a2
SQL Server 2019 transmits information about your installation experience, as well as other usage and performance data, to Microsoft to help improve the product. To learn more about SQL Server 2019 data processing and privacy controls, please see the Privacy Statement.
The command 'powershell -Command $ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue'; .\setup\setup.exe /q /ACTION=Install /INSTANCENAME=SQLEXPRESS /FEATURES=SQLEngine /UPDATEENABLED=0 /SQLSVCACCOUNT='NT AUTHORITY\System' /SQLSYSADMINACCOUNTS='BUILTIN\ADMINISTRATORS' /TCPENABLED=1 /NPENABLED=0 /IACCEPTSQLSERVERLICENSETERMS ; While (!(get-service 'MSSQL$SQLEXPRESS' -ErrorAction SilentlyContinue)) { Start-Sleep -Seconds 5 } ; Stop-Service 'MSSQL$SQLEXPRESS' ;' returned a non-zero code: 3221226505
C:\tools\bcdocker\generic\build.ps1 : Failed with exit code -1073740791
At line:1 char:1
+ .\build.ps1
+ ~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [Write-Error], WriteErrorException
+ FullyQualifiedErrorId : Microsoft.PowerShell.Commands.WriteErrorException,build.ps1
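The two numbers in this output, 3221226505 from Docker and -1073740791 from build.ps1, are the same 32-bit value read as unsigned and signed. Shown as hex it is 0xC0000409 (STATUS_STACK_BUFFER_OVERRUN), while the lsass.exe event above reports 0xc0000005 (STATUS_ACCESS_VIOLATION). A small sketch of the conversion follows; the function name ConvertTo-NtStatusHex and the two-entry lookup table are illustrative assumptions, not part of any existing module.

```powershell
# Hypothetical helper: shows a process exit code as a 32-bit hex value.
function ConvertTo-NtStatusHex {
    param(
        [Parameter(Mandatory)][int64] $ExitCode
    )
    # Keep only the low 32 bits so signed and unsigned inputs give the same result
    $mask = [int64]4294967295
    $unsigned = $ExitCode -band $mask
    '0x{0:X8}' -f $unsigned
}

# Deliberately tiny lookup, limited to the two codes that appear in this thread
$knownStatus = @{
    '0xC0000005' = 'STATUS_ACCESS_VIOLATION'
    '0xC0000409' = 'STATUS_STACK_BUFFER_OVERRUN'
}

foreach ($code in 3221226505, -1073740791) {
    $hex = ConvertTo-NtStatusHex -ExitCode $code
    Write-Host "$code = $hex ($($knownStatus[$hex]))"
}
```

The exit code 4294967295 from the earlier hcsshim shutdown error converts to 0xFFFFFFFF, that is, -1.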
My build machine was able to build the new servercore CU images without problems.
OK, it seems that this was a problem on my side (running container with the image).
Now I have built my own updated image, and it still fails after SQL 2019 Express is installed...
You could try to build the SQL2017 generic image (not quite up-to-date) from the SQL2017 branch
I installed KB4589212 (2021-01 Update for Windows Server, version 2004) on Windows Server, version 2004. Interestingly, this update appeared today, after 2021-03 was installed. Still not working (once it looked like it worked, but it failed somewhere else; other tries failed in the same way as before).
Still not solved on Windows Server 2019, even with KB4589208 (2021-01 Update for Windows Server 2019). The 2021-03 update (KB5000822) is still missing on it; it seems that I need that one too to solve it... installing...
Strange - but happy that you have a solution - and thanks for mentioning the KB's here.
No, I do not have a solution (I edited the text later). It didn't solve the problem. I have tried SQL2017: no change.
Any updates regarding this? We are running into the same problem.
I think @kine had to give up
I will close this issue and follow the progress in the above issue on windows containers.
If you could do a bit more than follow by pulling some Microsoft strings - that would be awesome ;-).
Adding a possible workaround here, since this is the only place I've seen a post regarding this issue.
I ran into the same issue building a Windows container using gMSA and SQL Server. I isolated this further by building the container with the SQL service disabled and the container + AD authentication work as expected (albeit without SQL). However as soon as the SQL service is enabled and started, it throws the master key and certificate errors, lsass.exe crashes and the container halts.
My SQL container is built using the NT Service\MSSQL$xxx virtual account for the SQL service. I found that if I use LocalSystem for the SQL Server service startup account instead, everything works fine. I tried the other built-in accounts like LocalService, NetworkService, etc but they all have the same issue. The SQL Agent runs fine with the default virtual account.
I'm not sure if this is simply a permission issue or something more complex related to DPAPI, machine key, SQL's ability/inability to decrypt various things, etc. I have a case open with MS (no findings yet), will update here if anything comes of it.
Today I am hitting this problem when creating a BC container on our Windows Server 2019. What changed since it worked: Windows on the servers was updated during the weekend (infrastructure shutdown because of a power line reconnection) and all servers were restarted. I suppose that the update could be the reason.
Describe the issue: It seems that it depends on 2 things: SSL is used and gMSA is used. I am using this script (simulated; the real one is different, but this one gives the same error)
Full output of scripts
I have looked into the scripts, and it fails when calling the SetupCertificate.ps1 script. Trying to run it manually gives:
When I call the same script (SetupCertificate) in a different container, created before the upgrade, it finishes OK (the image used for this OK container is mcr.microsoft.com/businesscentral:10.0.17763.1397, the new one where it fails is mcr.microsoft.com/businesscentral:10.0.17763.1757; I do not know whether that is relevant).