microsoft / navcontainerhelper

Official Microsoft repository for BcContainerHelper, a PowerShell module, which makes it easier to work with Business Central Containers on Docker.
MIT License

"The remote procedure call failed and did not execute." when useSSL and gMSA used #1730

Closed kine closed 3 years ago

kine commented 3 years ago

Today I am hitting this problem when creating a BC container on our Windows Server 2019. What changed since it last worked: Windows on the servers was updated during the weekend (infrastructure shutdown because of a power line reconnection) and all servers were restarted. I suppose the update could be the reason.

Describe the issue

It seems to depend on two things: SSL is used and gMSA is used. I am using this script (simulated; the real one is different, but this one gives the same error):

$containerName = 'WI81576'
$password = 'P@ssw0rd'
$securePassword = ConvertTo-SecureString -String $password -AsPlainText -Force
$credential = New-Object pscredential 'admin', $securePassword
$auth = 'Windows'
$artifactUrl = Get-BcArtifactUrl -type 'OnPrem' -country 'w1' -select 'Latest' -version 17.3.20469.20605

$Params = @("--security-opt credentialspec=file://WI81576_CredSpec.json")

New-BcContainer `
    -accept_eula `
    -containerName $containerName `
    -credential $credential `
    -auth $auth `
    -artifactUrl $artifactUrl `
    -multitenant:$false `
    -additionalParameters $Params `
    -useSSL `
    -updateHosts

Full output of scripts

BcContainerHelper is version 2.0.6-preview351
BcContainerHelper is running as administrator
Host is Microsoft Windows Server 2019 Standard - ltsc2019
Docker Client Version is 19.03.14
Docker Server Version is 19.03.14
Removing container WI81576
Removing WI81576 from host hosts file
Removing WI81576-* from host hosts file
Removing C:\ProgramData\NavContainerHelper\Extensions\WI81576
Fetching all docker images
Using image mcr.microsoft.com/businesscentral:10.0.17763.1757
Creating Container WI81576
Version: 17.3.20469.20605-w1
Style: onprem
Multitenant: No
Platform: 17.0.20458.20517
Generic Tag: 1.0.1.3
Container OS Version: 10.0.17763.1757 (ltsc2019)
Host OS Version: 10.0.17763.1790 (ltsc2019)
Using hyperv isolation
Using locale en-US
Disabling the standard eventlog dump to container log every 2 seconds (use -dumpEventLog to enable)
Additional Parameters:
--security-opt credentialspec=file://WI81576_CredSpec.json
Files in C:\ProgramData\NavContainerHelper\Extensions\WI81576\my:
- AdditionalOutput.ps1
- MainLoop.ps1
- SetupVariables.ps1
- updatehosts.ps1
Creating container WI81576 from image mcr.microsoft.com/businesscentral:10.0.17763.1757
49795cbac00d9956f0949dc4995dc75eb55c080f6d557f00834a6729d9c79d9e
Waiting for container WI81576 to be ready
Using artifactUrl https://bcartifacts.azureedge.net/onprem/17.3.20469.20605/w1
Using installer from C:\Run\150-new
Installing Business Central
Installing from artifacts
Starting Local SQL Server
WARNING: Waiting for service 'SQL Server (SQLEXPRESS) (MSSQL$SQLEXPRESS)' to 
start...
Starting Internet Information Server
Copying Service Tier Files
Copying PowerShell Scripts
Copying dependencies
Copying ReportBuilder
Importing PowerShell Modules
Determining Database Collation from c:\dl\onprem\17.3.20469.20605\w1\database\Demo Database NAV (17-0).bak
Restoring CRONUS Demo Database
Setting CompatibilityLevel for CRONUS on localhost\SQLEXPRESS
Modifying Business Central Service Tier Config File for Docker
Creating Business Central Service Tier
Installing SIP crypto provider: 'C:\Windows\System32\NavSip.dll'
Copying Web Client Files
Copying Client Files
Copying ModernDev Files
Copying additional files
Copying ConfigurationPackages
Copying Test Assemblies
Copying Applications
Starting Business Central Service Tier
Importing license file
Stopping Business Central Service Tier
Installation took 175 seconds
Installation complete
Initializing...
Setting host.containerhelper.internal to 172.18.128.1 in container hosts file
Starting Container
Hostname is WI81576
PublicDnsName is WI81576
Using Windows Authentication
Creating Self Signed Certificatedocker : Write-Host : The remote procedure call failed and did not execute.
At C:\Program Files\WindowsPowerShell\Modules\BcContainerHelper\2.0.6\ContainerHandling\Wait-NavContainerReady.ps1:31 char:21
+             $logs = docker logs $containerName
+                     ~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (Write-Host : Th...id not execute.:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

I have looked into the scripts and it fails when calling SetupCertificate.ps1 script. Trying to run it manually gives:

[WI81576] PS C:\Run> .\SetupCertificate.ps1
Creating Self Signed Certificate
CertEnroll::CX509PrivateKey::Create: The remote procedure call failed. 0x800706be (WIN32: 1726 RPC_S_CALL_FAILED)
    + CategoryInfo          : OperationStopped: (:) [New-SelfSignedCertificateEx], COMException
    + FullyQualifiedErrorId : System.Runtime.InteropServices.COMException,New-SelfSignedCertificateEx

When I call the same script (SetupCertificate) in a different container, created before the upgrade, it finishes OK. The image used for the working container is mcr.microsoft.com/businesscentral:10.0.17763.1397; the new one where it fails is mcr.microsoft.com/businesscentral:10.0.17763.1757 (I do not know whether that is relevant).

kine commented 3 years ago

Additionally, it fails even when I use our own certificate with this SetupCertificate.ps1 script:

$certPfxFile = Join-Path $PSScriptRoot (Split-Path $env:CertFile -Leaf)
$certPfxPassword = $env:CertPwd
$dnsidentity = $env:DnsIdentity
$publicDnsname = "${hostname}.${dnsidentity}"

$cert = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2 -ArgumentList $certPfxFile,$certPfxPassword
$certificateThumbprint = $cert.Thumbprint

Write-Host "Certificate File Thumbprint $certificateThumbprint"
if (!(Get-Item "Cert:\LocalMachine\my\$certificateThumbprint" -ErrorAction SilentlyContinue)) {
    Write-Host "Import Certificate to LocalMachine\my"
    $certPfxSecurePassword = ConvertTo-SecureString -String $certPfxPassword -AsPlainText -Force
    Import-PfxCertificate -FilePath $certPfxFile -CertStoreLocation "cert:\localmachine\my" -Password $certPfxSecurePassword | Out-Null
}

kine commented 3 years ago

I narrowed it down to this line:

$cert = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2 -ArgumentList 'certpath.pfx','pwd'

freddydk commented 3 years ago

If you enter the container - do you have internet connection?

freddydk commented 3 years ago

You should try to force process isolation when creating the container.

kine commented 3 years ago

Internet connection is working inside the container (using transparent lan).

kine commented 3 years ago

Process isolation - no change.

freddydk commented 3 years ago

Is it related to gMSA? or SSL - have you tried with these?

kine commented 3 years ago

Both must be used - gMSA AND SSL. If gMSA is not used, all is OK. If no SSL is used, all is ok.

kine commented 3 years ago

Because the issue is the line

$cert = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2 -ArgumentList 'C:\run\my\my.pfx','mypass'

and it fails only in combination with gMSA, it looks like some Windows issue. But could someone confirm or reproduce this?

Windows Server 2019, v1809 (build 17763.1790), Docker 19.03.14 (I first tried with an older 19.03.xx, same issue).
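For anyone trying to confirm or reproduce this, the failing line can be wrapped so that it reports the underlying exception and HRESULT instead of aborting the script. This is only a sketch: `Test-PfxLoad` is a hypothetical helper name (not part of BcContainerHelper), the path and password in the example call are placeholders, and it only helps if the call throws rather than taking the whole process down.

```powershell
# Hypothetical helper (not part of BcContainerHelper): tries to load a PFX
# with the same constructor as the failing line and reports the outcome.
function Test-PfxLoad {
    param(
        [Parameter(Mandatory)] [string] $PfxPath,
        [Parameter(Mandatory)] [string] $PfxPassword
    )
    try {
        $cert = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2 -ArgumentList $PfxPath, $PfxPassword
        [pscustomobject]@{ Success = $true; Thumbprint = $cert.Thumbprint; HResult = $null; Message = $null }
    }
    catch {
        # PowerShell wraps constructor failures; unwrap to the root exception.
        $root = $_.Exception.GetBaseException()
        [pscustomobject]@{
            Success    = $false
            Thumbprint = $null
            HResult    = '0x{0:X8}' -f $root.HResult
            Message    = $root.Message
        }
    }
}

# Example call inside the container (placeholder path and password):
# Test-PfxLoad -PfxPath 'C:\run\my\my.pfx' -PfxPassword 'mypass'
```

Running it once in a container with gMSA and once without should make the difference easy to compare.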

kine commented 3 years ago

In the eventlog from the container I can see only one error:

Faulting application name: lsass.exe, version: 10.0.17763.1, time stamp: 0xf1beaffa
Faulting module name: netlogon.DLL, version: 10.0.17763.1757, time stamp: 0x5a274a51
Exception code: 0xc0000005
Fault offset: 0x0000000000002690
Faulting process id: 0x4fc04
Faulting application start time: 0x01d70a84858c875d
Faulting application path: C:\Windows\system32\lsass.exe
Faulting module path: C:\Windows\system32\netlogon.DLL
Report Id: dac08119-0595-4a57-8881-f44d1602eb82
Faulting package full name: 
Faulting package-relative application ID: 

This is 3 seconds after the BC service was stopped and before MSSQL$SQLEXPRESS is started. [image]
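To check whether another container shows the same lsass.exe crash, the Application log can be queried from inside the container. A minimal sketch, assuming the crash is recorded as the standard Event ID 1000 from the 'Application Error' provider (which matches the format of the entry above); `Get-LsassCrashEvent` is a hypothetical helper name.

```powershell
# Hypothetical helper: reads the most recent application-crash records
# (Event ID 1000, provider 'Application Error') and keeps those that
# mention lsass.exe.
function Get-LsassCrashEvent {
    param([int] $MaxEvents = 20)
    $filter = @{ LogName = 'Application'; ProviderName = 'Application Error'; Id = 1000 }
    # -ErrorAction SilentlyContinue: Get-WinEvent reports an error when no events match.
    Get-WinEvent -FilterHashtable $filter -MaxEvents $MaxEvents -ErrorAction SilentlyContinue |
        Where-Object { $_.Message -like '*lsass.exe*' } |
        Select-Object TimeCreated, Id, Message
}

# Run inside the container, for example after Enter-BcContainer:
# Get-LsassCrashEvent | Format-List
```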

freddydk commented 3 years ago

@tfenster do you have any idea? I am not using gMSA, I have never seen this

kine commented 3 years ago

When gMSA is not used, the same "spot" in the event log shows this: [image] (but the certificate works and everything is OK).

kine commented 3 years ago

Updated script including creating the gMSA:

$containerName = 'SSLGMSATEST'
$password = 'P@ssw0rd'
$securePassword = ConvertTo-SecureString -String $password -AsPlainText -Force
$credential = New-Object pscredential 'admin', $securePassword
$auth = 'NavUserPassword'
$artifactUrl = Get-BcArtifactUrl -type 'OnPrem' -country 'w1' -select 'Latest' -version 17.3.20469.20605
$domain = 'mydomain.local'  #Enter your domain name to use
$allowedGroup = 'gMSAHosts' #existing security group in the domain, where the docker host is member of.    
$Path =  'C:\ProgramData\Docker\credentialspecs'
$JsonPath = Join-Path $Path "$($containerName)_CredSpec.json"

if (-not (Test-Path $JsonPath)) {
    Write-Host "Creating gMSA"
    New-ADServiceAccount -Name "$containerName" -DnsHostName "$($containerName).$($domain)" -ServicePrincipalNames "host/$containerName", "host/$($containerName).$($domain)" -PrincipalsAllowedToRetrieveManagedPassword "$AllowedGroup"

    Write-Host "Generate CredentialSpec file"
    New-CredentialSpec -AccountName $containerName -Path $JsonPath -Domain $Domain -NoClobber
}

$Params = @("--security-opt credentialspec=file://$($containerName)_CredSpec.json")

New-BcContainer `
    -accept_eula `
    -containerName $containerName `
    -credential $credential `
    -auth $auth `
    -artifactUrl $artifactUrl `
    -multitenant:$false `
    -additionalParameters $Params `
    -useSSL `
    -isolation process `
    -updateHosts

The RSAT Active Directory PowerShell feature must be installed, e.g. with:

Add-WindowsFeature -Name "RSAT-AD-PowerShell"

Hope that it is enough to reproduce...

kine commented 3 years ago

I just tried the "same" thing (gMSA plus the line $cert = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2 -ArgumentList 'C:\run\my\my.pfx','mypass') on the 'mcr.microsoft.com/windows/servercore:ltsc2019' image and it worked. So it seems it must be something in the BC generic image or in the installation. How can I prevent the automatic BC installation inside the container?

kine commented 3 years ago

Running the same on a clean Windows Server, version 2004 (Host OS Version 10.0.19041.804 (2004)) gives the same issue.

freddydk commented 3 years ago

Sorry, but my problem is that I have never set up gMSA, I am not a domain admin on Microsoft's AD and there is little chance that I will ever be. This means that I have no chance of supporting gMSA questions or topics. I know that some people are using this and I guess the reason why they don't run into the same issue is that gMSA is (as I understand it) for local networks and there is less reason to use SSL on local networks. In general I don't understand why people would use gMSA.

I was kind of hoping that some of the people using gMSA could chime in here - I am afraid that throwing my efforts on this would cost me days:-(

kine commented 3 years ago

As for why gMSA: to use domain accounts to authenticate the developers and consultants accessing the container (this is the main reason now; using sandbox containers will change that later). To test this you would need to create a test infrastructure (some test domain server).

I understand that this is really time consuming.

What I have found just now:

Image mcr.microsoft.com/businesscentral:10.0.19041.508 works
Image mcr.microsoft.com/businesscentral:10.0.19041.804 fails

The difference is KB4601319 (for Windows Server 2019 it is, I think, KB4601345). After I installed it on my Windows Server 2004 host, the containers with gMSA and SSL started to fail during creation. Maybe others have not hit this yet because the KB is from last month and many servers may not have been updated to it.

kine commented 3 years ago

Next step: confirm that it is a Docker vs. Windows problem and not a BC generic image problem...

kine commented 3 years ago

Tried to build my own generic image and found the layer where it starts to fail.

This is the history of my image:

IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
1c1615493af1        9 minutes ago       powershell -Command $ErrorActionPreference =…   41kB
4b4a0a70991b        9 minutes ago       powershell -Command $ErrorActionPreference =…   41kB
123c63f9230e        9 minutes ago       powershell -Command $ErrorActionPreference =…   41kB
225f67e3da99        9 minutes ago       powershell -Command $ErrorActionPreference =…   41kB
5c83df6fed69        9 minutes ago       |3 created=202103091828 osversion=10.0.19041…   750MB
cb89d2ac85b3        12 minutes ago      powershell -Command $ErrorActionPreference =…   571kB
754d0f31d15b        12 minutes ago      |3 created=202103091828 osversion=10.0.19041…   1.55GB
ad63025f5638        22 minutes ago      powershell -Command $ErrorActionPreference =…   41kB
15c43b363c80        22 minutes ago      cmd /S /C #(nop)  ARG osversion                 41kB
c82ffdf57f7a        22 minutes ago      cmd /S /C #(nop)  ARG tag                       41kB
3a41bfd8a24a        22 minutes ago      cmd /S /C #(nop)  ARG created                   41kB
c201d591fded        4 weeks ago         cmd /S /C curl -fSLo patch.msu http://downlo…   1.03GB
<missing>           4 weeks ago         cmd /S /C #(nop)  ENV DOTNET_RUNNING_IN_CONT…   41kB
<missing>           4 weeks ago         Install update 2004-amd64                       1.94GB
<missing>           15 months ago       Apply image 2004-RTM-amd64                      2.65GB

Image 754d0f31d15b is failing, image ad63025f5638 is working. Will dig more...

freddydk commented 3 years ago

754d is the big RUN statement. You can split that up into multiple RUN statements in the Dockerfile to get individual layers for every line.

kine commented 3 years ago

Sometimes the build itself fails with this error:

returned a non-zero code: 4294967295: failed to shutdown container: container 834dde9e8842f3544068f810e6e94ace19c80b73f6d78e57852fdcc2293fbae9 encountered an error during hcsshim::System::Shutdown: failure in a Windows system call: The remote procedure call failed and did not execute. (0x6bf): subsequent terminate failed container 834dde9e8842f3544068f810e6e94ace19c80b73f6d78e57852fdcc2293fbae9 encountered an error during hcsshim::System::Terminate: failure in a Windows system call: The remote procedure call failed and did not execute. (0x6bf)

I got something similar when I tried to do the steps manually inside the container. I get this error when trying to download the SQL installation: [image]

kine commented 3 years ago

Ok, it starts to fail when SQL is installed. This is what makes it fail:

RUN .\setup\setup.exe /q /ACTION=Install /INSTANCENAME=SQLEXPRESS /FEATURES=SQLEngine /UPDATEENABLED=0 /SQLSVCACCOUNT='NT AUTHORITY\System' /SQLSYSADMINACCOUNTS='BUILTIN\ADMINISTRATORS' /TCPENABLED=1 /NPENABLED=0 /IACCEPTSQLSERVERLICENSETERMS ; \
    While (!(get-service 'MSSQL$SQLEXPRESS' -ErrorAction SilentlyContinue)) { Start-Sleep -Seconds 5 } ; \
    Stop-Service 'MSSQL$SQLEXPRESS' ; 

Trying to do same manually inside the running container...

kine commented 3 years ago

Running the same inside the container finished correctly and the certificate works. But I didn't do any "restart" of the container after the installation.

kine commented 3 years ago

Today Windows cumulative update 2021-03 is ready to be installed. Will try after the installation...

freddydk commented 3 years ago

I will create new generic images for the new cu tomorrow (if dotnet framework has shipped)

kine commented 3 years ago

Ok, combination:

Container OS Version: 10.0.19041.804 (2004)
Host OS Version: 10.0.19041.867 (2004)

no change - failing.

kine commented 3 years ago

Trying to generate the image with the new .net base image and it is failing on the SQL install step:

 ---> 86bcedcd2d22
Step 11/18 : RUN .\setup\setup.exe /q /ACTION=Install /INSTANCENAME=SQLEXPRESS /FEATURES=SQLEngine /UPDATEENABLED=0 /SQLSVCACCOUNT='NT AUTHORITY\System' /SQLSYSADMINACCOUNTS='BUILTIN\ADMINISTRATORS' /TCPENABLED=1 /NPENABLED=0 /IACCEPTSQLSERVERLICENSETERMS ;     While (!(get-service 'MSSQL$SQLEXPRESS' -ErrorAction SilentlyContinue)) { Start-Sleep -Seconds 5 } ;     Stop-Service 'MSSQL$SQLEXPRESS' ;
 ---> Running in 5762a17510a2
SQL Server 2019 transmits information about your installation experience, as well as other usage and performance data, to Microsoft to help improve the product. To learn more about SQL Server 2019 data processing and privacy controls, please see the Privacy Statement.
The command 'powershell -Command $ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue'; .\setup\setup.exe /q /ACTION=Install /INSTANCENAME=SQLEXPRESS /FEATURES=SQLEngine /UPDATEENABLED=0 /SQLSVCACCOUNT='NT AUTHORITY\System' /SQLSYSADMINACCOUNTS='BUILTIN\ADMINISTRATORS' /TCPENABLED=1 /NPENABLED=0 /IACCEPTSQLSERVERLICENSETERMS ;     While (!(get-service 'MSSQL$SQLEXPRESS' -ErrorAction SilentlyContinue)) { Start-Sleep -Seconds 5 } ;     Stop-Service 'MSSQL$SQLEXPRESS' ;' returned a non-zero code: 3221226505
C:\tools\bcdocker\generic\build.ps1 : Failed with exit code -1073740791
At line:1 char:1
+ .\build.ps1
+ ~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Write-Error], WriteErrorException
    + FullyQualifiedErrorId : Microsoft.PowerShell.Commands.WriteErrorException,build.ps1
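A note on the numbers in this output: 3221226505 (reported by docker) and -1073740791 (reported by build.ps1) are the same 32-bit value, read once as unsigned and once as signed. In hex it is 0xC0000409, the NTSTATUS code STATUS_STACK_BUFFER_OVERRUN. The 4294967295 from the earlier hcsshim error is 0xFFFFFFFF, i.e. -1. A small sketch of the conversion, with `Convert-ExitCodeToHex` as a hypothetical helper name:

```powershell
# Hypothetical helper: shows a process exit code as 8-digit hex so that
# signed and unsigned readings of the same 32-bit value can be compared.
function Convert-ExitCodeToHex {
    param([Parameter(Mandatory)] [long] $ExitCode)
    # Mask to the low 32 bits; a negative value becomes its unsigned form.
    $unsigned = $ExitCode -band 4294967295
    '0x{0:X8}' -f $unsigned
}

Convert-ExitCodeToHex -ExitCode 3221226505      # 0xC0000409
Convert-ExitCodeToHex -ExitCode (-1073740791)   # 0xC0000409
Convert-ExitCodeToHex -ExitCode 4294967295      # 0xFFFFFFFF
```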

freddydk commented 3 years ago

My build machine was able to build the new servercore CU images without problems.

kine commented 3 years ago

Ok, seems that this was problem on my side (running container with the image).

Now I have built my own updated image and it still fails after SQL 2019 Express is installed...

freddydk commented 3 years ago

You could try to build the SQL2017 generic image (not quite up-to-date) from the SQL2017 branch

kine commented 3 years ago

Installed KB4589212 (2021-01 Update for Windows Server, version 2004) on Windows Server 2004. Interestingly, this update appeared only today, after the 2021-03 update was installed. Still not working (once it looked like it worked but then failed somewhere else; the other tries failed in the same way as before).

Still not solved on Windows Server 2019 either, even with KB4589208 (2021-01 Update for Windows Server 2019). The 2021-03 update (KB5000822) is still missing there; it seems I may need that one as well to solve it... installing...

freddydk commented 3 years ago

Strange - but happy that you have a solution - and thanks for mentioning the KB's here.

kine commented 3 years ago

No, I do not have a solution (I edited the text later). It didn't solve the problem. I have tried the SQL2017 image: no change.

joachimcarrein commented 3 years ago

Any updates regarding this? We are running into the same problem.

freddydk commented 3 years ago

I think @kine had to give up

freddydk commented 3 years ago

I will close this issue and follow the progress in the above issue on windows containers.

waldo1001 commented 3 years ago

If you could do a bit more than follow by pulling some Microsoft strings - that would be awesome ;-).

jermicus commented 2 years ago

Adding a possible workaround here, since this is the only place I've seen a post regarding this issue.

I ran into the same issue building a Windows container using gMSA and SQL Server. I isolated this further by building the container with the SQL service disabled and the container + AD authentication work as expected (albeit without SQL). However as soon as the SQL service is enabled and started, it throws the master key and certificate errors, lsass.exe crashes and the container halts.

My SQL container is built using the NT Service\MSSQL$xxx virtual account for the SQL service. I found that if I use LocalSystem for the SQL Server service startup account instead, everything works fine. I tried the other built-in accounts like LocalService, NetworkService, etc but they all have the same issue. The SQL Agent runs fine with the default virtual account.

I'm not sure if this is simply a permission issue or something more complex related to DPAPI, machine key, SQL's ability/inability to decrypt various things, etc. I have a case open with MS (no findings yet), will update here if anything comes of it.