mattmcspirit / azurestack

Azure Stack Resources

MSSQL deployment fails #49

Closed gijs007 closed 5 years ago

gijs007 commented 5 years ago

I'm experiencing issues with the deployment script when deploying MSSQL.

There are two different errors. The first time the deployment script runs, the following error is shown during the MSSQL step (SQLServerRP):

The command "trace-step" is not recognized.

Executing the deployment script again gives a different error:

File Export-InstalledProducts.ps1 can not be found.

After investigating the second error, it appears that this file (Export-InstalledProducts.ps1) is delivered as part of SQL.zip. The SQL.zip file is extracted, but the extracted content is deleted automatically shortly afterwards, which is why Export-InstalledProducts.ps1 is missing.

Environment details:
ASDK build: 1.1808.0.97
Parameters used: -skipMySQL and -skipCustomizeHost
Mode used: ADFS

mattmcspirit commented 5 years ago

Hi, thanks for the message. This is strange - I'll need to test this in the coming days. However, looking quickly at the code:

if (!$([System.IO.Directory]::Exists("$ASDKpath\databases"))) {
    New-Item -Path "$ASDKpath\databases" -ItemType Directory -Force | Out-Null
}

if ($deploymentMode -eq "Online") {
    # Cleanup old folder
    Remove-Item "$asdkPath\databases\SQL" -Recurse -Force -Confirm:$false -ErrorAction SilentlyContinue
    # Download and Expand the SQL Server RP files
    $sqlRpURI = "https://aka.ms/azurestacksqlrp"
    $sqlRpDownloadLocation = "$ASDKpath\databases\SQL.zip"
    DownloadWithRetry -downloadURI "$sqlRpURI" -downloadLocation "$sqlRpDownloadLocation" -retries 10
}
elseif ($deploymentMode -ne "Online") {
    if (-not [System.IO.File]::Exists("$ASDKpath\databases\SQL.zip")) {
        throw "Missing SQL Server Zip file in extracted dependencies folder. Please ensure this exists at $ASDKpath\databases\SQL.zip - Exiting process"
    }
}

Set-Location "$ASDKpath\databases\"
Expand-Archive "$ASDKpath\databases\SQL.zip" -DestinationPath .\SQL -Force -ErrorAction Stop

As you can see, the script first checks whether the databases folder exists in your (for example) D:\ASDKfiles\ASDK folder, and creates it if it doesn't.

Then, in Online mode, it removes any SQL folder left over from a previous run, re-downloads the SQL RP ZIP file, and extracts it.

I don't see why it would try to clean up after downloading - that doesn't make sense, so unless something is not being cleaned correctly during the cleanup step, and this is causing issues when re-expanding the new ZIP file, I'm really not sure what's happening.

I'll investigate further for you, as soon as I can.

Thanks, Matt

ghost commented 5 years ago

Hey,

I have also run into this problem on a rerun after the MSSQL step fails. In my case, one of the extracted DLL files was locked by the PowerShell console. When the zip file is extracted, it cannot replace that file and rolls back.

After closing the PowerShell console and manually deleting the files in the asdk folder, it continued.

So retry after logging off or closing all PowerShell windows.

Eelco
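
The locked-file condition described above can be checked before a rerun. The following is only a minimal sketch: `Test-FileLocked` is a hypothetical helper (not part of the ConfigASDK script), and the folder path is an assumed example that should be replaced with your own download path.

```powershell
# Hypothetical helper: returns $true if the file cannot be opened exclusively,
# which usually means another process (for example a PowerShell console) still has it open.
function Test-FileLocked {
    param([Parameter(Mandatory)][string]$Path)
    try {
        # FileShare.None asks for exclusive access; this fails if anyone else holds the file.
        $stream = [System.IO.File]::Open(
            $Path,
            [System.IO.FileMode]::Open,
            [System.IO.FileAccess]::Read,
            [System.IO.FileShare]::None)
        $stream.Dispose()
        return $false
    }
    catch [System.IO.IOException] {
        return $true
    }
}

# Assumed location of the extracted RP files; adjust to your environment.
$extractedPath = 'D:\ASDKfiles\ASDK\databases\SQL'
if (Test-Path -Path $extractedPath) {
    # List every DLL that is still held open and would block cleanup or re-extraction.
    Get-ChildItem -Path $extractedPath -Recurse -Filter *.dll |
        Where-Object { Test-FileLocked -Path $_.FullName } |
        Select-Object -ExpandProperty FullName
}
```

If this lists any files, closing the PowerShell windows that loaded them (or logging off) before rerunning is the workaround reported in this thread.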

mattmcspirit commented 5 years ago

Thanks for this - the interesting thing here is that the installation of the DB RPs runs in a new PSSession, so the files shouldn't be locked for deletion in the current one, but I guess it's not working as expected.

My solution for 1809 will hopefully solve all of these kinds of issues.

Thanks! Matt

mattmcspirit commented 5 years ago

This should be fixed for the new 1809 build of my script, available soon. Stay tuned!

gijs007 commented 5 years ago

Just started a redeployment with the new 1809 script:

The script is still running, but it reports three failures: AddSQLServerRP, AddSQLServerSku, and AddSQLHosting.

At this moment I'm not sure if it's related to the bug I reported or if it's a new bug. Are there any log files I should check?

mattmcspirit commented 5 years ago

Did you run this on the same system that had previously failed, without redeployment of the ASDK?

gijs007 commented 5 years ago

No, I did a redeployment of the 1809 ASDK.

mattmcspirit commented 5 years ago

OK, just let the script run until the running jobs complete. It will then give you more info on the failures. The AddSQL...Sku and ...Hosting have failed because the RP failed - that's by design.

mattmcspirit commented 5 years ago

If you could provide your launch command (remove any passwords and sensitive info) that would be great too.

gijs007 commented 5 years ago

Will post an update once the script is completed.

As for the launch command:

.\ConfigASDK.ps1 -azureDirectoryTenantName "masdemo.onmicrosoft.com" -authenticationType AzureAD -downloadPath "Z:" -ISOPath "C:\iso\Windows_Server_2016_Datacenter_EVAL_en-us_14393_refresh.ISO" -azureStackAdminPwd "" -VMpwd " " -azureAdUsername "" -azureAdPwd " " `
-registerASDK -useAzureCredsForRegistration -azureRegSubId "" -skipMySQL -skipCustomizeHost

Note 1: We use a share (Z:) for our download directory.
Note 2: We set the PowerShell execution policy to Bypass, since the share is considered remote/unsafe content by PowerShell.

mattmcspirit commented 5 years ago

Thanks - I wonder if I've just got something messed up with the -SkipMySQL piece that's causing SQLServer to list as failed. I don't think there will be a problem with the Z: drive.

In your current running jobs view, do the MySQL steps show as 'Skipped' correctly?

Thanks!

gijs007 commented 5 years ago

In the current process overview:
MySQLGalleryitem: completed
MySQLRP: skipped
MySQLDBVM: skipped
MySQLQuota: skipped
MySQLaddhosting: skipped

In the jobs completed overview:
AddMySQLAspkg: completed
AddMySQLRP: completed
AddMySQLSKU: completed
DeployMySQLHosting: completed
AddMySQLhosting: completed

mattmcspirit commented 5 years ago

Great - at least that bit is working :)

The AddMySQLRP and AddMSSQLRP jobs share the same .ps1 file, so I suspect there's an issue in my code that breaks MSSQLRP when you use -SkipMySQL.

I'll need to redeploy my ASDK host to re-test, but in the meantime, if you can share the log file from the AddSQLServerRP folder (within C:\ConfigASDK\Logs), I can see if there is anything obvious there.

Thanks!

gijs007 commented 5 years ago

Sure. Can I send the log file through email? (I'm worried the log might contain sensitive information, which I'd rather not publish on a public GitHub repository.)

mattmcspirit commented 5 years ago

No worries - you don't need to send the whole file - just look for where it fails, and provide that info (minus any sensitive info) on this thread - if there are multiple errors, just provide them all.

gijs007 commented 5 years ago

I've just checked, the folder AddSQLServerRP doesn't exist in: C:\ConfigASDK\Logs\currentdate\

I do have one for SQLServerRP:

VERBOSE: Created 'Z:\ASDK\databases\SQLServer\Templates\Update-GuestOS.json'.

Known issue with AzureRMProfile 2018-03-01-hybrid and Database RP installation  - editing Common.psm1

Editing file

VERBOSE: Performing the operation "Set Content" on target "Path: Z:\ASDK\databases\SQLServer\Prerequisites\Common\Common.psm1".

VERBOSE: Performing the operation "Clear Content" on target "Item: Z:\ASDK\databases\SQLServer\Prerequisites\Common\Common.psm1".

Editing completed.

Known issue with AzureRMProfile 2018-03-01-hybrid and Database RP installation  - editing Common.psm1

Editing file

VERBOSE: Performing the operation "Set Content" on target "Path: Z:\ASDK\databases\SQLServer\Prerequisites\Common\Common.psm1".

VERBOSE: Performing the operation "Clear Content" on target "Item: Z:\ASDK\databases\SQLServer\Prerequisites\Common\Common.psm1".

Editing completed.

Known issue with AzureRMProfile 2018-03-01-hybrid and Database RP installation  - editing Common.psm1

Editing file

VERBOSE: Performing the operation "Set Content" on target "Path: Z:\ASDK\databases\SQLServer\Prerequisites\Common\Common.psm1".

VERBOSE: Performing the operation "Clear Content" on target "Item: Z:\ASDK\databases\SQLServer\Prerequisites\Common\Common.psm1".

Editing completed.

VERBOSE: Loading module from path 'Z:\ASDK\databases\SQLServer\Prerequisites\Common\Common.psm1'.

>> TerminatingError(Import-Module): "Could not load file or assembly 'file:///Z:\ASDK\databases\SQLServer\Telemetry\Microsoft.AzureStack.Deploy.Telemetry.dll' or one of its dependencies. Operation is not supported. (Exception from HRESULT: 0x80131515)"

>> TerminatingError(Import-Module): "The running command stopped because the preference variable "ErrorActionPreference" or common parameter is set to Stop: Could not load file or assembly 'file:///Z:\ASDK\databases\SQLServer\Telemetry\Microsoft.AzureStack.Deploy.Telemetry.dll' or one of its dependencies. Operation is not supported. (Exception from HRESULT: 0x80131515)"

**********************

Command start time: 20181107130733

**********************

PS>TerminatingError(DeploySQLProvider.ps1): "The term 'Trace-Step' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again."

ASDK Configurator Stage: SQLServerRP failed. Updating ConfigASDK Progress database

The term 'Trace-Step' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again..Exception.Message

mattmcspirit commented 5 years ago

Ah yeah, sorry about that, my job names and log names don't match exactly - something I need to fix for a future release.

So, looking at the info you've supplied, this is strange.

It looks like, as it stands, the SQL files are downloaded correctly and expanded to \databases\SQLServer\, and my edits to the Common.psm1 file (which make the install process work with Azure Stack PowerShell 1.5.0 and the new AzureRmProfile) are successful. That also tells me that the script can access Z:\ without issue and has enough privilege to edit files located on Z:.

The next step is to call the official DeploySQLProvider.ps1 (which is written by the SQL RP team, not me):

.\DeploySQLProvider.ps1 -AzCredential $asdkCreds -VMLocalCredential $vmLocalAdminCreds -CloudAdminCredential $cloudAdminCreds -PrivilegedEndpoint $ERCSip -DefaultSSLCertificatePassword $secureVMpwd

What happens then is that DeploySQLProvider loads Common.psm1; however, right at the start of Common.psm1 is the following:

Import-Module -Name "$PSScriptRoot\..\..\Telemetry\Microsoft.AzureStack.Deploy.Telemetry.dll" -ErrorAction Stop -Verbose:$false # Telemetry

This is where your first failure is. As a result, because this import fails, Common.psm1 never finishes loading, so the function Trace-Step (which is defined in Common.psm1) is not available - that's your second failure message there.
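
To illustrate why a failed import at the top of a module leads to a "not recognized" error later on, here is a small self-contained sketch. It does not use the real Common.psm1; the module file `DemoCommon.psm1` and the function `Invoke-DemoStep` are made-up names for demonstration only.

```powershell
# Write a throwaway module whose first statement fails, standing in for a
# module that cannot load one of its dependencies.
$modulePath = Join-Path ([System.IO.Path]::GetTempPath()) 'DemoCommon.psm1'
Set-Content -Path $modulePath -Value @'
throw "Simulated failure while loading a dependency"
function Invoke-DemoStep { "never defined" }
'@

try {
    Import-Module -Name $modulePath -ErrorAction Stop
}
catch {
    Write-Host "Import failed: $($_.Exception.Message)"
}

# The function that sits after the failing statement was never exported,
# so any caller that relies on it gets a 'not recognized' error.
$found = Get-Command -Name Invoke-DemoStep -ErrorAction SilentlyContinue
Write-Host "Invoke-DemoStep available: $([bool]$found)"

Remove-Item -Path $modulePath
```

The same chain applies in the log above: the second error is a symptom of the first, so only the assembly load failure needs to be fixed.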

So, I did a bit of searching, and I stumbled upon this:

http://www.clearmindsoftware.com/post/Resolution-Error-Could-not-load-file-or-assembly-file5c5cserver5cpath5cfiledll-or-one-of-its-dependencies-(0x80131515)

The similarity between your case and this one is the use of a mapped drive. What I don't know is whether the issue is caused by the file being located on the mapped drive, or whether it's because the file, having been downloaded from the internet, is considered 'unsafe', in which case right-clicking the file and clicking 'Unblock' might make it work.

Can you check in the folder for the telemetry.dll, right-click it, open Properties, and see if the 'Unblock' option is listed?
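
The same check can be done from PowerShell instead of the Properties dialog. This is a sketch under assumptions: `Test-FileBlocked` is a hypothetical helper, the DLL path is copied from the log earlier in this thread, and the `-Stream` parameter is only available on Windows with a file system that supports alternate data streams.

```powershell
# Hypothetical helper: a file that Windows marks as downloaded carries a
# 'Zone.Identifier' alternate data stream; its presence is what makes the
# 'Unblock' option appear in the Properties dialog.
function Test-FileBlocked {
    param([Parameter(Mandatory)][string]$Path)
    $zone = Get-Item -LiteralPath $Path -Stream 'Zone.Identifier' -ErrorAction SilentlyContinue
    return [bool]$zone
}

# Path taken from the log output above; adjust to your environment.
$dll = 'Z:\ASDK\databases\SQLServer\Telemetry\Microsoft.AzureStack.Deploy.Telemetry.dll'
if ((Test-Path -LiteralPath $dll) -and (Test-FileBlocked -Path $dll)) {
    # Removes the Zone.Identifier stream, equivalent to clicking 'Unblock'.
    Unblock-File -LiteralPath $dll
}
```

If `Test-FileBlocked` returns `$false`, the file is not marked as downloaded, which points towards the mapped drive itself rather than the 'unsafe' flag.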

gijs007 commented 5 years ago

Thank you for the detailed explanation :)

I've tried the script from a local USB stick, after copying the files from the share. The MSSQL deployment now completed fine. However, the App Service deployment failed, but that might be because of leftovers from the previous attempt.

As for the telemetry.dll, the one located in ASDK\databases\SQLServer\Telemetry, it's digitally signed by Microsoft and doesn't have an unblock option listed.

mattmcspirit commented 5 years ago

Hi - I deployed my new ASDK with the same command as you, with -skipMySQL, and it completed fine in 3 hours 18 mins. The only differences between our environments are the use of the Z:\ mapped drive and the subsequent use of Bypass for the execution policy, so I wonder if that is causing an issue.

[12:24 AM]::[CLEANUP]:: Congratulations - all steps completed successfully:

GetScripts : Complete
CheckPowerShell : Complete
InstallPowerShell : Complete
DownloadTools : Complete
HostConfiguration : Complete
Registration : Complete
UbuntuServerImage : Complete
WindowsUpdates : Complete
ServerCoreImage : Complete
ServerFullImage : Complete
MySQLGalleryItem : Complete
SQLServerGalleryItem : Complete
AddVMExtensions : Complete
MySQLRP : Skipped
SQLServerRP : Complete
MySQLSKUQuota : Skipped
SQLServerSKUQuota : Complete
UploadScripts : Skipped
MySQLDBVM : Skipped
SQLServerDBVM : Complete
MySQLAddHosting : Skipped
SQLServerAddHosting : Complete
AppServiceFileServer : Complete
AppServiceSQLServer : Complete
DownloadAppService : Complete
AddAppServicePreReqs : Complete
DeployAppService : Complete
RegisterNewRPs : Complete
CreatePlansOffers : Complete
InstallHostApps : Complete
CreateOutput : Complete

What is the App Service failure?

gijs007 commented 5 years ago

Thanks. I redid the deployment without a share and indeed everything works as expected.
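
Since the deployment only succeeded once the files were on a local drive, a pre-flight check on the download path may save a failed run. The sketch below is an assumption-laden example rather than anything the ConfigASDK script does: `Test-IsNetworkPath` is a hypothetical helper, and `Z:\` is simply the download path used in this thread.

```powershell
# Hypothetical helper: returns $true for UNC paths and for drive letters
# that are mapped to a network share.
function Test-IsNetworkPath {
    param([Parameter(Mandatory)][string]$Path)
    if ($Path.StartsWith('\\')) { return $true }   # UNC path such as \\server\share

    $root = [System.IO.Path]::GetPathRoot($Path)
    if ([string]::IsNullOrEmpty($root)) { return $false }   # relative path, cannot tell

    $drive = New-Object -TypeName System.IO.DriveInfo -ArgumentList $root
    return $drive.DriveType -eq [System.IO.DriveType]::Network
}

# Download path used in this thread; replace with your own -downloadPath value.
$downloadPath = 'Z:\'
if (Test-IsNetworkPath -Path $downloadPath) {
    Write-Warning "$downloadPath is a network location. Assemblies such as the telemetry DLL may fail to load from it; consider a local download path instead."
}
```

This only warns; it does not copy anything or change how the script runs.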

I haven't had the opportunity to look at the App Service failure unfortunately.