mattmcspirit / azurestack

Azure Stack Resources

Script Failure for AppService and MSSQL - How to reset Complete flag? #119

Closed SaberSHO closed 4 years ago

SaberSHO commented 4 years ago

I ran into repeated issues installing the AppService and MSSQL portion (MySQL installed fine). I have re-run the script a few times, with the same failures. I wanted to clean out the environment, so I removed the resource groups for appservice-sql, appservice-vm, sql.net.localadapter. How do I go about clearing the "complete" flag for the prerequisites that I now need to reinstall? I see references to a localdb that I may need to edit, but cannot find instructions on doing so.

Thanks for your help and a great script!

Current Progress:

GetScripts : Complete
CheckPowerShell : Complete
InstallPowerShell : Complete
DownloadTools : Complete
CheckCerts : Skipped
HostConfiguration : Complete
Registration : Skipped
AdminPlanOffer : Complete
UbuntuServerImage : Complete
WindowsUpdates : Complete
ServerCore2016Image : Complete
ServerFull2016Image : Complete
ServerCore2019Image : Complete
ServerFull2019Image : Complete
MySQL57GalleryItem : Complete
MySQL80GalleryItem : Complete
SQLServerGalleryItem : Complete
AddVMExtensions : Skipped
MySQLRP : Complete
SQLServerRP : Complete
MySQLSKUQuota : Complete
SQLServerSKUQuota : Complete
UploadScripts : Skipped
MySQLDBVM : Complete
SQLServerDBVM : Failed
MySQLAddHosting : Complete
SQLServerAddHosting : Failed
AppServiceFileServer : Complete
AppServiceSQLServer : Complete
DownloadAppService : Complete
AddAppServicePreReqs : Complete
DeployAppService : Failed
RegisterNewRPs : Incomplete
UserPlanOffer : Incomplete
InstallHostApps : Incomplete
CreateOutput : Incomplete

** DO NOT CLOSE THIS SESSION - If you do, please run .\GetJobStatus.ps1 from within C:\AzSPoC\Scripts to resume job monitoring **
** Please wait until all jobs have completed/failed before re-running the main script **
At least one of the jobs failed.
FAILED JOB: Job Name: DeploySQLServerHost | Error Message: Deploying the SQLServer VM failed after 3 attempts. Exiting process.
FAILED JOB: Job Name: AddSQLHosting | Error Message: The SQLServerDBVM stage of the process has failed. This should fully complete before the SQLServer database host has been deployed. Check the SQLServerDBVM log, ensure that step is completed first, and rerun.
FAILED JOB: Job Name: DeployAppService | Error Message: Deploying the App Service Resource Provider failed after 3 attempts. Check the logs and rerun the script
[3:41 PM]::[LAUNCHJOBS]:: Please review the logs for further troubleshooting.Exception.Message
PS C:\AzSPoC>

mattmcspirit commented 4 years ago

Hi,

What is strange here is that "AppServiceSQLServer" shows Complete but "SQLServerDBVM" shows Failed, even though they are essentially the same deployment process. Perhaps a random, transient issue or conflict caused the SQLServerDBVM stage to fail.

Before resetting the DB, please can you share the log file from the SQLServerDBVM stage? It is usually found in the Logs folder, which sits in the same folder that you ran your AzSPoC.ps1 file from, e.g. C:\AzSPoC\Logs.

Also, please share the App Service log. This will be in your -downloadPath\AppService folder, I think. It will be called AppServiceLog.

If you could email those to asdkconfigurator @ outlook .com, I can see if there's something that would cause an issue for your future runs of the script.

Also, if you can provide info about your hardware, that would be helpful.

Thanks! Matt

SaberSHO commented 4 years ago

I just sent you the logs. I am currently running this in a nested Hyper-V VM, with 24 cores and 192GB of RAM assigned.

How do I go about resetting the DB to try again? I think this may have been a transient issue, as I was having problems with the MySQL part on the first run, but that one did actually finish and is working after rerunning the script. I would just like to try again, skipping MySQL and running only the MSSQL and AppService portion.

mattmcspirit commented 4 years ago

Thanks for sending through.

This Hyper-V VM - aside from the 24 cores and 192GB RAM - please tell me more about the storage you're using, both the VHD(X) config (# of VHDs, size, dynamic/fixed etc.) and the underlying storage in the physical system, i.e. the HDD/SSD mix.

I looked at the logs, and the SQL DB VM failed due to this error:

PS>TerminatingError(New-AzureRmResourceGroupDeployment): "Cannot validate argument on parameter 'TemplateUri'. The argument is null or empty. Provide an argument that is not null or empty, and then try the command again." Deployment failed.

This would be caused by the correct MSSQL Gallery Item not being present in the environment (which, in itself, could have been a transient issue).

Could you run this in a fresh PS window? If we don't confirm/fix this, it won't work again when you run in the future:

# Sign in to the Azure Stack admin endpoint
$creds = Get-Credential
Add-AzureRMEnvironment -Name "AzureStackAdmin" -ArmEndpoint "https://adminmanagement.local.azurestack.external"
Add-AzureRmAccount -EnvironmentName "AzureStackAdmin" -Credential $creds
# Look up the mainTemplate.json URI of the MSSQL gallery item
$azpkg = "MSSQL"
$mainTemplateURI = $(Get-AzsGalleryItem | Where-Object { $_.Name -like "AzureStackPOC.$azpkg*" }).DefinitionTemplates.DeploymentTemplateFileUris.Values | Where-Object { $_ -like "*mainTemplate.json" }
# Display the result (empty output means no matching gallery item was found)
$mainTemplateURI

Hopefully, $mainTemplateURI returns something like:

https://systemgallery.blob.local.azurestack.external/dev20161101-microsoft-windowsazure-gallery/AzureStackPOC.MSSQL.1.0.0/DeploymentTemplates/mainTemplate.json
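If $mainTemplateURI comes back empty, it can help to check whether the gallery item itself is missing or only its template URI. The following is a minimal sketch, not part of the script itself; it assumes the admin session from the commands above is still signed in, and it reuses only Get-AzsGalleryItem and the same name pattern as the lookup above. The variable name $mssqlItems is just for illustration.

```powershell
# Assumption: run in the same PS window as the sign-in commands above.
# List any gallery items matching the name pattern the script looks for.
$mssqlItems = Get-AzsGalleryItem | Where-Object { $_.Name -like "AzureStackPOC.MSSQL*" }

if (-not $mssqlItems) {
    # Nothing matched, so the template URI lookup has nothing to return.
    Write-Warning "No AzureStackPOC.MSSQL gallery item found in this environment."
}
else {
    # The item exists; show its name so it can be compared with the expected one.
    $mssqlItems | Select-Object Name
}
```

A warning here would be consistent with the null TemplateUri error in the log, since the deployment takes its template URI from this gallery item.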

As for the App Service, it failed the first time because of a failure to deploy/configure the VM Extension on CN-0 - I've seen this happen commonly on lower powered hardware, most specifically where there are no SSDs/insufficient IOPS.

The subsequent failures are, I suspect, due to your cleanup - my script knows how to clean up the App Service Database to retry the process, but with you deleting certain RGs, I suspect that may be having an effect.

Let me know the answers to the hardware and SQL Template questions and I'll dig out how to reset the stages in the DB.

Thanks

SaberSHO commented 4 years ago

Matt,

Thanks for all your help on this.

The storage is set up as 4x VHDX of 750GB each, Dynamic. These reside on an 8 drive HDD (SATA 7200) Virtual Disk in RAID 10. IOPS aren't great, but this is all I have available at the moment.

Running the script provided, $mainTemplateURI is not returning a value.

I suspect that you are correct and I have tinkered with this too much and have caused it to be in a completely inconsistent state. If the answer is to redeploy my ASDK, I can live with that :) I don't want to take up your valuable time digging into this if it is not a simple thing to just reset the DB to start the run again.

Thanks

mattmcspirit commented 4 years ago

Hi,

IOPS could be your issue here. Running Dynamic VHDX rather than fixed also incurs a perf penalty, so that combination is likely the reason the App Service is struggling. I do have a flag in my documentation for lower performing hardware (-serialMode), but this won't help you so much when it comes to the App Service, as that deployment is fixed/handled by the App Service installer and is something I can't change.

Please can you confirm exactly which RGs you deleted?

appservice-infra? -> This is where the App Service installer creates its VMs
appservice-sql? -> This is where I deploy the SQL Server for the App Service
appservice-fileshare? -> This is where I deploy the file server for the App Service
system.local.sqladapter? -> This is created by the SQL Resource Provider

The last one should not have been deleted, as it was created by the SQL RP installer, which finished successfully, so that in itself may cause an issue.

So, let's try this anyway.

$sqlServerInstance = '(localdb)\MSSQLLocalDB'
$databaseName = "AzSPoC"
$tableName = "Progress"
$progressStage = "SQLServerGalleryItem"

Invoke-Sqlcmd -Server $sqlServerInstance -Query "USE $databaseName UPDATE Progress SET $progressStage = 'Incomplete';" -Verbose:$false -ErrorAction Stop
Read-SqlTableData -ServerInstance $sqlServerInstance -DatabaseName "$databaseName" -SchemaName "dbo" -TableName "$tableName" -ErrorAction SilentlyContinue -Verbose:$false

# Then change $progressStage for the following
$progressStage = "SQLServerDBVM"
$progressStage = "AppServiceFileServer" # Assuming you deleted the RG "appservice-fileshare" ? if not, don't run this one.
$progressStage = "AppServiceSQLServer" # Assuming you deleted the RG "appservice-sql" ? if not, don't run this one.
$progressStage = "SQLServerRP" # Assuming you deleted the RG "system.local.sqladapter" ? If not, don't run this one
$progressStage = "DeployAppService"

So, run the first 6 lines, then swap the $progressStage line for each of the stages you want to reset. Based on what you've described, I think these should be the ones you reset.
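The same reset can also be written as a single loop instead of swapping the $progressStage line by hand. This is a minimal sketch under the same assumptions as the commands above (the LocalDB instance, the AzSPoC database and the Progress table); the $stagesToReset list is only an illustration and should be trimmed to match the RGs that were actually deleted.

```powershell
$sqlServerInstance = '(localdb)\MSSQLLocalDB'
$databaseName = "AzSPoC"
$tableName = "Progress"

# Stages to reset. Remove any entry whose resource group you did NOT delete
# (see the comments next to each stage in the list above).
$stagesToReset = @(
    "SQLServerGalleryItem",
    "SQLServerDBVM",
    "AppServiceFileServer",
    "AppServiceSQLServer",
    "SQLServerRP",
    "DeployAppService"
)

foreach ($progressStage in $stagesToReset) {
    # Same UPDATE statement as the manual version, run once per stage.
    Invoke-Sqlcmd -ServerInstance $sqlServerInstance -Query "USE $databaseName UPDATE $tableName SET $progressStage = 'Incomplete';" -Verbose:$false -ErrorAction Stop
}

# Show the resulting progress table so it can be checked before rerunning the script.
Read-SqlTableData -ServerInstance $sqlServerInstance -DatabaseName $databaseName -SchemaName "dbo" -TableName $tableName
```

Because -ErrorAction Stop is set, the loop halts at the first stage that fails to update, so a partial reset is visible in the table output rather than silently skipped.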

Once you've done that, send me the output of this BEFORE you rerun the script

Read-SqlTableData -ServerInstance $sqlServerInstance -DatabaseName "$databaseName" -SchemaName "dbo" -TableName "$tableName" -ErrorAction SilentlyContinue -Verbose:$false

Thanks, Matt

SaberSHO commented 4 years ago

PS C:\Users\AzureStackAdmin> Read-SqlTableData -ServerInstance $sqlServerInstance -DatabaseName "$databaseName" -SchemaName "dbo" -TableName "$tableName" -ErrorAction SilentlyContinue -Verbose:$false

GetScripts : Complete
CheckPowerShell : Complete
InstallPowerShell : Complete
DownloadTools : Complete
CheckCerts : Skipped
HostConfiguration : Complete
Registration : Skipped
AdminPlanOffer : Complete
UbuntuServerImage : Complete
WindowsUpdates : Complete
ServerCore2016Image : Complete
ServerFull2016Image : Complete
ServerCore2019Image : Complete
ServerFull2019Image : Complete
MySQL57GalleryItem : Complete
MySQL80GalleryItem : Complete
SQLServerGalleryItem : Incomplete
AddVMExtensions : Skipped
MySQLRP : Complete
SQLServerRP : Incomplete
MySQLSKUQuota : Complete
SQLServerSKUQuota : Incomplete
UploadScripts : Skipped
MySQLDBVM : Complete
SQLServerDBVM : Incomplete
MySQLAddHosting : Complete
SQLServerAddHosting : Incomplete
AppServiceFileServer : Incomplete
AppServiceSQLServer : Incomplete
DownloadAppService : Incomplete
AddAppServicePreReqs : Incomplete
DeployAppService : Incomplete
RegisterNewRPs : Incomplete
UserPlanOffer : Incomplete
InstallHostApps : Incomplete
CreateOutput : Incomplete

SaberSHO commented 4 years ago

Also, the RGs I deleted were appservice-sql, appservice-fileshare and system.local.sqladapter.

I do not believe it had created the appservice-infra group, but I could be remembering wrong.

mattmcspirit commented 4 years ago

OK great - so, please check in the admin portal, and if there is appservice-infra, or anything else appservice-whatever, please delete that RG.

Secondly, I would suggest adding -serialMode to your launch command. Close all PS windows, fire up a fresh PS ISE, enter your launch command and add -serialMode - this will ensure that the SQL Server DB VM, the App Service DB VM, the File Server VM and the SQL RP VM don't all start at the same time, along with the App Service installer. It will take longer, but it increases the chances of success: the IOPS available to you are already reduced by the existing VMs running on the system, and the VMs you're looking to deploy will add to that load.

I see that you also reset "AddAppServicePreReqs" - this wasn't necessary, as that had completed successfully, so this may cause additional issues but it may be OK.

Can you confirm, have you also deleted the local -downloadPath\AppService folder? If not, don't delete it - it will be fine.

SaberSHO commented 4 years ago

Matt, thank you for your assistance. The script completed successfully and everything seems to be working ok.

mattmcspirit commented 4 years ago

Thanks - did you use the -serialMode?

Thanks!