hashicorp / packer

Packer is a tool for creating identical machine images for multiple platforms from a single source configuration.
http://www.packer.io

[Azure] WinRM timeout with Windows 2016-Datacenter Marketplace Image #8658

Closed Dilergore closed 3 years ago

Dilergore commented 4 years ago

Please refer to the end of this thread to see other users reporting that this is not working: https://github.com/MicrosoftDocs/azure-docs/issues/31188

Issue:

Started: December 2019. Packer cannot connect over WinRM to machines provisioned from the Windows Server 2016 (2016-Datacenter) Marketplace image in Azure.

Further details:

Increasing the WinRM timeout does not help. The last working image appears to be version "14393.3326.1911120150" (released 12 Nov). It stopped working with "14393.3384.1912042333" (released 10 Dec).

This issue is only impacting 2016-Datacenter. 2019 is working properly.

To get image Details for a Region:

az vm image list --location northeurope --offer WindowsServer --publisher MicrosoftWindowsServer --sku 2016-Datacenter --all
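For reference, the equivalent query with Az PowerShell (just a sketch; it assumes the Az.Compute module is installed and an authenticated session):

    # List all published versions of the 2016-Datacenter image in a region
    Get-AzVMImage -Location 'northeurope' `
        -PublisherName 'MicrosoftWindowsServer' `
        -Offer 'WindowsServer' `
        -Skus '2016-Datacenter' |
        Select-Object Version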

URL to the Last Working Image:

https://support.microsoft.com/en-us/help/4525236/windows-10-update-kb4525236

URL to the Image where something went wrong:

https://support.microsoft.com/en-us/help/4530689/windows-10-update-kb4530689

Notes:

This currently applies to North Europe. I had no time to investigate other regions, but I believe the same images are distributed to every region.

I am opening a Microsoft case and planning to update the thread with the progress.

Dilergore commented 4 years ago

Interesting. It was definitely not working for quite some time, but now I cannot reproduce the issue anymore. Even with the latest image, and with the images between November and today, it works properly.

I will reopen in case I start to see this issue again.

AliAllomani commented 4 years ago

I can still reproduce the issue.

Image used:

   "image_publisher": "MicrosoftWindowsServer",
    "image_offer": "WindowsServer",
    "image_sku": "2016-Datacenter"

From initial troubleshooting it looks to me like a certificate issue. Running winrm quickconfig on the machine while Packer is at "azure-arm: Waiting for WinRM to become available..." results in:

WinRM service is already running on this machine.
WSManFault
    Message
        ProviderFault
            WSManFault
                Message = Cannot create a WinRM listener on HTTPS because this machine does not have an appropriate certificate. To be used for SSL, a certificate must have a CN matching the hostname, be appropriate for Server Authentication, and not be expired, revoked, or self-signed. 

Error number:  -2144108267 0x80338115
Cannot create a WinRM listener on HTTPS because this machine does not have an appropriate certificate. To be used for SSL, a certificate must have a CN matching the hostname, be appropriate for Server Authentication, and not be expired, revoked, or self-signed. 

And when trying to connect with openssl to retrieve the certificate, I get errno=54:

openssl s_client -connect 13.95.122.54:5986 -showcerts
CONNECTED(00000003)
write:errno=54
---
no peer certificate available
---
No client certificate CA names sent
---
SSL handshake has read 0 bytes and written 307 bytes
---
New, (NONE), Cipher is (NONE)
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : 0000
    Session-ID:
    Session-ID-ctx:
    Master-Key:
    Key-Arg   : None
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    Start Time: 1580229460
    Timeout   : 300 (sec)
    Verify return code: 0 (ok)
---

Re-generating a self-signed certificate and reconfiguring WinRM causes Packer to respond to the connection immediately:

# Create a new self-signed certificate for this machine name
$Cert = New-SelfSignedCertificate -CertstoreLocation Cert:\LocalMachine\My -DnsName "$env:COMPUTERNAME"
# Remove the existing listeners and create a fresh HTTPS listener bound to the new certificate
Remove-Item -Path WSMan:\Localhost\listener\listener* -Recurse
New-Item -Path WSMan:\LocalHost\Listener -Transport HTTPS -Address * -CertificateThumbPrint $Cert.Thumbprint -Force
# Restart the WinRM service to pick up the new listener
Stop-Service winrm
Start-Service winrm

and openssl -showcerts now returns a correct answer:

 openssl s_client -connect 13.95.122.54:5986 -showcerts
CONNECTED(00000003)
depth=0 CN = pkrvm39jkvjspuk
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 CN = pkrvm39jkvjspuk
verify error:num=21:unable to verify the first certificate
verify return:1
---
Certificate chain
 0 s:/CN=pkrvm39jkvjspuk
   i:/CN=pkrvm39jkvjspuk
-----BEGIN CERTIFICATE-----
MIIDKjCCAhKgAwIBAgIQbI6Ll/YdLKZFm3XIDuCVEzANBgkqhkiG9w0BAQsFADAa
MRgwFgYDVQQDDA9wa3J2bTM5amt2anNwdWswHhcNMjAwMTI4MTYzNDI4WhcNMjEw
MTI4MTY1NDI4WjAaMRgwFgYDVQQDDA9wa3J2bTM5amt2anNwdWswggEiMA0GCSqG
SIb3DQEBAQUAA4IBDwAwggEKAoIBAQDTaBPCr8ImXt+wyDEcNVK3lW5HOme7X8h0
gl+ZTAmwhlzyZwWI1S5fW0Gfc+VQtwmscZs7in1/Rg0EBnhCHKiXYdJdWgiNQjp8
hxNHQlPzFMxBNHJCncs3cUjl8TBvWFVof+mNmv20IcoDfhkBXo8PBMC1M08krfGd
KXxvJ/Km3dfGvY3HKyMAdwJK/r4rENnTMIr5KgOv2cL4usTNS0o4nQSDVbL8rXdN
0Pfwui0ItGiZ7auul/tioQAmKpcxle7y16b/XnX1olQp59T7WklKcfS4Rt+XloAM
dyam22dhXaPQ9/03MBEqguO/SXDV2m+7RFLPRzHDPWwrQjE6eClDAgMBAAGjbDBq
MA4GA1UdDwEB/wQEAwIFoDAdBgNVHSUEFjAUBggrBgEFBQcDAgYIKwYBBQUHAwEw
GgYDVR0RBBMwEYIPcGtydm0zOWprdmpzcHVrMB0GA1UdDgQWBBQYK0o8mxc3uUyn
9WAvpOzINrvkyzANBgkqhkiG9w0BAQsFAAOCAQEALIRGvoQONxX0RzdyOEX15dJm
tMChjVgU9y176UK03NcuNqfQqJXhnibZQO/+ApXT4C1YKUzZcmqkJpPkt2ufYmC1
sFLp3tGZ35zfjtU8Mm6xEHdQv4LGQzpCycVqlvFGrdWCMCB4EWZb0z7oqp+nsz2P
14HFaiPsHnfpJEMUF+jrMQkGb9bzMHTT4Y0q5TStVdc9q1cu3pWLnzJ6gaBlz0Iz
DG03HtTmwppmDLSE1RZYJBQ6UsgD/L/jbR2c08ko4t1uSMwRcANv5sGZ6TukyK95
JVnYbFrZWzcqWfE1uynTEdeb+l/aospY9g/Fjt4WKI0U0xnGuczsbx1KoO0ELg==
-----END CERTIFICATE-----
---
Server certificate
subject=/CN=pkrvm39jkvjspuk
issuer=/CN=pkrvm39jkvjspuk
---
No client certificate CA names sent
Peer signing digest: SHA256
Server Temp Key: ECDH, P-256, 256 bits
---
SSL handshake has read 1298 bytes and written 433 bytes
---
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-AES256-GCM-SHA384
    Session-ID: 5E200000884A7231C92707E15CD2222B4BE94DD50A3B61E7B8763B3BC0A2F615
    Session-ID-ctx:
    Master-Key: 6CF4DA86AEBEB597F72DB9DC9E8C8B59D8B240C7FE6F8491B14314E86529A338F07E1B2C5BEB300C48DE4D490978D5D5
    Key-Arg   : None
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    Start Time: 1580229891
    Timeout   : 300 (sec)
    Verify return code: 21 (unable to verify the first certificate)
---

I see that Packer uses the Azure osProfile.windowsConfiguration.winRM value in the template to configure WinRM on the VM.

So here I would assume either there is an issue with the certificate Packer creates before uploading it to the Azure key vault, or an issue on the Azure side that prevents the VM from configuring WinRM correctly using the values from the template; this needs more troubleshooting. The relevant osProfile excerpt:

 "osProfile": {
                    "computerName": "[parameters('virtualMachines_pkrvm2nb5asnu2s_name')]",
                    "adminUsername": "packer",
                    "windowsConfiguration": {
                        "provisionVMAgent": true,
                        "enableAutomaticUpdates": true,
                        "winRM": {
                            "listeners": [
                                {
                                    "protocol": "https",
                                    "certificateUrl": "https://pkrkv2nb5asnu2s.vault.azure.net/secrets/packerKeyVaultSecret/05113faa18ee40a2b5465910b2f3dda1"
                                }
                            ]
                        }
                    },
                    "secrets": [
                        {
                            "sourceVault": {
                                "id": "[parameters('vaults_pkrkv2nb5asnu2s_externalid')]"
                            },
                            "vaultCertificates": [
                                {
                                    "certificateUrl": "https://pkrkv2nb5asnu2s.vault.azure.net/secrets/packerKeyVaultSecret/05113faa18ee40a2b5465910b2f3dda1",
                                    "certificateStore": "My"
                                }
                            ]
                        }
                    ]
                },
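If anyone wants to verify this on the temp VM while Packer is waiting, a quick check (just a sketch) is to confirm whether the Key Vault certificate actually landed in the machine store and whether an HTTPS listener exists:

    # Certificates in the LocalMachine\My store (the Key Vault certificate should appear here)
    Get-ChildItem -Path Cert:\LocalMachine\My | Format-List Subject, Thumbprint, NotAfter

    # WinRM listeners that were actually configured (an HTTPS listener on 5986 is expected)
    winrm enumerate winrm/config/listener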
Dilergore commented 4 years ago

@AliAllomani Okay... Which region are you deploying to? A few weeks ago I thought this was an image-related issue, but I had no time to investigate further. Today I tried to use an older image and it started to work, so I opened this issue, but then I tried the latest as well and it was also working. I don't know what is going on.

Can you try with older versions as well? Also in WestUS2? Let's try to rule these out...

Reopened this for now, but you are on your own, because it is now working for me....

AliAllomani commented 4 years ago

@Dilergore I'm deploying to EU West. I also faced the timeout issue with the latest Windows 2019-Datacenter image, but I'm not sure if it's the same issue; I will do more tests on different images from my side.

Dilergore commented 4 years ago

@AliAllomani It was not happening for me with 2019. It usually takes some time for WinRM to be configured by default. Using a bigger machine, an SSD, and increasing the timeout usually works around this problem.

My setup is: 20 minute timeout, Premium SSD for the OS disk, D4s_v3.

In my experience, even with this it sometimes takes longer than 5-6 minutes before WinRM is configured and Packer can connect to it.

AliAllomani commented 4 years ago

@Dilergore it seems intermittent.

The common thing I found:

https://github.com/hashicorp/packer/blob/af2c4346f8454edb80fefd2fb28bc8b6a632eaa6/builder/azure/arm/config.go#L452

> Test-WSMan -ComputerName 52.142.198.26 -UseSSL
Test-WSMan : <f:WSManFault xmlns:f="http://schemas.microsoft.com/wbem/wsman/1/wsmanfault" Code="12175"
Machine="Bastion-UAT.wdprocessing.pvt"><f:Message>The server certificate on the destination computer
(52.142.198.26:5986) has the following errors:
Encountered an internal error in the SSL library.   </f:Message></f:WSManFault>
At line:1 char:1
+ Test-WSMan -ComputerName 52.142.198.26 -UseSSL
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (52.142.198.26:String) [Test-WSMan], InvalidOperationException
    + FullyQualifiedErrorId : WsManError,Microsoft.WSMan.Management.TestWSManCommand
A fatal error occurred while creating a TLS client credential. The internal error state is 10013.

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event"> 
- <System> 
<Provider Name="Schannel" Guid="{1F678132-5938-4686-9FDC-C8FF68F15C85}" /> 
<EventID>36871</EventID> 
<Version>0</Version> 
<Level>2</Level> 
<Task>0</Task> 
<Opcode>0</Opcode> 
<Keywords>0x8000000000000000</Keywords> 
<TimeCreated SystemTime="2020-01-29T12:25:18.377000300Z" /> 
<EventRecordID>767</EventRecordID> 
<Correlation ActivityID="{80B997BA-F1CA-0000-01F5-7D5E9AD6D501}" /> 
<Execution ProcessID="632" ThreadID="2352" /> 
<Channel>System</Channel> 
<Computer>pkrvmudjx20x9lp</Computer> 
<Security UserID="S-1-5-18" /> 
</System> 
- <EventData> 
<Data Name="Type">client</Data> 
<Data Name="ErrorState">10013</Data> 
</EventData> 
</Event>

Occurrence tests done so far (all in EU West):

Standard_F8s_v2 -  SSD - win2019 - image version : latest

15:50:16  ==> azure-arm: Getting the VM's IP address ...
15:50:16  ==> azure-arm:  -> ResourceGroupName   : 'packer-Resource-Group-ibnzhmks0m'
15:50:16  ==> azure-arm:  -> PublicIPAddressName : 'pkripibnzhmks0m'
15:50:16  ==> azure-arm:  -> NicName             : 'pkrniibnzhmks0m'
15:50:16  ==> azure-arm:  -> Network Connection  : 'PublicEndpointInPrivateNetwork'
15:50:16  ==> azure-arm:  -> IP Address          : '40.68.191.187'
15:50:16  ==> azure-arm: Waiting for WinRM to become available...
15:50:16  ==> azure-arm: #< CLIXML
15:50:16      azure-arm: WinRM connected.

=======

Standard_F8s_v2 -  SSD - win2016 - image version : 14393.3326.1911120150
14:11:19  ==> azure-arm: Getting the VM's IP address ...
14:11:19  ==> azure-arm:  -> ResourceGroupName   : 'packer-Resource-Group-zhyjvoeajl'
14:11:19  ==> azure-arm:  -> PublicIPAddressName : 'pkripzhyjvoeajl'
14:11:19  ==> azure-arm:  -> NicName             : 'pkrnizhyjvoeajl'
14:11:19  ==> azure-arm:  -> Network Connection  : 'PublicEndpointInPrivateNetwork'
14:11:19  ==> azure-arm:  -> IP Address          : '52.174.178.101'
14:11:19  ==> azure-arm: Waiting for WinRM to become available...
14:20:40  ==> azure-arm: #< CLIXML
14:20:40      azure-arm: WinRM connected.

================
Standard_B2ms - HDD - win2016 - image version : latest
12:13:08  ==> azure-arm: Getting the VM's IP address ...
12:13:08  ==> azure-arm:  -> ResourceGroupName   : 'packer-Resource-Group-wt2ndevwlv'
12:13:08  ==> azure-arm:  -> PublicIPAddressName : 'pkripwt2ndevwlv'
12:13:08  ==> azure-arm:  -> NicName             : 'pkrniwt2ndevwlv'
12:13:08  ==> azure-arm:  -> Network Connection  : 'PublicEndpointInPrivateNetwork'
12:13:08  ==> azure-arm:  -> IP Address          : '52.148.254.62'
12:13:08  ==> azure-arm: Waiting for WinRM to become available...
12:43:00  ==> azure-arm: Timeout waiting for WinRM.
12:43:00  ==> azure-arm: 
12:43:00  ==> azure-arm: Cleanup requested, deleting resource group ...
12:49:52  ==> azure-arm: Resource group has been deleted.
12:49:52  Build 'azure-arm' errored: Timeout waiting for WinRM.
==============
Standard_D8s_v3 - HDD - win2016 - image version : latest

20:57:27  ==> azure-arm: Waiting for WinRM to become available...
21:06:19  ==> azure-arm: #< CLIXML
21:06:19      azure-arm: WinRM connected.
21:06:19  ==> azure-arm: <Objs Version="1.1.0.1" xmlns="http://schemas.microsoft.com/powershell/2004/04"><Obj S="progress" RefId="0"><TN RefId="0"><T>System.Management.Automation.PSCustomObject</T><T>System.Object</T></TN><MS><I64 N="SourceId">1</I64><PR N="Record"><AV>Preparing modules for first use.</AV><AI>0</AI><Nil /><PI>-1</PI><PC>-1</PC><T>Completed</T><SR>-1</SR><SD> </SD></PR></MS></Obj><Obj S="progress" RefId="1"><TNRef RefId="0" /><MS><I64 N="SourceId">1</I64><PR N="Record"><AV>Preparing modules for first use.</AV><AI>0</AI><Nil /><PI>-1</PI><PC>-1</PC><T>Completed</T><SR>-1</SR><SD> </SD></PR></MS></Obj></Objs>
21:06:19  ==> azure-arm: Connected to WinRM!
21:06:19  ==> azure-arm: Provisioning with Powershell...
===========
Standard_D8s_v3 - SSD - win2016 - image version : latest

21:17:12  ==> azure-arm: Getting the VM's IP address ...
21:17:12  ==> azure-arm:  -> ResourceGroupName   : 'packer-Resource-Group-vi2l6na2zy'
21:17:12  ==> azure-arm:  -> PublicIPAddressName : 'pkripvi2l6na2zy'
21:17:12  ==> azure-arm:  -> NicName             : 'pkrnivi2l6na2zy'
21:17:12  ==> azure-arm:  -> Network Connection  : 'PublicEndpointInPrivateNetwork'
21:17:12  ==> azure-arm:  -> IP Address          : '168.63.109.42'
21:17:12  ==> azure-arm: Waiting for WinRM to become available...
21:47:20  ==> azure-arm: Timeout waiting for WinRM.
21:47:20  ==> azure-arm: 
21:47:20  ==> azure-arm: Cleanup requested, deleting resource group ...
==============================
Standard_D8s_v3 - SSD - win2016 - image version : latest

11:51:06  ==> azure-arm: Getting the VM's IP address ...
11:51:06  ==> azure-arm:  -> ResourceGroupName   : 'packer-Resource-Group-ksei5ia6c6'
11:51:06  ==> azure-arm:  -> PublicIPAddressName : 'pkripksei5ia6c6'
11:51:06  ==> azure-arm:  -> NicName             : 'pkrniksei5ia6c6'
11:51:06  ==> azure-arm:  -> Network Connection  : 'PublicEndpointInPrivateNetwork'
11:51:06  ==> azure-arm:  -> IP Address          : '13.95.64.201'
11:51:06  ==> azure-arm: Waiting for WinRM to become available...
11:59:58      azure-arm: WinRM connected.
11:59:58  ==> azure-arm: #< CLIXML
==============================
Standard_D8s_v3 - SSD - win2016 - image version : 14393.3326.1911120150

21:56:07  ==> azure-arm: Getting the VM's IP address ...
21:56:07  ==> azure-arm:  -> ResourceGroupName   : 'packer-Resource-Group-6bz6fqr3js'
21:56:07  ==> azure-arm:  -> PublicIPAddressName : 'pkrip6bz6fqr3js'
21:56:07  ==> azure-arm:  -> NicName             : 'pkrni6bz6fqr3js'
21:56:07  ==> azure-arm:  -> Network Connection  : 'PublicEndpointInPrivateNetwork'
21:56:07  ==> azure-arm:  -> IP Address          : '104.46.40.255'
21:56:07  ==> azure-arm: Waiting for WinRM to become available...
22:03:43  ==> azure-arm: #< CLIXML
22:03:43      azure-arm: WinRM connected.
22:03:43  ==> azure-arm: <Objs Version="1.1.0.1" xmlns="http://schemas.microsoft.com/powershell/2004/04"><Obj S="progress" RefId="0"><TN RefId="0"><T>System.Management.Automation.PSCustomObject</T><T>System.Object</T></TN><MS><I64 N="SourceId">1</I64><PR N="Record"><AV>Preparing modules for first use.</AV><AI>0</AI><Nil /><PI>-1</PI><PC>-1</PC><T>Completed</T><SR>-1</SR><SD> </SD></PR></MS></Obj><Obj S="progress" RefId="1"><TNRef RefId="0" /><MS><I64 N="SourceId">1</I64><PR N="Record"><AV>Preparing modules for first use.</AV><AI>0</AI><Nil /><PI>-1</PI><PC>-1</PC><T>Completed</T><SR>-1</SR><SD> </SD></PR></MS></Obj></Objs>
22:03:43  ==> azure-arm: Connected to WinRM!
22:03:43  ==> azure-arm: Provisioning with Powershell...

=========

Standard_F8s_v2 -  HDD - win2019 - image version : latest

16:19:50  ==> azure-arm:  -> ResourceGroupName   : 'packer-Resource-Group-wwwgtctyip'
16:19:50  ==> azure-arm:  -> PublicIPAddressName : 'pkripwwwgtctyip'
16:19:50  ==> azure-arm:  -> NicName             : 'pkrniwwwgtctyip'
16:19:50  ==> azure-arm:  -> Network Connection  : 'PublicEndpointInPrivateNetwork'
16:19:50  ==> azure-arm:  -> IP Address          : '52.157.111.197'
16:19:50  ==> azure-arm: Waiting for WinRM to become available...
16:19:56  ==> azure-arm: #< CLIXML
16:19:56      azure-arm: WinRM connected.

========

Standard_B4ms - HDD - win2019 - image version : latest

16:03:00  ==> azure-arm:  -> DeploymentName    : 'pkrdp3ko5xlkk4n'
16:05:07  ==> azure-arm: Getting the VM's IP address ...
16:05:07  ==> azure-arm:  -> ResourceGroupName   : 'packer-Resource-Group-3ko5xlkk4n'
16:05:07  ==> azure-arm:  -> PublicIPAddressName : 'pkrip3ko5xlkk4n'
16:05:07  ==> azure-arm:  -> NicName             : 'pkrni3ko5xlkk4n'
16:05:07  ==> azure-arm:  -> Network Connection  : 'PublicEndpointInPrivateNetwork'
16:05:07  ==> azure-arm:  -> IP Address          : '52.166.196.146'
16:05:07  ==> azure-arm: Waiting for WinRM to become available...
16:34:59  ==> azure-arm: Timeout waiting for WinRM.
16:34:59  ==> azure-arm: 

Replacing the listener test :

Windows PowerShell
Copyright (C) 2016 Microsoft Corporation. All rights reserved.

PS C:\Users\packer> Test-WSMan -ComputerName localhost -UseSSL
Test-WSMan : <f:WSManFault xmlns:f="http://schemas.microsoft.com/wbem/wsman/1/wsmanfault" Code="12175"
Machine="pkrvmwawvo84vka"><f:Message>The server certificate on the destination computer (localhost:5986) has the
following errors:
Encountered an internal error in the SSL library.   </f:Message></f:WSManFault>
At line:1 char:1
+ Test-WSMan -ComputerName localhost -UseSSL
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (localhost:String) [Test-WSMan], InvalidOperationException
    + FullyQualifiedErrorId : WsManError,Microsoft.WSMan.Management.TestWSManCommand

PS C:\Users\packer> Get-ChildItem -path cert:\LocalMachine\My

   PSParentPath: Microsoft.PowerShell.Security\Certificate::LocalMachine\My

Thumbprint                                Subject
----------                                -------
8DDC5709AB990B6AC7F8D8CF1B97FC5FA136B9C0  CN=pkrvmwawvo84vka.cloudapp.net

PS C:\Users\packer> Remove-Item -Path WSMan:\Localhost\listener\listener* -Recurse
PS C:\Users\packer> Test-WSMan -ComputerName localhost -UseSSL
Test-WSMan : <f:WSManFault xmlns:f="http://schemas.microsoft.com/wbem/wsman/1/wsmanfault" Code="2150858770"
Machine="pkrvmwawvo84vka"><f:Message>The client cannot connect to the destination specified in the request. Verify
that the service on the destination is running and is accepting requests. Consult the logs and documentation for the
WS-Management service running on the destination, most commonly IIS or WinRM. If the destination is the WinRM service,
run the following command on the destination to analyze and configure the WinRM service: "winrm quickconfig".
</f:Message></f:WSManFault>
At line:1 char:1
+ Test-WSMan -ComputerName localhost -UseSSL
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (localhost:String) [Test-WSMan], InvalidOperationException
    + FullyQualifiedErrorId : WsManError,Microsoft.WSMan.Management.TestWSManCommand

PS C:\Users\packer> New-Item -Path WSMan:\LocalHost\Listener -Transport HTTPS -Address * -CertificateThumbPrint 8DDC5709AB990B6AC7F8D8CF1B97FC5FA136B9C0 -Force

   WSManConfig: Microsoft.WSMan.Management\WSMan::localhost\Listener

Type            Keys                                Name
----            ----                                ----
Container       {Transport=HTTPS, Address=*}        Listener_1305953032

PS C:\Users\packer> Test-WSMan -ComputerName localhost -UseSSL
Test-WSMan : <f:WSManFault xmlns:f="http://schemas.microsoft.com/wbem/wsman/1/wsmanfault" Code="12175"
Machine="pkrvmwawvo84vka"><f:Message>The server certificate on the destination computer (localhost:5986) has the
following errors:
The SSL certificate is signed by an unknown certificate authority.
The SSL certificate contains a common name (CN) that does not match the hostname.     </f:Message></f:WSManFault>
At line:1 char:1
+ Test-WSMan -ComputerName localhost -UseSSL
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (localhost:String) [Test-WSMan], InvalidOperationException
    + FullyQualifiedErrorId : WsManError,Microsoft.WSMan.Management.TestWSManCommand

PS C:\Users\packer>
adamrushuk commented 4 years ago

Just wanted to add I also get intermittent WinRM timeouts using both 2012-R2-Datacenter and 2016-Datacenter in UK South. It seems worse on the 2012-R2-Datacenter builds.

I was using smalldisk image variants, but changed to using the standard ones with more disk available following previous advice.

I've also increased the WinRM timeout to 1 hour, and increased VM size to Standard_D4s_v3, to no avail.

BruceShipman commented 4 years ago

I've been having the same issues in US West 2 for the last couple of days: 2019-Datacenter builds are fine, but 2016-Datacenter and 2012-R2-Datacenter ones intermittently fail to connect via WinRM, with 2012-R2 being the most problematic. Builds are done using smalldisk image, initially with D2sV3 vm_size and 20 minute winrm_timeout values. Increasing the VM size or timeout doesn't show any perceptible improvement.

Dilergore commented 4 years ago

I can fast-track this with Microsoft, but not without the root cause... Also, it currently seems to be working for me, so I cannot even continue testing on my own. If you can find out what the issue is, I am happy to engage support.

ghost commented 4 years ago

I just started running into this problem today. The last two weeks I've been building images to test out an automated process using Packer and did not have any issues with WinRM. I'm running Packer on the Azure DevOps Hosted Agent windows-2019 targeting resource groups in the South Central US region using the 2016-Datacenter image. I ran three builds today without issue and at 2pm EST the build started to fail for WinRM timeout reasons. I'm using a Standard_DS4_v2 size VM so it is highly unlikely to be a resource constraint issue. The way it is behaving, I'm leaning towards a networking related issue in the Azure data center. I'm running a few tests now to try and provide some more useful details.

AliAllomani commented 4 years ago

From my test findings I'd assume that something is going wrong within the OS during the automatic WinRM SSL configuration driven by the Azure VM template.

@Dilergore I think there is currently no way in Packer to configure the builder VM to use non-SSL WinRM?

Dilergore commented 4 years ago

From my test findings I'd assume that something is going wrong within the OS during the automatic WinRM SSL configuration driven by the Azure VM template.

@Dilergore I think there is currently no way in Packer to configure the builder VM to use non-SSL WinRM?

https://www.packer.io/docs/communicators/winrm.html#winrm-communicator-options

Never tried it tho...

AliAllomani commented 4 years ago

@Dilergore The available parameters only define the method the communicator uses; on the builder side I see it's hardcoded:

https://github.com/hashicorp/packer/blob/df031db9daa3d9527a48fe3097d2d6003cb2ba57/builder/azure/common/template/template_builder.go#L90-L99

BruceShipman commented 4 years ago

And today, just to muddy the water a bit...

Yesterday evening's (1800 GMT-8) pipeline failed due to WinRM timeout on all three builds - 2012 R2, 2016, and 2019. This morning's run (0400) ran correctly. This is the first WinRM timeout I've seen using the 2019-Datacenter source. All three builds use smalldisk, DS3v2, 60m WinRM timeout.

In addition, afternoon/evening builds have a much higher incidence of failure than early morning ones.

tantra35 commented 4 years ago

We have a similar issue, but IMHO it doesn't depend on a particular Windows image; we think it is an issue with the Azure platform itself. For our case, a little workaround is to change the instance type from Standard_DS2_v2 to Standard_B2ms and vice versa.

nywilken commented 4 years ago

Hi Folks, thanks for keeping this thread up to date with your latest findings. I am looking into this issue on my end to see if there is any information that can help isolate what might be happening here. I too have observed WinRM connection timeouts when using certain images; changing my image to 2012-R2-Datacenter seems to work all the time within the westus region.

We have a similar issue, but IMHO it doesn't depend on a particular Windows image; we think it is an issue with the Azure platform itself. For our case, a little workaround is to change the instance type from Standard_DS2_v2 to Standard_B2ms and vice versa.

This is possible, but hard to tell with the information in the logs.

@Dilergore have you, or anyone on the thread, opened a support ticket with Azure around this particular issue?

Dilergore commented 4 years ago

@nywilken I will open it during the weekend. I will involve some people who can help us / can route the ticket inside Microsoft. If you want to contribute, please send me your mail address privately.

Thanks!

chapter9 commented 4 years ago

As noted in the Packer Documentation - Getting started/Build an image

A quick aside/warning: Windows administrators in the know might be wondering why we haven't simply used a winrm quickconfig -q command in the script above, as this would automatically set up all of the required elements necessary for connecting over WinRM. Why all the extra effort to configure things manually? Well, long and short, use of the winrm quickconfig -q command can sometimes cause the Packer build to fail shortly after the WinRM connection is established. How?

  1. Among other things, as well as setting up the listener for WinRM, the quickconfig command also configures the firewall to allow management messages to be sent over HTTP.

  2. This undoes the previous command in the script that configured the firewall to prevent this access.

  3. The upshot is that the system is configured and ready to accept WinRM connections earlier than intended.

  4. If Packer establishes its WinRM connection immediately after execution of the 'winrm quickconfig -q' command, the later commands within the script that restart the WinRM service will unceremoniously pull the rug out from under the connection.

  5. While Packer does a lot to ensure the stability of its connection in to your instance, this sort of abuse can prove to be too much and may cause your Packer build to stall irrecoverably or fail!

Unfortunately, while this is true on AWS using the userdata script, I'm not sure how the Azure builder configures WinRM and whether it runs winrm quickconfig -q while Packer attempts to connect. If it does, then it might be the cause of this.

Also, note that the Packer documentation - Communicator/WinRM still refers to winrm quickconfig -q, and many other repo files also mention winrm quickconfig -q, which could affect other builders and direct the community into this issue.

A successful workaround that I am using on AWS is to use the SSH communicator on Windows 2016/2019, installing SSH via userdata following the installation instructions in the Microsoft documentation or Microsoft OpenSSH portable. I'm not sure how this would translate to Azure.
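For reference, a minimal sketch of that OpenSSH installation on Windows Server 2019 (1809 or later), assuming the OpenSSH.Server capability is available in the image; how well the SSH communicator then works with the Azure builder is something I have not verified:

    # Install and enable the built-in OpenSSH server
    Add-WindowsCapability -Online -Name OpenSSH.Server~~~~0.0.1.0
    Set-Service -Name sshd -StartupType Automatic
    Start-Service sshd
    # Allow inbound SSH on TCP 22 (the rule name is arbitrary)
    New-NetFirewallRule -Name 'sshd-in' -DisplayName 'OpenSSH Server (sshd)' -Enabled True -Direction Inbound -Protocol TCP -Action Allow -LocalPort 22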

AlexeyKarpushin commented 4 years ago

Hi All,

Are there any updates on this? I haven't been able to repro the issue for a couple of days; perhaps some fix was rolled out?

BruceShipman commented 4 years ago

@AlexeyKarpushin - I definitely still have the issue.

It seems to have an Azure load component. I still get WinRM timeouts on all three platforms (2012R2, 2016, 2019). Failure rates on the three builds are 60% or more during weekday business hours, about 10-20% on weeknights and weekend days, and very rare on weekend overnights. In addition, the 2019 builds have a much lower failure rate than 2016 and 2012R2.

akingscote commented 4 years ago

Getting this issue intermittently on Windows 2019 now as well. I suspect it may be, as others have said, something to do with the time of day. It seems to work in mornings/evenings but not during core hours.

"winrm_use_ssl": true,
"winrm_insecure": true,
"winrm_timeout": "10m",
"winrm_username": "packer",

"location": "uksouth",
"vm_size": "Standard_DS2_v2"
nywilken commented 4 years ago

Hi folks, sorry for the slow response here. I have not been able to reproduce this issue since Friday, although I do notice that 2016-Datacenter builds take longer than other OS versions to connect via WinRM. But I don't know why that is.

@nywilken I will open it during the weekend. I will involve some people who can help us / can route the ticket inside Microsoft. If you want to contribute, please send me your mail address privately.

@Dilergore I don't have any new information to contribute so I'll refrain from reaching out privately. But thanks for offering to include me in the thread.

I definitely still have the issue.

For folks who are still able to reproduce the issue, when WinRM connectivity is timing out.

itzikbekel commented 4 years ago

Same here, I was able to reproduce just now:

    "location": "East US",
    "image_offer": "WindowsServer",
    "image_sku": "2019-Datacenter",
    "communicator": "winrm",
    "winrm_use_ssl": true,
    "winrm_insecure": true,
    "winrm_timeout": "10m",

telnet to 5986 works, telnet to 5985 does not work.

10:02:18 ==> azure-arm: Waiting for WinRM to become available...
10:12:25 ==> azure-arm: Timeout waiting for WinRM.
10:12:25 ==> azure-arm:
10:12:25 ==> azure-arm: Cleanup requested, deleting resource group ...
10:12:25 ==> azure-arm:
10:12:25 ==> azure-arm: Not waiting for Resource Group delete as requested by user. Resource Group Name is packer-Resource-Group-z27ecnv9bw
10:12:25 Build 'azure-arm' errored: Timeout waiting for WinRM.
10:12:25
10:12:25 ==> Some builds didn't complete successfully and had errors:
10:12:25 --> azure-arm: Timeout waiting for WinRM.

AlexeyKarpushin commented 4 years ago

Hi All,

I've created a workaround which allows our Azure DevOps pipelines to run. It doesn't solve the problem, but it allows us to ignore it. I can't paste the whole code here, but I can give a short description; hopefully it will be useful. The main idea is to re-create the WinRM listener on the temp machine during the Packer build. Here are the steps:

  1. Enable Packer log: set $Env:PACKER_LOG=1 and $Env:PACKER_LOG_PATH='path to packer log'
  2. Create a simple parser which analyzes the Packer log to find the resource group name and temp VM name, and to detect the WinRM issue. The error message in the log which indicates the issue is: "An existing connection was forcibly closed by the remote host". (If you find a better way to discover the resource group, please share it!)
  3. If the issue is detected, execute Invoke-AzVMRunCommand against the temp machine (a sketch of this call follows the list). This code reconfigures the WinRM listener:
    $Cert = New-SelfSignedCertificate -CertstoreLocation Cert:\LocalMachine\My -DnsName "$env:COMPUTERNAME"
    Remove-Item -Path WSMan:\Localhost\listener\listener* -Recurse
    New-Item -Path WSMan:\LocalHost\Listener -Transport HTTPS -Address * -CertificateThumbPrint $Cert.Thumbprint -Force
  4. Run the resulting script as an async PowerShell job before starting the Packer build. Set some delay before parsing the log to allow Packer to provision the temp machine; 10 minutes works fine for me. If you're doing it from a YAML Azure DevOps pipeline, start the async job in the same step where Packer starts, otherwise the async job will be terminated by the Azure DevOps agent.
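A sketch of the Invoke-AzVMRunCommand call from step 3 (the resource group and VM names below are placeholders parsed from the Packer log, and fix-winrm.ps1 is assumed to contain the three listener commands shown above):

    # Placeholder values taken from the Packer log
    $rgName = 'packer-Resource-Group-xxxxxxxxxx'
    $vmName = 'pkrvmxxxxxxxxxx'

    # Run the listener re-creation script inside the temp VM
    Invoke-AzVMRunCommand -ResourceGroupName $rgName -VMName $vmName `
        -CommandId 'RunPowerShellScript' -ScriptPath '.\fix-winrm.ps1'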

I hope the issue will be mitigated in the nearest future and this workaround will not be needed.

Kind regards, Alexey

AliAllomani commented 4 years ago

@nywilken you can take a look on my previous findings

https://github.com/hashicorp/packer/issues/8658#issuecomment-579784076

Dilergore commented 4 years ago

I have engaged Microsoft, as it seems I am experiencing this problem again...

ghost commented 4 years ago

@Dilergore I think the problem is on Microsoft's side. For the last three days I've been able to generate images without issue in the South Central region using both the 2016 and 2019 editions, while it seems like others are still not able to.

Dilergore commented 4 years ago

Are you all trying to do this from Azure DevOps, or are you using other tools?

itzikbekel commented 4 years ago

Are you all trying to do this from Azure DevOps, or are you using other tools?

I'm using Jenkins and have the same issue...

ghost commented 4 years ago

Are you all trying to do this from Azure DevOps, or are you using other tools?

I'm using Azure DevOps with one of the hosted agents.

BruceShipman commented 4 years ago

Are you all trying to do this from Azure DevOps, or are you using other tools?

I'm using Azure DevOps with self repo.

The overnight run was good for all three Windows target OSs. The first run at 0900 was fine; the second run at 10:30 failed with a WinRM timeout on WS2016 and WS2012R2. WS2019 ran without error.

ghost commented 4 years ago

This could be a total coincidence, but it has appeared to work the last three times I've tried it. Yesterday we lowered the VM size we are using for our Win2016 and Win2019 Packer builds. The Win2019 build continues to run without issue, but the Win2016 build starts to incur a WinRM timeout. I repeat the build and wait about 15 to 20 minutes (the WinRM timeout is set to 30 minutes) and notice that WinRM is still not responding. I run the Test-WSMan PowerShell cmdlet against the public IP of the machine and it fails as I expect it to. The odd part is that within minutes WinRM is suddenly responding on the VM and the Packer build finishes without issue.

danielsollondon commented 4 years ago

Hi All, I'm a Program Manager for the Azure VM Image Builder (which uses Packer under the hood). We have seen this too, and we have engaged the Windows PG to investigate. Initially there was a low-memory condition which could cause problems when using Standard D1_v2; this was due to Windows Update and has been mitigated. However, there is still an issue. The Windows PG is investigating, and I will report back when I hear more. In the meantime, one really kind member reached out with this workaround: https://github.com/danielsollondon/azvmimagebuilder/issues/14#issuecomment-577856888

SwampDragons commented 4 years ago

Wonderful!! Thanks so much for the update and the workaround, @danielsollondon

Dilergore commented 4 years ago

Hi All, I'm a Program Manager for the Azure VM image builder (which uses Packer under the hood), we have seen this too, and we have engaged the Windows PG to investigate. Initially there was a low memory condition which could cause problems, when using Standard D1_v2, this was due to Windows Update, and has been mitigated. However, there is still an issue, Windows PG is investigating, and I will report back when I hear. In the meantime one really kind member reached out with this workaround: danielsollondon/azvmimagebuilder#14 (comment)

Thanks Daniel! Also thanks to Corey C. who helped me to bring this to your attention! :-)

ifutant commented 4 years ago

I also encountered a similar WinRM timeout issue when attempting to deploy the following SKUs via Packer:

    MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter:4.127.20190603
    MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter:4.127.20190521
    MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter:4.127.20190416

I was able to get positive results by switching to the new hyper-v-generation V2 VM type (2012-r2-datacenter-gensecond). The WinRM timeouts didn't appear with the latest gensecond image. (Not a solution obviously - just an observation.)

Pipeline: Azure DevOps
Packer Version: 1.5.1

The following works reliably:

    "os_type": "Windows",
    "image_publisher": "MicrosoftWindowsServer",
    "image_offer": "WindowsServer",
    "image_sku": "2012-r2-datacenter-gensecond",
    "communicator": "winrm",
    "winrm_use_ssl": "true",
    "winrm_insecure": "true",
    "location": "West US2",
    "vm_size": "Standard_DS4_v2",
    "winrm_timeout": "40m",

The following config results in the WinRM timeout and never recovers:

    "os_type": "Windows",
    "image_publisher": "MicrosoftWindowsServer",
    "image_offer": "WindowsServer",
    "image_sku": "2012-R2-Datacenter",
    "communicator": "winrm",
    "winrm_use_ssl": "true",
    "winrm_insecure": "true",
    "location": "West US2",
    "vm_size": "Standard_DS4_v2",
    "winrm_timeout": "40m",

nywilken commented 4 years ago

Hi Folks, thanks for your help and time in working to figure out what might be happening here. @Dilergore @danielsollondon thanks for pushing this forward and for the workaround. Looking forward to hearing back.

Also, it seems that the workaround involves a new cert generation step; please let us know if there is anything we need to change on our end.

pmozbert commented 4 years ago

I just hit the WinRM timeout on 2019-Datacenter. The last time I did a build was on Jan 10, 2020, and at that time it worked OK.

==> azure-arm: Waiting for WinRM to become available...

==> azure-arm: Timeout waiting for WinRM.

==> azure-arm: 

==> azure-arm: Cleanup requested, deleting resource group ...

==> azure-arm: Resource group has been deleted.

Build 'azure-arm' errored: Timeout waiting for WinRM.

==> Some builds didn't complete successfully and had errors:

--> azure-arm: Timeout waiting for WinRM.

==> Builds finished but no artifacts were created.

I'm using

"image_publisher": "MicrosoftWindowsServer",
"image_offer": "WindowsServer",
"image_sku": "2019-Datacenter",

A query of versions of this image turns up these, and it looks like nothing has been published in 2020, so I'd think that I'm getting the same version as before.

Version              FilterExpression Skus
-------              ---------------- ----
17763.557.1907191810                  2019-Datacenter
17763.557.20190604                    2019-Datacenter
17763.615.1907121548                  2019-Datacenter
17763.678.1908092216                  2019-Datacenter
17763.737.1909062324                  2019-Datacenter
17763.805.1910061628                  2019-Datacenter
17763.864.1911120152                  2019-Datacenter
17763.914.1912042330                  2019-Datacenter
17763.973.2001110547                  2019-Datacenter
2019.0.20181107                       2019-Datacenter
2019.0.20181122                       2019-Datacenter
2019.0.20181218                       2019-Datacenter
2019.0.20190115                       2019-Datacenter
2019.0.20190214                       2019-Datacenter
2019.0.20190314                       2019-Datacenter
2019.0.20190410                       2019-Datacenter
2019.0.20190603                       2019-Datacenter
pmozbert commented 4 years ago

I changed the VM size from Standard_DS2_v2 to Standard_DS3_v2 and the build ran OK. Not sure if this proves that memory could be an issue, or if I just got lucky.

BruceShipman commented 4 years ago

@pmozbert - I just tried several 2016-Datacenter builds using Standard_DS3_v2, and it hung on "Waiting for WinRM to become available..." as usual. The workaround brought up by @Dilergore yesterday works very well, but unfortunately isn't really practical for an automated process.

azsec commented 4 years ago

Setting the following works around the issue. In my experience, WinRM with SSL only works well (and stably) in a domain environment (either Kerberos or NTLMv2).

"winrm_use_ssl": false,

This is not a best practice but that is what I've experienced when working with WinRM. Microsoft should really improve this protocol.

BruceShipman commented 4 years ago

I attempted the fix posted by @azsec on 2016-Datacenter builds; it doesn't appear to do any good in our environment: I still got interminable WinRM timeouts on 8 consecutive attempts. The overnight automated build ran fine with SSL enabled, and the first manual build this morning was successful, but three subsequent ones failed, as did all the ones with SSL disabled.

danielsollondon commented 4 years ago

Quick update, I spoke to the Windows team, they have identified an issue with the Windows Server 2016 image (November onwards), that impacts the time to initiate a WinRM connection with Packer, they are still working on this, and will update again mid next week. In the meantime please try increasing the Packer timeout to 30mins, and try a larger VM size.

prometheusaiolos commented 4 years ago

It seems that the WinRM connection is available about 5 minutes after instance creation; in my case I can clearly see that the connection is available:

nc -z -w1 x.x.x.x 5986; echo $?
Connection to x.x.x.x port 5986 [tcp/wsmans] succeeded!
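(The equivalent check from a Windows host, for reference - a sketch with the same placeholder IP:)

    # Basic TCP reachability test against the WinRM HTTPS port
    Test-NetConnection -ComputerName x.x.x.x -Port 5986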

However, Packer :

==> azure-arm: Getting the VM's IP address ...
==> azure-arm:  -> ResourceGroupName   : 'packer-Resource-Group-xyz'
==> azure-arm:  -> PublicIPAddressName : 'xyz'
==> azure-arm:  -> NicName             : 'xyz'
==> azure-arm:  -> Network Connection  : 'PublicEndpoint'
==> azure-arm:  -> IP Address          : 'x.x.x.x '
==> azure-arm: Waiting for WinRM to become available...

I am using windows 2016 in this case & size is Standard_B2ms

And after approximately 11 minutes I see it get connected:

==> azure-arm: #< CLIXML
    azure-arm: WinRM connected.
==> azure-arm: <Objs Version="1.1.0.1" xmlns="http://schemas.microsoft.com/powershell/2004/04"><Obj S="progress" RefId="0"><TN RefId="0"><T>System.Management.Automation.PSCustomObject</T><T>System.Object</T></TN><MS><I64 N="SourceId">1</I64><PR N="Record"><AV>Preparing modules for first use.</AV><AI>0</AI><Nil /><PI>-1</PI><PC>-1</PC><T>Completed</T><SR>-1</SR><SD> </SD></PR></MS></Obj><Obj S="progress" RefId="1"><TNRef RefId="0" /><MS><I64 N="SourceId">1</I64><PR N="Record"><AV>Preparing modules for first use.</AV><AI>0</AI><Nil /><PI>-1</PI><PC>-1</PC><T>Completed</T><SR>-1</SR><SD> </SD></PR></MS></Obj></Objs>
==> azure-arm: Connected to WinRM!

Not sure why Packer doesn't detect the connection availability a bit earlier. Anyway, 2016-Datacenter works fine without an issue; it just seems like there is a lag between the time WinRM is available and the time Packer detects the availability.

I have the timeout set to 30 minutes anyway, as advised by @danielsollondon.

shurick81 commented 4 years ago

Do I have a related issue or is it a separate one? This is what I run and what I get:

{
    "builders": [
        {
            "type": "azure-arm",

            "client_id": "{{user `client_id`}}",
            "client_secret": "{{user `client_secret`}}",
            "subscription_id": "{{user `subscription_id`}}",
            "tenant_id": "{{user `tenant_id`}}",

            "managed_image_resource_group_name": "contoso-sharepoint-prod-common-rg",
            "managed_image_name": "contoso-sharepoint-prod-common-{{user `box_name`}}",

            "os_type": "Windows",
            "image_publisher": "MicrosoftWindowsServer",
            "image_offer": "WindowsServer",
            "image_sku": "2019-Datacenter",
            "image_version": "latest",

            "communicator": "winrm",
            "winrm_use_ssl": "false",
            "winrm_insecure": "true",
            "winrm_timeout": "30m",
            "winrm_username": "packer",

            "vm_size": "Standard_F2s",
            "managed_image_storage_account_type": "Premium_LRS",

            "build_resource_group_name": "contoso-sharepoint-prod-common-rg",
            "temp_compute_name": "shrpt{{timestamp}}"
        }
    ],
    "provisioners": [
        { "type": "windows-restart" }
    ],
    "variables": {
        "box_name": "win-oos{{env `vm_image_name_suffix`}}",
        "client_id": "{{env `ARM_CLIENT_ID`}}",
        "client_secret": "{{env `ARM_CLIENT_SECRET`}}",
        "subscription_id": "{{env `ARM_SUBSCRIPTION_ID`}}",
        "tenant_id": "{{env `ARM_TENANT_ID`}}"
    }
}
PS C:\Users\01sodfin\Documents> packer build -only azure-arm "win-sp.json"
azure-arm: output will be in this color.

==> azure-arm: Running builder ...
==> azure-arm: Getting tokens using client secret
==> azure-arm: Getting tokens using client secret
    azure-arm: Creating Azure Resource Manager (ARM) client ...
==> azure-arm: Using existing resource group ...
==> azure-arm:  -> ResourceGroupName : 'contoso-sharepoint-prod-common-rg'
==> azure-arm:  -> Location          : 'westeurope'
==> azure-arm: Validating deployment template ...
==> azure-arm:  -> ResourceGroupName : 'contoso-sharepoint-prod-common-rg'
==> azure-arm:  -> DeploymentName    : 'pkrdpe380nb7t2s'
==> azure-arm: Deploying deployment template ...
==> azure-arm:  -> ResourceGroupName : 'contoso-sharepoint-prod-common-rg'
==> azure-arm:  -> DeploymentName    : 'kvpkrdpe380nb7t2s'
==> azure-arm: Getting the certificate's URL ...
==> azure-arm:  -> Key Vault Name        : 'pkrkve380nb7t2s'
==> azure-arm:  -> Key Vault Secret Name : 'packerKeyVaultSecret'
==> azure-arm:  -> Certificate URL       : 'https://pkrkve380nb7t2s.vault.azure.net/secrets/packerKeyVaultSecret/f395e80640be4dbcaf47b508c4ef864b'
==> azure-arm: Setting the certificate's URL ...
==> azure-arm: Validating deployment template ...
==> azure-arm:  -> ResourceGroupName : 'contoso-sharepoint-prod-common-rg'
==> azure-arm:  -> DeploymentName    : 'pkrdpe380nb7t2s'
==> azure-arm: Deploying deployment template ...
==> azure-arm:  -> ResourceGroupName : 'contoso-sharepoint-prod-common-rg'
==> azure-arm:  -> DeploymentName    : 'pkrdpe380nb7t2s'
==> azure-arm: Getting the VM's IP address ...
==> azure-arm:  -> ResourceGroupName   : 'contoso-sharepoint-prod-common-rg'
==> azure-arm:  -> PublicIPAddressName : 'pkripe380nb7t2s'
==> azure-arm:  -> NicName             : 'pkrnie380nb7t2s'
==> azure-arm:  -> Network Connection  : 'PublicEndpoint'
==> azure-arm:  -> IP Address          : '104.40.139.200'
==> azure-arm: Waiting for WinRM to become available...
==> azure-arm: Timeout waiting for WinRM.
==> azure-arm:
==> azure-arm: The resource group was not created by Packer, deleting individual resources ...
==> azure-arm:  -> Deployment: pkrdpe380nb7t2s
==> azure-arm:  -> Microsoft.Compute/virtualMachines : 'shrpt1581359966'
==> azure-arm:  -> Microsoft.Network/networkInterfaces : 'pkrnie380nb7t2s'
==> azure-arm:  -> Microsoft.Network/virtualNetworks : 'pkrvne380nb7t2s'
==> azure-arm:  -> Microsoft.Network/publicIPAddresses : 'pkripe380nb7t2s'
==> azure-arm:  -> Microsoft.Compute/disks : '/subscriptions/58baf6a1-d140-4b25-8ed1-b3195bbf2c7c/resourceGroups/CONTOSO-SHAREPOINT-PROD-COMMON-RG/providers/Microsoft.Compute/disks/pkrose380nb7t2s'
==> azure-arm:
==> azure-arm: The resource group was not created by Packer, deleting individual resources ...
==> azure-arm: Could not retrieve OS Image details
==> azure-arm:  -> Deployment: kvpkrdpe380nb7t2s
==> azure-arm:  -> Microsoft.KeyVault/vaults/secrets : 'pkrkve380nb7t2s/packerKeyVaultSecret'
==> azure-arm:  -> Microsoft.KeyVault/vaults : 'pkrkve380nb7t2s'
==> azure-arm:  ->  : ''
==> azure-arm: Error deleting resource.  Please delete manually.
==> azure-arm:
==> azure-arm: Name:
==> azure-arm: Error: Unable to parse path of image
==> azure-arm:
==> azure-arm: The resource group was not created by Packer, not deleting ...
Build 'azure-arm' errored: Timeout waiting for WinRM.

==> Some builds didn't complete successfully and had errors:
--> azure-arm: Timeout waiting for WinRM.

==> Builds finished but no artifacts were created.
shurick81 commented 4 years ago

It works if I use the previous image version:

"image_sku": "2019-Datacenter",
"image_version": "17763.914.1912042330"
BruceShipman commented 4 years ago

In the meantime please try increasing the Packer timeout to 30mins, and try a larger VM size.

2016-Datacenter using a VM size of Standard_D2s_v3 and a timeout of 30 minutes still results in a WinRM timeout failure in West US 2 today - I'm seeing about a 30% failure rate in the morning/early afternoon (UTC-0700) and 60% or more later in the afternoon. The overnight (0300) build seems to be fine. Using a larger VM or increasing the winrm_timeout value to 60m doesn't seem to have any helpful effect.

I've been concentrating on 2016-Datacenter today. The few 2019 and 2012R2 runs have been fine today, but were problematic last Friday. 2016 had a 100% failure rate during the day on Friday unless I used the manual workaround @Dilergore posted last week.

rustyautopsy commented 4 years ago

Our 2019-Datacenter builds have 100% failure rate regardless of setting timeouts or adjusting VM sizes. These failures were experienced consistently during "business hours" in the East US region.

For reference, these are our settings...

""vm_size": "Standard_DS2_v2",
"image_publisher": "MicrosoftWindowsServer",
"image_offer": "WindowsServer",
"image_sku": "2019-Datacenter",
"image_version": "17763.973.2001110547" <-- January 2020 build
"winrm_use_ssl": "true",
"winrm_insecure": "true",
"winrm_port": "5986",
"winrm_timeout": "30m",

Running the same builds during after hours in the same region resulted in 100% success using a variety of different versions as provided via the following MS release notes page...

https://support.microsoft.com/en-us/help/4537134

We tested the following versions June 2019, November 2019, December 2019, January 2020, every build worked.

How can the time of day affect this? 🤷‍♂️

We will run the same tests with the versions mentioned above during "business hours" and report back here in the AM.

rustyautopsy commented 4 years ago

Our 2019-Datacenter builds have 100% failure rate regardless of setting timeouts or adjusting VM sizes. [...] We will run the same tests with the versions mentioned above during "business hours" and report back here in the AM.

Today's builds are completing without issue. For reference, these are our image settings...

IMAGE_PUBLISHER: "MicrosoftWindowsServer"
IMAGE_OFFER: "WindowsServer"
IMAGE_SKU: "2019-Datacenter"
IMAGE_VERSION: "17763.973.2001110547"
VM_SIZE: "Standard_DS2_v2"

IMAGE_PUBLISHER: "MicrosoftWindowsDesktop"
IMAGE_OFFER: "Windows-10"
IMAGE_SKU: "19h1-pro"
IMAGE_VERSION: "18362.592.2001092016"
VM_SIZE: "Standard_DS2_v2"

I hope an explanation comes down from MS so we don't have to accept the "it works on my machine" response. 🤞

Dilergore commented 4 years ago

Hello Everyone,

Here is the latest information I've got from Microsoft: “Windows Server 2016 images since November 2019 can have a post first boot performance issue related to an OS code integrity operation. This issue is more pronounced on small Azure VM sizes (with lower throughput and IO) rendering the VM not immediately usable after first boot. The performance issue is mitigated in February 2020 images and forward. Please use the latest February Windows Server 2016 image once it is available from the Marketplace (ETA 2/17).”