test-kitchen / winrm-transport

WinRM transport logic for re-using remote shells and uploading files
Other
8 stars 9 forks source link

Illegal operation attempted on a registry key that has been marked for deletion. #18

Open dpetzel opened 8 years ago

dpetzel commented 8 years ago

This is an odd one so bear with me... The Environment:

The Error:

       Transferring files to <mysuite-win2k8-chef-11>

           Directory: C:\Users\CLOUD-~1\AppData\Local\Temp

       Mode                LastWriteTime     Length Name                              
       ----                -------------     ------ ----                              
       d----        10/31/2015   9:37 AM            kitchen                           

       Transferring files to <mysuite-win2k8-chef-12>
-----> Destroying <mysuite-win2k8-chef-11>...
       OpenStack instance <GUID> destroyed.
       Finished destroying <mysuite-win2k8-chef-11> (0m0.56s).
>>>>>> ------Exception-------
>>>>>> Class: Kitchen::ActionFailed
>>>>>> Message: Failed to complete #converge action: [[WSMAN ERROR CODE: 2147943418]: <f:WSManFault Code='2147943418' Machine='10.42.10.140' xmlns:f='http://schemas.microsoft.com/wbem/wsman/1/wsmanfault'><f:Message><f:ProviderFault path='%systemroot%\system32\winrscmd.dll' provider='Shell cmd plugin'>Illegal operation attempted on a registry key that has been marked for deletion. </f:ProviderFault></f:Message></f:WSManFault>]

Here is what it looks like under debug logging:

D, [2015-10-31T09:59:48.838747 #7109] DEBUG -- mysuite-win2k8-chef-11: [FileTransporter] Uploading /tmp/tmpzip-20151031-7109-6km2vo.zip to encoded tmpfile $env:TEMP\b64-76cdb2bad9582d23c1f6f4d868218d6c.txt
D, [2015-10-31T09:59:48.838962 #7109] DEBUG -- mysuite-win2k8-chef-11: [CommandExecutor] Resetting WinRM shell (Max command limit is 13)
D, [2015-10-31T09:59:48.896679 #7109] DEBUG -- mysuite-win2k8-chef-11: Cleaning up local sandbox in /tmp/mysuite-win2k8-chef-11-sandbox-20151031-7109-fmurqb
E, [2015-10-31T09:59:48.932441 #7109] ERROR -- mysuite-win2k8-chef-11: Converge failed on instance <mysuite-win2k8-chef-11>.
E, [2015-10-31T09:59:48.932706 #7109] ERROR -- mysuite-win2k8-chef-11: ------Exception-------
E, [2015-10-31T09:59:48.932761 #7109] ERROR -- mysuite-win2k8-chef-11: Class: WinRM::WinRMWSManFault
E, [2015-10-31T09:59:48.932802 #7109] ERROR -- mysuite-win2k8-chef-11: Message: [WSMAN ERROR CODE: 2147943418]: <f:WSManFault Code='2147943418' Machine='10.42.10.136' xmlns:f='http://schemas.microsoft.com/wbem/wsman/1/wsmanfault'><f:Message><f:ProviderFault path='%systemroot%\system32\winrscmd.dll' provider='Shell cmd plugin'>Illegal operation attempted on a registry key that has been marked for deletion. </f:ProviderFault></f:Message></f:WSManFault>

I'll admit this is my first time looking at this code, but I believe I've narrowed this down to the open method. It seems like Windows may not be done unloading some portion of the registry tied to the users session (this is a theory, not fact) before creating the next session. As a test I added a sleep(1) after the close to delay before opening a new session. I suspect the reason my remote client doesn't encounter the error is that the WAN latency causes just enough delay (again just theory).

For the record, I don't believe the sleep is "the fix", but it has allowed me to work around the error. I wonder if the open method needs some error checking to catch this condition and retry a time or two?

Thanks

dpetzel commented 8 years ago

In case it helps, here is the stack trace:

Class: WinRM::WinRMWSManFault
Message: [WSMAN ERROR CODE: 2147943418]: <f:WSManFault Code='2147943418' Machine='10.42.10.140' xmlns:f='http://schemas.microsoft.com/wbem/wsman/1/wsmanfault'><f:Message><f:ProviderFault path='%systemroot%\system32\winrscmd.dll' provider='Shell cmd plugin'>Illegal operation attempted on a registry key that has been marked for deletion. </f:ProviderFault></f:Message></f:WSManFault>
------Backtrace-------
/var/cache/jenkins/bundler/ruby/2.1.0/gems/winrm-1.3.4/lib/winrm/http/response_handler.rb:64:in `raise_if_wsman_fault'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/winrm-1.3.4/lib/winrm/http/response_handler.rb:51:in `raise_if_error'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/winrm-1.3.4/lib/winrm/http/response_handler.rb:35:in `parse_to_xml'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/winrm-1.3.4/lib/winrm/http/transport.rb:50:in `send_request'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/winrm-1.3.4/lib/winrm/winrm_service.rb:430:in `send_message'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/winrm-1.3.4/lib/winrm/winrm_service.rb:126:in `open_shell'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/winrm-transport-1.0.2/lib/winrm/transport/command_executor.rb:78:in `open'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/winrm-transport-1.0.2/lib/winrm/transport/command_executor.rb:213:in `reset'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/winrm-transport-1.0.2/lib/winrm/transport/command_executor.rb:95:in `run_cmd'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/winrm-transport-1.0.2/lib/winrm/transport/file_transporter.rb:435:in `stream_upload'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/winrm-transport-1.0.2/lib/winrm/transport/file_transporter.rb:461:in `block (2 levels) in stream_upload_file'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/winrm-transport-1.0.2/lib/winrm/transport/file_transporter.rb:460:in `open'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/winrm-transport-1.0.2/lib/winrm/transport/file_transporter.rb:460:in `block in stream_upload_file'
/opt/chefdk/embedded/lib/ruby/2.1.0/benchmark.rb:279:in `measure'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/winrm-transport-1.0.2/lib/winrm/transport/file_transporter.rb:459:in `stream_upload_file'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/winrm-transport-1.0.2/lib/winrm/transport/file_transporter.rb:487:in `block in stream_upload_files'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/winrm-transport-1.0.2/lib/winrm/transport/file_transporter.rb:482:in `each'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/winrm-transport-1.0.2/lib/winrm/transport/file_transporter.rb:482:in `stream_upload_files'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/winrm-transport-1.0.2/lib/winrm/transport/file_transporter.rb:93:in `block in upload'
/opt/chefdk/embedded/lib/ruby/2.1.0/benchmark.rb:279:in `measure'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/winrm-transport-1.0.2/lib/winrm/transport/file_transporter.rb:87:in `upload'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/test-kitchen-1.4.2/lib/kitchen/transport/winrm.rb:118:in `upload'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/test-kitchen-1.4.2/lib/kitchen/provisioner/base.rb:68:in `block in call'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/test-kitchen-1.4.2/lib/kitchen/transport/winrm.rb:468:in `reuse_connection'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/test-kitchen-1.4.2/lib/kitchen/transport/winrm.rb:60:in `connection'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/test-kitchen-1.4.2/lib/kitchen/provisioner/base.rb:64:in `call'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/test-kitchen-1.4.2/lib/kitchen/instance.rb:366:in `block in converge_action'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/test-kitchen-1.4.2/lib/kitchen/instance.rb:488:in `call'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/test-kitchen-1.4.2/lib/kitchen/instance.rb:488:in `synchronize_or_call'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/test-kitchen-1.4.2/lib/kitchen/instance.rb:453:in `block in action'
/opt/chefdk/embedded/lib/ruby/2.1.0/benchmark.rb:279:in `measure'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/test-kitchen-1.4.2/lib/kitchen/instance.rb:452:in `action'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/test-kitchen-1.4.2/lib/kitchen/instance.rb:362:in `converge_action'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/test-kitchen-1.4.2/lib/kitchen/instance.rb:341:in `block in transition_to'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/test-kitchen-1.4.2/lib/kitchen/instance.rb:340:in `each'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/test-kitchen-1.4.2/lib/kitchen/instance.rb:340:in `transition_to'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/test-kitchen-1.4.2/lib/kitchen/instance.rb:160:in `verify'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/test-kitchen-1.4.2/lib/kitchen/instance.rb:189:in `block in test'
/opt/chefdk/embedded/lib/ruby/2.1.0/benchmark.rb:279:in `measure'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/test-kitchen-1.4.2/lib/kitchen/instance.rb:185:in `test'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/test-kitchen-1.4.2/lib/kitchen/command.rb:176:in `public_send'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/test-kitchen-1.4.2/lib/kitchen/command.rb:176:in `block (2 levels) in run_action'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/logging-1.8.2/lib/logging/diagnostic_context.rb:323:in `call'
/var/cache/jenkins/bundler/ruby/2.1.0/gems/logging-1.8.2/lib/logging/diagnostic_context.rb:323:in `block in create_with_logging_context'
kashook commented 8 years ago

I am hitting this exact same error. In my case, I am running test kitchen 1.4.2 using the vagrant driver on my own local system with my own custom Windows Server 2008 R2 VirtualBox basebox. Like dpetzel, I've also seen the issue happen when testing against chef 11 or 12, and this occurs with at least a dozen cookbooks that I have tests for.

In my basebox, I enabled the WinRM analytic and debug logging as described here so that it would be on when the error occurs. I then ran my kitchen tests and told it not to destroy the VM so I could connect and look around.

Each time this error occurs, I see these two messages show up in the application event log in the VM:

Windows detected your registry file is still in use by other applications or services. The file will be unloaded now. The applications or services that hold your registry file may not function properly afterwards.  

 DETAIL - 
 15 user registry handles leaked from \Registry\User\S-1-5-21-3059673099-3564415894-2515895182-1000:
Process 1740 (\Device\HarddiskVolume1\Windows\System32\msiexec.exe) has opened key \REGISTRY\USER\S-1-5-21-3059673099-3564415894-2515895182-1000
Process 1740 (\Device\HarddiskVolume1\Windows\System32\msiexec.exe) has opened key \REGISTRY\USER\S-1-5-21-3059673099-3564415894-2515895182-1000
Process 1740 (\Device\HarddiskVolume1\Windows\System32\msiexec.exe) has opened key \REGISTRY\USER\S-1-5-21-3059673099-3564415894-2515895182-1000
Process 1740 (\Device\HarddiskVolume1\Windows\System32\msiexec.exe) has opened key \REGISTRY\USER\S-1-5-21-3059673099-3564415894-2515895182-1000
Process 1740 (\Device\HarddiskVolume1\Windows\System32\msiexec.exe) has opened key \REGISTRY\USER\S-1-5-21-3059673099-3564415894-2515895182-1000\Software\Microsoft\SystemCertificates\SmartCardRoot
Process 1740 (\Device\HarddiskVolume1\Windows\System32\msiexec.exe) has opened key \REGISTRY\USER\S-1-5-21-3059673099-3564415894-2515895182-1000\Software\Microsoft\SystemCertificates\Root
Process 1740 (\Device\HarddiskVolume1\Windows\System32\msiexec.exe) has opened key \REGISTRY\USER\S-1-5-21-3059673099-3564415894-2515895182-1000\Software\Microsoft\SystemCertificates\trust
Process 1740 (\Device\HarddiskVolume1\Windows\System32\msiexec.exe) has opened key \REGISTRY\USER\S-1-5-21-3059673099-3564415894-2515895182-1000\Software\Microsoft\SystemCertificates\Disallowed
Process 1740 (\Device\HarddiskVolume1\Windows\System32\msiexec.exe) has opened key \REGISTRY\USER\S-1-5-21-3059673099-3564415894-2515895182-1000\Software\Microsoft\SystemCertificates\TrustedPeople
Process 1740 (\Device\HarddiskVolume1\Windows\System32\msiexec.exe) has opened key \REGISTRY\USER\S-1-5-21-3059673099-3564415894-2515895182-1000\Software\Microsoft\SystemCertificates\My
Process 1740 (\Device\HarddiskVolume1\Windows\System32\msiexec.exe) has opened key \REGISTRY\USER\S-1-5-21-3059673099-3564415894-2515895182-1000\Software\Microsoft\SystemCertificates\CA
Process 1740 (\Device\HarddiskVolume1\Windows\System32\msiexec.exe) has opened key \REGISTRY\USER\S-1-5-21-3059673099-3564415894-2515895182-1000\Software\Policies\Microsoft\SystemCertificates
Process 1740 (\Device\HarddiskVolume1\Windows\System32\msiexec.exe) has opened key \REGISTRY\USER\S-1-5-21-3059673099-3564415894-2515895182-1000\Software\Policies\Microsoft\SystemCertificates
Process 1740 (\Device\HarddiskVolume1\Windows\System32\msiexec.exe) has opened key \REGISTRY\USER\S-1-5-21-3059673099-3564415894-2515895182-1000\Software\Policies\Microsoft\SystemCertificates
Process 1740 (\Device\HarddiskVolume1\Windows\System32\msiexec.exe) has opened key \REGISTRY\USER\S-1-5-21-3059673099-3564415894-2515895182-1000\Software\Policies\Microsoft\SystemCertificates

and

Windows detected your registry file is still in use by other applications or services. The file will be unloaded now. The applications or services that hold your registry file may not function properly afterwards.  

 DETAIL - 
 1 user registry handles leaked from \Registry\User\S-1-5-21-3059673099-3564415894-2515895182-1000_Classes:
Process 712 (\Device\HarddiskVolume1\Windows\System32\svchost.exe) has opened key \REGISTRY\USER\S-1-5-21-3059673099-3564415894-2515895182-1000_CLASSES

The msiexec seems to be associated with the install of chef. (That is the only msi that is installed during the test process, at least in my particular case). When I look at the command line in the task manager on the svchost.exe process in the second message, it is this:

C:\Windows\system32\svchost.exe -k RPCSS

At the exact same second, I see this in the WinRM analytic log:

SOAP [listener sending index 1 of 1 total chunks (1446 bytes)] <s:Envelope xml:lang="en-US" xmlns:s="http://www.w3.org/2003/05/soap-envelope" xmlns:a="http://schemas.xmlsoap.org/ws/2004/08/addressing" xmlns:x="http://schemas.xmlsoap.org/ws/2004/09/transfer" xmlns:e="http://schemas.xmlsoap.org/ws/2004/08/eventing" xmlns:n="http://schemas.xmlsoap.org/ws/2004/09/enumeration" xmlns:w="http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd" xmlns:p="http://schemas.microsoft.com/wbem/wsman/1/wsman.xsd"><s:Header><a:Action>http://schemas.dmtf.org/wbem/wsman/1/wsman/fault</a:Action><a:MessageID>uuid:7435DD5C-BC72-4AB7-9878-2AAF2761ABC0</a:MessageID><p:ActivityId>016C1C48-F800-0001-FEE1-8CDE5D38D101</p:ActivityId><a:To>http://schemas.xmlsoap.org/ws/2004/08/addressing/role/anonymous</a:To><a:RelatesTo>uuid:6081BD99-314E-4C57-AAD5-B97D97BD4E5A</a:RelatesTo></s:Header><s:Body><s:Fault><s:Code><s:Value>s:Receiver</s:Value><s:Subcode><s:Value>w:InternalError</s:Value></s:Subcode></s:Code><s:Reason><s:Text xml:lang="en-US">Illegal operation attempted on a registry key that has been marked for deletion. </s:Text></s:Reason><s:Detail><f:WSManFault xmlns:f="http://schemas.microsoft.com/wbem/wsman/1/wsmanfault" Code="2147943418" Machine="127.0.0.1"><f:Message><f:ProviderFault provider="Shell cmd plugin" path="%systemroot%\system32\winrscmd.dll">Illegal operation attempted on a registry key that has been marked for deletion. </f:ProviderFault></f:Message></f:WSManFault></s:Detail></s:Fault></s:Body></s:Envelope>

This error always occurs right after this message in the kitchen debug log:

[CommandExecutor] Resetting WinRM shell (Max command limit is 13)

(I've omitted any more of the log and a stack trace since it looks just like the ones posted by dpetzel. In my case, the winrm-transport version is 1.0.3).

I agree with dpetzel that it seems like a timing issue. I think closing the old winrm shell is causing the user account to get logged off, and the registry cleanup stuff isn't quite finished yet when the attempt is made to open the new shell. I hacked in some quick retry logic around the the service.open_shell call in the open method in my local copy of the gem, and it seems to do the trick.

I'm thinking of submitting a pull request, but I'm sort of hoping for some guidance on something from those who would have the power to decide whether to accept my pull request.

The exact exception occuring is a WinRMWSManFault exception. This has properties for the fault_code and fault_description. Those look like this when this error occurs:

fault_code = 2147943418
fault_description = <f:WSManFault Code='2147943418' Machine='localhost' xmlns:f='http://schemas.microsoft.com/wbem/wsman/1/wsmanfault'><f:Message><f:ProviderFault path='%systemroot%\system32\winrscmd.dll' provider='Shell cmd plugin'>Illegal operation attempted on a registry key that has been marked for deletion. </f:ProviderFault></f:Message></f:WSManFault>

I'm not positive that the fault code is unique to this exact error, so I was thinking of checking for the "Illegal operation attempted on a registry key that has been marked for deletion" text instead, and only retrying when that specific error occurs. Also, I'm not sure how many times to retry. I was thinking a handful of times with a second of sleep between each attempt. @fnichol or others, any thoughts?

jonnyyu commented 8 years ago

Hi, I'm seeing the exact same error with Windows 2012r2 using vagrant 1.8. really looking forward to merge this PR.

Thanks!

AntoineBro commented 8 years ago

:+1 I'm seeing the same error on windows server 2008 R2 SP1