puppetlabs / puppet-agent

All of the directions for building a puppet agent package.

puppet agent works interactively but refuses to start service #2459

Closed rismoney closed 3 weeks ago

rismoney commented 6 months ago

After upgrading to 8.x, either by first uninstalling a 7.x install or by installing over the top of it, puppet installs successfully on Windows 2019/2022, but I receive errors from the service. It seems to happen more often on Core, but it occurs on both Desktop Experience and regular installs. Sometimes after 5-10 attempts it will actually start, but usually, once stopped, it refuses to start again.

Failed to transition the puppet service to the SERVICE_RUNNING state. Detail: Failed to start the service: The service did not respond to the start or control request in a timely fashion.

Running puppet agent interactively works perfectly fine. Rolling back to 7.x (even 7.27.0) works, as does starting the 7.x service. The issue is exclusively with the service starting cleanly after installation. A reboot does not fix it, nor does removing C:\programdata\puppetlabs or refreshing the certs.

Any other additional information needed, let me know.

joshcooper commented 6 months ago

I haven't been able to reproduce using 8.3.1. I've started & stopped the puppet service many times using the service control manager & net start, but no failures.

Is it possible the service is loading a different puppet.conf than is used when running puppet agent -t?

Are you running the service as LocalSystem or domain user account?

You might also try checking that Windows Defender or other AV isn't blocking the service?

rismoney commented 6 months ago

I am only able to reproduce it. I have eliminated Defender. I have tried both LocalSystem and a domain user. I don't know how it would reference a different puppet.conf; it is the only one in the ProgramData directory. I have tried removing the node from the domain and eliminating GPO.

The node I have been testing the most on is Server Core, but I see it on Win2019 desktop experience.

Most of the nodes do not have internet access, but that doesn't seem to matter either, as I see it on internet-accessible nodes as well (no proxy). I am going to try a clean ISO install of Windows next. But it is frustrating that I cannot get a real error message, as it seems to fail before the service launches.

rismoney commented 6 months ago

This appears to be an issue with our NTP software and not the puppet agent. I am not entirely sure why, but upgrading the NTP software to the latest version seems to have remediated it. If this is still a problem, I will re-open. Thank you for your time!

joshcooper commented 6 months ago

Thanks for letting us know!

rismoney commented 6 months ago

I think there is an issue between puppet and the NTP software, but I am not sure what it is. It seems as though something in Ruby 3.2 FFI is querying services and getting hung up on this service. I am not sure why, or what changed in puppet 8.x to cause this condition. It definitely doesn't present itself in version 7.x or if I stop the NTP client. I think something puppet is doing (not on every restart-service) is causing it to fail.

I am running the client from here: https://www.greyware.com/download/shareware.asp

rismoney commented 6 months ago

Alright, the difference between the NTP software and all the other services is that this particular service name contains spaces.

I believe there may be a bug in Puppet::Util::Services whereby, if the service name (effectively the registry key under HKLM:\SYSTEM\CurrentControlSet\Services\) has a space in it, the daemon fails to start.

This was easily reproducible using the software above by hacking the registry key to not have a space in it (I renamed it from "Domain Time Client" to DomainTimeClient) and rebooting. This is a guess, but I believe any service with a space in its name that comes before puppet alphabetically might cause the issue.

My understanding is that this is indeed a valid service name, but puppet isn't handling it properly. I could recommend that the vendor, Greyware, change their service name, but I think it should be fixed on puppet's side.
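To make the guess above concrete, here is a minimal Ruby sketch that takes a list of service key names and shortlists the ones that contain a space and sort before "puppet" alphabetically. The helper name `suspect_services` and most of the sample names are hypothetical, and the alphabetical-ordering theory itself is unconfirmed; this only shows how candidates could be picked out.

```ruby
# Hypothetical helper: shortlist service names that match the guess above,
# i.e. names containing a space that sort before "puppet" (case-insensitive).
def suspect_services(names, target = 'puppet')
  names
    .select { |name| name.include?(' ') }
    .select { |name| name.casecmp(target).negative? }
    .sort_by(&:downcase)
end

# Sample input is made up for illustration, apart from "Domain Time Client".
names = ['Domain Time Client', 'W32Time', 'puppet', 'Zed Service', 'Spooler']
puts suspect_services(names).inspect
```

On a Windows node, the input list could presumably come from the keys of the hash returned by Puppet::Util::Windows::Service.services, but that part is untested here.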

joshcooper commented 6 months ago

I don't think I've ever seen a service with a space in the service name before (not to be confused with display name). I'm pretty sure that's asking for trouble, but the docs don't say you can't:

https://learn.microsoft.com/en-us/windows/win32/api/winsvc/nf-winsvc-createservicew

[in] lpServiceName

The name of the service to install. The maximum string length is 256 characters. The service control manager database preserves the case of the characters, but service name comparisons are always case insensitive. Forward-slash (/) and backslash (\) are not valid service name characters.
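As a rough illustration of the rules quoted above, the following Ruby sketch checks a name against them: at most 256 characters, no forward slash or backslash, and case-insensitive comparison. The method names are hypothetical, and the length check counts Ruby characters, which is an assumption about how the limit is measured. Note that nothing in the quoted text excludes spaces.

```ruby
# Sketch of the naming rules quoted from the CreateServiceW docs.
# Assumption: the 256-character limit is measured in Ruby string characters.
MAX_SERVICE_NAME_LENGTH = 256

# Hypothetical helper: true if the name satisfies the quoted rules.
def valid_service_name?(name)
  name.length <= MAX_SERVICE_NAME_LENGTH && !name.match?(%r{[/\\]})
end

# Hypothetical helper: service name comparisons are case-insensitive.
def same_service_name?(left, right)
  left.casecmp?(right)
end

puts valid_service_name?('Domain Time Client') # a space is allowed by these rules
puts valid_service_name?('bad/name')
puts same_service_name?('Puppet', 'puppet')
```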

That said, puppet does not have trouble resolving the service:

C:\>irb
irb(main):001:0> require 'puppet'
=> true
irb(main):002:0> Puppet.initialize_settings
=> {}
...
irb(main):006:0> Puppet::Util::Windows::Service.services.select {|k,v| k.match? 'Domain Time Client'}
=> {"Domain Time Client"=>{:display_name=>"Domain Time Client", :service_status_process=>#<Puppet::FFI::Windows::Structs::SERVICE_STATUS_PROCESS:0x000000000b8a0998>}}
irb(main):007:0> Puppet::Util::Windows::Service.service_start_type('Domain Time Client')
=> :SERVICE_AUTO_START
...
irb(main):011:0> Puppet::Util::Windows::Service.logon_account('Domain Time Client')
=> "LocalSystem"

Is it possible the service is protected?

rismoney commented 6 months ago

I actually went down the exact same path today. I totally agree with everything you wrote. I agree it's asking for trouble; however, the Domain Time vendor reported back that they cannot make the change, as it would break their compatibility promises. I have seen one other service, from OneDrive Updater, with a space in its name. So it is definitely an edge case, but not illegal according to the Win32 API docs.

I too found that querying the service (all services, for that matter) via irb through Puppet::Util::Windows::Service enumerates properly.

Based on the procmon, enumeration of all services is definitely happening when this particular service is stopped, but not when it is running.

To my knowledge, the service is not protected. I thought protected services are used more for drivers with a sys file that has been digitally signed. I believe this to be a 'simple' exe. I wasn't able to follow along with all the daemon code, but is it possible the values are somehow getting munged between the FFI stuff and the service string with the space, causing an exception that kills the daemon? As mentioned, this behavior started in Puppet 8+ and does not present in 7.x or earlier.

The weirdest part is that "sometimes" it will actually start, but I think it's really a bad start, because after a runinterval cycle the agent effectively dies.

rismoney commented 5 months ago

So is this something that can be fixed in the daemon, or do we need more information? Were you able to reproduce it? I can provide any info needed, e.g. perfmon captures.

Thank you for all your help in this matter.

joshcooper commented 5 months ago

I installed the service you mentioned and couldn't reproduce the failure.

rismoney commented 5 months ago

Which OS?

rismoney commented 5 months ago

I think the daemon might not be using Puppet::Util::Windows::Service. I think it's doing something different using FFI. I still can't get to the bottom of this.

github-actions[bot] commented 2 months ago

Migrated issue to PA-6394

joshcooper commented 2 months ago

This should hopefully be fixed by https://github.com/puppetlabs/puppet/pull/9338

joshcooper commented 3 weeks ago

This was fixed in https://github.com/puppetlabs/puppet/pull/9386 and backported to 7.x in https://github.com/puppetlabs/puppet/pull/9389