Closed: elfranne closed this issue.
Hi @elfranne, I've attempted to reproduce your issue but was unable to do so. None of the 4 packages requires a system restart:
# puppet agent -t
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for maas-pool-06.metal.dreamworx.nl
Notice: Package[libnetplan0] (unmanaged) will be updated by Patching_as_code
Notice: Package[netplan.io] (unmanaged) will be updated by Patching_as_code
Notice: Package[ubuntu-advantage-tools] (unmanaged) will be updated by Patching_as_code
Notice: Package[grub-common] (unmanaged) will be updated by Patching_as_code
Info: Applying configuration version 'pe-production-2710b9498c1'
Notice: /Stage[main]/Patching_as_code::Linux::Patchday/Exec[Patching as Code - Clean Cache]/returns: executed successfully (corrective)
Notice: /Stage[main]/Patching_as_code::Linux::Patchday/Package[libnetplan0]/ensure: ensure changed '0.99-0ubuntu1' to '0.104-0ubuntu2~20.04.1'
Notice: /Stage[main]/Patching_as_code::Linux::Patchday/Package[netplan.io]/ensure: ensure changed '0.99-0ubuntu1' to '0.104-0ubuntu2~20.04.1'
Notice: /Stage[main]/Patching_as_code::Linux::Patchday/Package[ubuntu-advantage-tools]/ensure: ensure changed '20.3' to '27.7~20.04.1'
Notice: /Stage[main]/Patching_as_code::Linux::Patchday/Package[grub-common]/ensure: ensure changed '2.04-1ubuntu26.12' to '2.04-1ubuntu26.15' (corrective)
Notice: /Stage[main]/Patching_as_code/File[Patching as Code - Save Patch Run Info]/content: content changed '{sha256}a44d226548710a1901483982b1775c58578f595256231efcbb1fb048f7a547a7' to '{sha256}a70947e69922b0f82d78cbe31862943cc03597bcad6dca222925542b936a2231'
Notice: Patches installed, refreshing patching facts...
Notice: /Stage[main]/Patching_as_code/Notify[Patching as Code - Update Fact]/message: defined 'message' as 'Patches installed, refreshing patching facts...'
Info: /Stage[main]/Patching_as_code/Notify[Patching as Code - Update Fact]: Scheduling refresh of Exec[pe_patch::exec::fact]
Info: /Stage[main]/Patching_as_code/Notify[Patching as Code - Update Fact]: Scheduling refresh of Exec[pe_patch::exec::fact_upload]
Notice: /Stage[main]/Pe_patch/Exec[pe_patch::exec::fact_upload]: Triggered 'refresh' from 1 event
Notice: /Stage[main]/Pe_patch/Exec[pe_patch::exec::fact]: Triggered 'refresh' from 1 event
Notice: Applied catalog in 34.81 seconds
root@maas-pool-06:~# cat /var/run/reboot-required
cat: /var/run/reboot-required: No such file or directory
root@maas-pool-06:~# /bin/sh /opt/puppetlabs/puppet/cache/lib/patching_as_code/pending_reboot.sh
root@maas-pool-06:~#
As a result, the reboot: ifneeded parameter in the schedule detected no pending restart and therefore did not restart your system at the end of the patch run. I can only assume that some other change was made to the system later, which caused the pending reboot you saw when you checked the reboot status.
If I mock a pending reboot by running echo '*** System restart required ***' > /var/run/reboot-required during the Puppet agent run, I can see that it properly triggers the reboot:
# puppet agent -t
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for maas-pool-06.metal.dreamworx.nl
Notice: Package[ubuntu-advantage-tools] (unmanaged) will be updated by Patching_as_code
Info: Applying configuration version 'pe-production-2710b9498c1'
Notice: /Stage[main]/Patching_as_code::Linux::Patchday/Exec[Patching as Code - Clean Cache]/returns: executed successfully (corrective)
Notice: /Stage[main]/Patching_as_code::Linux::Patchday/Package[ubuntu-advantage-tools]/ensure: ensure changed '20.3' to '27.7~20.04.1' (corrective)
Notice: /Stage[main]/Patching_as_code/File[Patching as Code - Save Patch Run Info]/content: content changed '{sha256}a70947e69922b0f82d78cbe31862943cc03597bcad6dca222925542b936a2231' to '{sha256}8a9c4d62a464bd117dc88e0aea74b1545265d964dcbaa75dff7e341b697c641a'
Notice: Patches installed, refreshing patching facts...
Notice: /Stage[main]/Patching_as_code/Notify[Patching as Code - Update Fact]/message: defined 'message' as 'Patches installed, refreshing patching facts...'
Info: /Stage[main]/Patching_as_code/Notify[Patching as Code - Update Fact]: Scheduling refresh of Exec[pe_patch::exec::fact]
Info: /Stage[main]/Patching_as_code/Notify[Patching as Code - Update Fact]: Scheduling refresh of Exec[pe_patch::exec::fact_upload]
Notice: /Stage[main]/Pe_patch/Exec[pe_patch::exec::fact_upload]: Triggered 'refresh' from 1 event
Notice: /Stage[main]/Pe_patch/Exec[pe_patch::exec::fact]: Triggered 'refresh' from 1 event
Broadcast message from root@maas-pool-06 (Sat 2022-04-16 16:28:29 UTC):
The system is going down for reboot at Sat 2022-04-16 16:30:29 UTC!
Notice: /Stage[patch_reboot]/Patching_as_code::Reboot/Exec[Patching as Code - Patch Reboot]/returns: Shutdown scheduled for Sat 2022-04-16 16:30:29 UTC, use 'shutdown -c' to cancel.
Notice: /Stage[patch_reboot]/Patching_as_code::Reboot/Exec[Patching as Code - Patch Reboot]/returns: executed successfully (corrective)
Notice: Applied catalog in 22.71 seconds
So I don't believe that there is a problem with the module. Please verify and let me know if there's anything we can do.
So /var/run/reboot-required needs to be created during the Puppet run? If I install a new kernel that requires a reboot before running the Puppet module, will it not reboot?
/var/run/reboot-required is normally created automatically when, for example, a new kernel that requires a reboot is installed. Puppet detects that the file is there and performs the reboot if you use reboot: ifneeded.
If you perform an action that requires a reboot (and thus causes the /var/run/reboot-required file to be created) before a Puppet patch run, the patch run will detect the pending reboot and reboot the machine before the patching cycle starts, to clear out the pending reboot first.
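As a rough illustration of the detection described above, the sketch below checks for a pending reboot and reports the result through its exit code. It is a hypothetical helper, not the module's pending_reboot.sh. It assumes that Debian/Ubuntu signal a pending reboot with /var/run/reboot-required, and that RHEL/CentOS provide needs-restarting -r (from yum-utils or dnf-utils), which exits with 1 when a reboot is needed. The REBOOT_MARKER override variable is invented so the check can be tried against a temporary file.

```shell
#!/bin/sh
# Hypothetical pending-reboot check (NOT the module's pending_reboot.sh).
# Exit status 0 = reboot pending, 1 = no reboot pending.

# Assumption: marker path used by Debian/Ubuntu. REBOOT_MARKER is a
# made-up override so the check can be exercised against a temp file.
REBOOT_MARKER="${REBOOT_MARKER:-/var/run/reboot-required}"

reboot_pending() {
  # Debian/Ubuntu: package scripts create the marker (e.g. after a kernel upgrade).
  if [ -f "$REBOOT_MARKER" ]; then
    return 0
  fi
  # RHEL/CentOS (assumption): `needs-restarting -r` exits 1 when a reboot is required.
  if command -v needs-restarting >/dev/null 2>&1; then
    rc=0
    needs-restarting -r >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 1 ]; then
      return 0
    fi
  fi
  return 1
}
```

Usage: if reboot_pending; then echo "reboot pending"; fi. Reporting through the exit code rather than printed text means a caller can test the result directly without parsing output.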
The /var/run/reboot-required file was present before the Puppet run (see the file date in my initial report), and we use reboot: ifneeded, but the reboot is not performed.
Can you share the value of the patching_as_code_config fact for the node experiencing this problem, please?
facter patching_as_code_config does not report anything. I can't see where that fact is being added; there is a Windows-only fact here.
There is a config file, /opt/puppetlabs/facter/facts.d/patching_configuration.json:
{
  "patching_as_code_config": {
    "allowlist": [
    ],
    "blocklist": [
    ],
    "high_priority_list": [
    ],
    "allowlist_choco": [
    ],
    "blocklist_choco": [
    ],
    "high_priority_list_choco": [
    ],
    "enable_patching": true,
    "patch_fact": "os_patching",
    "patch_group": [
      "2Tuesday"
    ],
    "patch_schedule": {
      "2Tuesday": {
        "day_of_week": "Tuesday",
        "count_of_week": 2,
        "hours": "22:00 - 23:30",
        "max_runs": 3,
        "reboot": "ifneeded"
      }
    },
    "high_priority_patch_group": "never",
    "post_patch_commands": {
    },
    "pre_patch_commands": {
    },
    "pre_reboot_commands": {
    },
    "patch_on_metered_links": false,
    "security_only": false,
    "patch_choco": false,
    "unsafe_process_list": [
    ]
  }
}
facter -p patching_as_code_config will report this fact successfully, because you need the Puppet agent to generate this fact (it's not part of Facter's built-in OS facts).
Regardless, the above is the info I need. I can see you're using albatrossflavour/os_patching; I test this module with puppetlabs/pe_patch (the variant built into PE), but that shouldn't make any difference to the reboot behavior, as that's all internal to the patching_as_code module.
You say that /var/run/reboot-required already existed before the Puppet run started. That should actually cause the patch run to detect a pending reboot and restart the node immediately at the start of the patch run, even before applying any patches. Clearly that's not happening in your case either.
Can you share how you invoke patching_as_code? What does the manifest that calls the patching_as_code class look like? I'm wondering whether you might be setting the high_priority_only parameter to true.
This is all the code we use to call patching_as_code:
In common.yaml (extract):
patching_as_code::patch_schedule:
  2Tuesday:
    day_of_week: Tuesday
    count_of_week: 2
    hours: 22:00 - 23:30
    max_runs: 3
    reboot: ifneeded
The relevant node Hiera data contains:
profile::base_linux::patch_schedule: '2Tuesday'
And the class is declared like this:
class profile::base_linux (
  String $patch_schedule = 'never',
) {
  class { 'patching_as_code':
    patch_group => $patch_schedule,
  }
}
@elfranne OK, that looks good. One other thing to verify: the reboot detection logic uses the puppet_vardir fact:
/bin/sh ${facts['puppet_vardir']}/lib/patching_as_code/pending_reboot.sh | grep true
This fact is part of the puppetlabs/stdlib module. Can you confirm this module is installed?
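For context on how that check gates the reboot: a Puppet exec with onlyif runs its main command only when the check exits with status 0, and when the check is interpreted by a shell, the exit status of a pipeline is that of its last command, here grep. The sketch below mimics this, assuming the check is run through a shell. fake_check, onlyif_passes and maybe_reboot are hypothetical names, and fake_check only stands in for pending_reboot.sh.

```shell
#!/bin/sh
# Hypothetical illustration of an onlyif check of the form
#   /bin/sh .../pending_reboot.sh | grep true
# fake_check stands in for pending_reboot.sh: it just prints its argument.
fake_check() {
  printf '%s\n' "$1"
}

# Mirrors "<check> | grep true": the pipeline's exit status is grep's,
# which is 0 only if some output line contains "true".
onlyif_passes() {
  fake_check "$1" | grep true >/dev/null
}

# Mirrors exec + onlyif: the "main command" runs only when the check passes.
maybe_reboot() {
  if onlyif_passes "$1"; then
    echo "would run: /sbin/shutdown -r +2"
  else
    echo "skipped"
  fi
}
```

maybe_reboot true prints the shutdown line, while maybe_reboot '' (empty output, like the manual pending_reboot.sh run earlier in this thread, which printed nothing) prints skipped.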
Yes, we use that module; it's deployed with r10k.
Can you run facter -p puppet_vardir on the node and provide the output, please?
$ sudo facter -p puppet_vardir
/opt/puppetlabs/puppet/cache
OK, all of that looks good. Unfortunately, nothing points to an obvious culprit. Can you perform a trace run during the patch window? Hopefully that will shed more light on the matter:
puppet agent -t --evaltrace
Near the end of the run, the patch_reboot stage is processed; it is responsible for detecting pending reboots. When no pending reboot is detected, the output looks like this:
Info: Stage[patch_reboot]: Starting to evaluate the resource (245 of 256)
Info: Stage[patch_reboot]: Evaluated in 0.00 seconds
Info: Class[Patching_as_code::Reboot]: Starting to evaluate the resource (246 of 256)
Info: Class[Patching_as_code::Reboot]: Evaluated in 0.00 seconds
Info: /Stage[patch_reboot]/Patching_as_code::Reboot/Exec[Patching as Code - Patch Reboot]: Starting to evaluate the resource (247 of 256)
Info: /Stage[patch_reboot]/Patching_as_code::Reboot/Exec[Patching as Code - Patch Reboot]: Evaluated in 0.88 seconds
Info: Class[Patching_as_code::Reboot]: Starting to evaluate the resource (248 of 256)
Info: Class[Patching_as_code::Reboot]: Evaluated in 0.00 seconds
Info: Stage[patch_reboot]: Starting to evaluate the resource (249 of 256)
Info: Stage[patch_reboot]: Evaluated in 0.00 seconds
When a pending reboot is detected, the output looks like this:
Info: Stage[patch_reboot]: Starting to evaluate the resource (340 of 351)
Info: Stage[patch_reboot]: Evaluated in 0.00 seconds
Info: Class[Patching_as_code::Reboot]: Starting to evaluate the resource (341 of 351)
Info: Class[Patching_as_code::Reboot]: Evaluated in 0.00 seconds
Info: /Stage[patch_reboot]/Patching_as_code::Reboot/Exec[Patching as Code - Patch Reboot]: Starting to evaluate the resource (342 of 351)
Notice: /Stage[patch_reboot]/Patching_as_code::Reboot/Exec[Patching as Code - Patch Reboot]/returns: Shutdown scheduled for Wed 2022-05-11 15:28:13 CEST, use 'shutdown -c' to cancel.
Notice: /Stage[patch_reboot]/Patching_as_code::Reboot/Exec[Patching as Code - Patch Reboot]/returns: executed successfully
Info: /Stage[patch_reboot]/Patching_as_code::Reboot/Exec[Patching as Code - Patch Reboot]: Evaluated in 0.82 seconds
Info: Class[Patching_as_code::Reboot]: Starting to evaluate the resource (343 of 351)
Info: Class[Patching_as_code::Reboot]: Evaluated in 0.00 seconds
You can optionally add the --debug switch to get even more information, especially if you suspect that the detection command is not being executed correctly:
Info: Stage[patch_reboot]: Starting to evaluate the resource (340 of 351)
Info: Stage[patch_reboot]: Evaluated in 0.00 seconds
Info: Class[Patching_as_code::Reboot]: Starting to evaluate the resource (341 of 351)
Info: Class[Patching_as_code::Reboot]: Evaluated in 0.00 seconds
Info: /Stage[patch_reboot]/Patching_as_code::Reboot/Exec[Patching as Code - Patch Reboot]: Starting to evaluate the resource (342 of 351)
Debug: Exec[Patching as Code - Patch Reboot](provider=posix): Executing check '/bin/sh /opt/puppetlabs/puppet/cache/lib/patching_as_code/pending_reboot.sh | grep true'
Debug: Executing: '/bin/sh /opt/puppetlabs/puppet/cache/lib/patching_as_code/pending_reboot.sh | grep true'
Debug: /Stage[patch_reboot]/Patching_as_code::Reboot/Exec[Patching as Code - Patch Reboot]/onlyif: true
Debug: Exec[Patching as Code - Patch Reboot](provider=posix): Executing '/sbin/shutdown -r +2'
Debug: Executing: '/sbin/shutdown -r +2'
Notice: /Stage[patch_reboot]/Patching_as_code::Reboot/Exec[Patching as Code - Patch Reboot]/returns: Shutdown scheduled for Wed 2022-05-11 15:33:37 CEST, use 'shutdown -c' to cancel.
Notice: /Stage[patch_reboot]/Patching_as_code::Reboot/Exec[Patching as Code - Patch Reboot]/returns: executed successfully
Debug: /Stage[patch_reboot]/Patching_as_code::Reboot/Exec[Patching as Code - Patch Reboot]: The container Class[Patching_as_code::Reboot] will propagate my refresh event
Info: /Stage[patch_reboot]/Patching_as_code::Reboot/Exec[Patching as Code - Patch Reboot]: Evaluated in 0.82 seconds
Info: Class[Patching_as_code::Reboot]: Starting to evaluate the resource (343 of 351)
Debug: Class[Patching_as_code::Reboot]: The container Stage[patch_reboot] will propagate my refresh event
Info: Class[Patching_as_code::Reboot]: Evaluated in 0.00 seconds
Info: Stage[patch_reboot]: Starting to evaluate the resource (344 of 351)
Info: Stage[patch_reboot]: Evaluated in 0.00 seconds
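To compare a captured trace run against the output shown above, a small helper can classify the log. This is a hypothetical sketch: it assumes the output was saved to a file, for example with puppet agent -t --evaltrace --debug > /tmp/patch_run.log 2>&1 (the path is only an example), and it matches the resource and message strings from the logs above. reboot_verdict is an invented name.

```shell
#!/bin/sh
# Hypothetical classifier for a saved `puppet agent -t --evaltrace` log.
# Prints one of:
#   reboot-scheduled    - the reboot exec ran and a shutdown was scheduled
#   checked-no-reboot   - the reboot exec was evaluated, but its check did not pass
#   exec-not-evaluated  - the reboot exec never appeared in the run
reboot_verdict() {
  # -F: match fixed strings, so the square brackets are taken literally.
  if grep -qF 'Patch Reboot]/returns: Shutdown scheduled' "$1"; then
    echo reboot-scheduled
  elif grep -qF 'Exec[Patching as Code - Patch Reboot]' "$1"; then
    echo checked-no-reboot
  else
    echo exec-not-evaluated
  fi
}
```

A log in which only the Stage[patch_reboot] lines appear, with no Exec[Patching as Code - Patch Reboot] entry, yields exec-not-evaluated, which points at the exec not being part of that run rather than at the detection script.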
Sorry for the delay, @kreeuwijk. I ran Puppet on one of the machines showing those symptoms:
$ jq .patching_as_code_config.patch_schedule /opt/puppetlabs/facter/facts.d/patching_configuration.json
{
  "debug": {
    "day_of_week": "Any",
    "count_of_week": [
      1,
      2,
      3,
      4
    ],
    "hours": "08:00 - 18:00",
    "max_runs": 3,
    "reboot": "ifneeded"
  }
}
$ date
2022-05-31T08:04:06 UTC
$ /bin/sh /opt/puppetlabs/puppet/cache/lib/patching_as_code/pending_reboot.sh
true
Info: Class[Patching_as_code]: Starting to evaluate the resource (287 of 1047)
Info: Class[Patching_as_code]: Evaluated in 0.00 seconds
Info: /Stage[main]/Patching_as_code/File[/etc/puppetlabs/puppet/patching_unsafe_processes]: Starting to evaluate the resource (288 of 1047)
Info: /Stage[main]/Patching_as_code/File[/etc/puppetlabs/puppet/patching_unsafe_processes]: Evaluated in 0.00 seconds
Info: Class[Os_patching]: Starting to evaluate the resource (289 of 1047)
Info: Class[Os_patching]: Evaluated in 0.00 seconds
Info: /Stage[main]/Os_patching/File[/var/cache/os_patching]: Starting to evaluate the resource (290 of 1047)
Info: /Stage[main]/Os_patching/File[/var/cache/os_patching]: Evaluated in 0.00 seconds
Info: /Stage[main]/Os_patching/File[/usr/local/bin/os_patching_fact_generation.sh]: Starting to evaluate the resource (291 of 1047)
Debug: HTTP GET https://puppet:8140/puppet/v3/file_metadata/modules/os_patching/os_patching_fact_generation.sh?links=manage&checksum_type=md5&source_permissions=ignore&environment=debugpatchingascode returned 200 OK
Info: /Stage[main]/Os_patching/File[/usr/local/bin/os_patching_fact_generation.sh]: Evaluated in 0.08 seconds
Info: /Stage[main]/Os_patching/File[/var/cache/os_patching/patch_window]: Starting to evaluate the resource (292 of 1047)
Info: /Stage[main]/Os_patching/File[/var/cache/os_patching/patch_window]: Evaluated in 0.00 seconds
Info: /Stage[main]/Os_patching/File[/var/cache/os_patching/pre_patching_command]: Starting to evaluate the resource (293 of 1047)
Debug: /Stage[main]/Os_patching/File[/var/cache/os_patching/pre_patching_command]: Nothing to manage: no ensure and the resource doesn't exist
Info: /Stage[main]/Os_patching/File[/var/cache/os_patching/pre_patching_command]: Evaluated in 0.00 seconds
Info: /Stage[main]/Os_patching/File[/var/cache/os_patching/block_patching_on_warnings]: Starting to evaluate the resource (294 of 1047)
Debug: /Stage[main]/Os_patching/File[/var/cache/os_patching/block_patching_on_warnings]: Nothing to manage: no ensure and the resource doesn't exist
Info: /Stage[main]/Os_patching/File[/var/cache/os_patching/block_patching_on_warnings]: Evaluated in 0.00 seconds
Info: /Stage[main]/Os_patching/File[/var/cache/os_patching/reboot_override]: Starting to evaluate the resource (295 of 1047)
Info: /Stage[main]/Os_patching/File[/var/cache/os_patching/reboot_override]: Evaluated in 0.00 seconds
Info: /Stage[main]/Os_patching/File[/var/cache/os_patching/blackout_windows]: Starting to evaluate the resource (296 of 1047)
Debug: /Stage[main]/Os_patching/File[/var/cache/os_patching/blackout_windows]: Nothing to manage: no ensure and the resource doesn't exist
Info: /Stage[main]/Os_patching/File[/var/cache/os_patching/blackout_windows]: Evaluated in 0.00 seconds
Info: /Stage[main]/Os_patching/Exec[os_patching::exec::fact_upload]: Starting to evaluate the resource (297 of 1047)
Debug: /Stage[main]/Os_patching/Exec[os_patching::exec::fact_upload]: ''/opt/puppetlabs/bin/puppet' facts upload' won't be executed because of failed check 'refreshonly'
Info: /Stage[main]/Os_patching/Exec[os_patching::exec::fact_upload]: Evaluated in 0.00 seconds
Info: /Stage[main]/Os_patching/Exec[os_patching::exec::fact]: Starting to evaluate the resource (298 of 1047)
Debug: /Stage[main]/Os_patching/Exec[os_patching::exec::fact]: '/usr/local/bin/os_patching_fact_generation.sh' won't be executed because of failed check 'refreshonly'
Info: /Stage[main]/Os_patching/Exec[os_patching::exec::fact]: Evaluated in 0.00 seconds
Info: /Stage[main]/Os_patching/Cron[Cache patching data]: Starting to evaluate the resource (299 of 1047)
Info: /Stage[main]/Os_patching/Cron[Cache patching data]: Evaluated in 0.01 seconds
Info: /Stage[main]/Os_patching/Cron[Cache patching data at reboot]: Starting to evaluate the resource (300 of 1047)
Info: /Stage[main]/Os_patching/Cron[Cache patching data at reboot]: Evaluated in 0.00 seconds
Info: /Stage[main]/Os_patching/Cron[Run apt autoremove on reboot]: Starting to evaluate the resource (301 of 1047)
Debug: /Stage[main]/Os_patching/Cron[Run apt autoremove on reboot]: Nothing to manage: no ensure and the resource doesn't exist
Info: /Stage[main]/Os_patching/Cron[Run apt autoremove on reboot]: Evaluated in 0.00 seconds
Info: Class[Os_patching]: Starting to evaluate the resource (302 of 1047)
Info: Class[Os_patching]: Evaluated in 0.00 seconds
Info: /Stage[main]/Main/Schedule[Patching as Code - High Priority Patch Window]: Starting to evaluate the resource (303 of 1047)
Info: /Stage[main]/Main/Schedule[Patching as Code - High Priority Patch Window]: Evaluated in 0.00 seconds
Info: /Stage[main]/Patching_as_code/File[patching_configuration.json]: Starting to evaluate the resource (304 of 1047)
Info: /Stage[main]/Patching_as_code/File[patching_configuration.json]: Evaluated in 0.00 seconds
Info: Class[Patching_as_code]: Starting to evaluate the resource (305 of 1047)
Info: Class[Patching_as_code]: Evaluated in 0.00 seconds
Debug: HTTP GET https://puppet:8140/puppet/v3/file_metadatas/modules/elastic_beats/usr/share/filebeat/module?recurse=true&max_files=0&links=manage&checksum_type=md5&source_permissions=ignore&environment=debugpatchingascode returned 200 OK
Info: Stage[patch_reboot]: Starting to evaluate the resource (1059 of 1067)
Info: Stage[patch_reboot]: Evaluated in 0.00 seconds
Info: Stage[patch_reboot]: Starting to evaluate the resource (1060 of 1067)
Info: Stage[patch_reboot]: Evaluated in 0.00 seconds
So it seems that pending_reboot.sh never gets called.
Hi @elfranne, that is expected: May 31st falls in week 5 of the month, and your patch schedule only allows weeks 1-4.
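The arithmetic behind this: day N of a month is occurrence ((N - 1) / 7) + 1 of its weekday (integer division), so the 29th to the 31st are always a 5th occurrence. The sketch below assumes count_of_week is matched against this value; the module's exact implementation isn't shown in this thread, and week_of_month and in_allowed_weeks are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers; not part of patching_as_code.

# Which occurrence of its weekday a day of the month is (1-5).
# ${1#0} strips a leading zero (e.g. from `date +%d`), since shell
# arithmetic may treat values like "08" or "09" as invalid octal numbers.
week_of_month() {
  day=${1#0}
  echo $(( (day - 1) / 7 + 1 ))
}

# Succeeds if that occurrence is listed in the allowed weeks,
# given as a space-separated string such as "1 2 3 4".
in_allowed_weeks() {
  week=$(week_of_month "$1")
  case " $2 " in
    *" $week "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

week_of_month 31 prints 5, so in_allowed_weeks 31 "1 2 3 4" fails, matching the schedule above; week_of_month "$(date +%d)" gives the value for today.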
I've encountered the same problem: the reboot is never triggered after an update on Linux (tested on CentOS and Debian). I believe this is because the exec provider is posix rather than shell (https://github.com/puppetlabs/puppetlabs-patching_as_code/blob/main/manifests/reboot.pp#L18); since the onlyif command contains a pipe, I don't think it can work with provider=posix. Another way of fixing it would be to change pending_reboot.sh to return a different exit code depending on the status, instead of echoing "true".
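A minimal sketch of the exit-code approach suggested here, assuming the check keeps printing true/false for anyone reading its output but also exits 0 or 1, so that an exec onlyif could call the script directly instead of piping it into grep true. report_reboot_status is a hypothetical name, and this is not the code in the module's reboot-fix branch.

```shell
#!/bin/sh
# Hypothetical tail end of a pending-reboot script.
# $1 is the already-computed status: "true" if a reboot is pending.
report_reboot_status() {
  if [ "$1" = true ]; then
    echo true
    return 0   # success: an onlyif check would let the reboot exec run
  fi
  echo false
  return 1     # failure: an onlyif check would skip the reboot exec
}
```

Called as the last command of the script, its status becomes the script's exit status, so a check like /bin/sh /opt/puppetlabs/puppet/cache/lib/patching_as_code/pending_reboot.sh would no longer need a pipe.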
@gmenuel which versions of CentOS and Debian did you test on? I'll look into moving to a return code; could you test it when I make a test version available?
I tested on CentOS 7 and Debian 11. Yes, I can test it, no problem :)
@gmenuel can you try the reboot-fix branch of the module? It has the requested changes and passes tests on my system.
mod 'puppetlabs/patching_as_code',
  git: 'https://github.com/puppetlabs/puppetlabs-patching_as_code.git',
  ref: 'reboot-fix'
Thanks for the quick fix. I've tested it on Debian, still without success:
Reboot/Exec[Patching as Code - Patch Reboot]) Could not evaluate: /opt/puppetlabs/puppet/cache/lib/patching_as_code/pending_reboot.sh: 24: facter: not found
Reboot/Exec[Patching as Code - Patch Reboot]) /opt/puppetlabs/puppet/cache/lib/patching_as_code/pending_reboot.sh: 24: [: =: unexpected operator
Reboot/Exec[Patching as Code - Patch Reboot]) /opt/puppetlabs/puppet/cache/lib/patching_as_code/pending_reboot.sh: 24: facter: not found
Reboot/Exec[Patching as Code - Patch Reboot]) /opt/puppetlabs/puppet/cache/lib/patching_as_code/pending_reboot.sh: 24: [: =: unexpected operator
Reboot/Exec[Patching as Code - Patch Reboot]) /opt/puppetlabs/puppet/cache/lib/patching_as_code/pending_reboot.sh: 33: [[: not found
It seems that we still have a few bugs:
For the facter: not found error, one solution would be to use /bin/sh -l to simulate a login and thus pick up the PATH from /etc/profile.d, though that might cause other problems. Another solution would be to append /opt/puppetlabs/bin to the PATH variable in the script.
@gmenuel did it work on CentOS?
So it seems that facter not being in the PATH could be the cause of the issue you're seeing. I'll look into how best to locate it.
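The errors reported above are typical of a script running under /bin/sh, which is dash on Debian: facter is not on the non-login PATH, and [[ ]] is a bash keyword that dash does not provide. The "[: =: unexpected operator" message is consistent with an empty, unquoted facter result being passed to [. A hedged sketch of a more portable approach, assuming the Puppet AIO install path /opt/puppetlabs/bin/facter; FACTER_FALLBACK_DIR, find_facter and is_true are hypothetical names:

```shell
#!/bin/sh
# Hypothetical helpers for a script that must run under a plain POSIX /bin/sh.

# Assumption: Puppet AIO packages install facter into /opt/puppetlabs/bin.
FACTER_FALLBACK_DIR="${FACTER_FALLBACK_DIR:-/opt/puppetlabs/bin}"

# Print the path to facter: PATH first, then the fallback directory.
find_facter() {
  if command -v facter >/dev/null 2>&1; then
    command -v facter
  elif [ -x "$FACTER_FALLBACK_DIR/facter" ]; then
    echo "$FACTER_FALLBACK_DIR/facter"
  else
    return 1
  fi
}

# POSIX test with a quoted operand. Unquoted, an empty value turns
# `[ $1 = true ]` into `[ = true ]`, which dash rejects with
# "[: =: unexpected operator". Use [ ] or case instead of bash's [[ ]].
is_true() {
  [ "$1" = true ]
}
```

A script could then call FACTER=$(find_facter) || exit 1 once and use "$FACTER" for every query, so the result no longer depends on how the calling process set up PATH.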
@gmenuel I've updated the code, can you try again?
Thanks, I've tested it on Debian 11 and CentOS 7, and it works!
Fixed in v1.1.7 of the module, now available on the Forge
Describe the Bug
Reboot is not triggered.
Expected Behavior
Reboot
Environment
Using default config apart from new schedule:
Patching_as_code ran and updated OS packages:
Other relevant info: