voxpupuli / puppet-kmod

manage kernel module with puppet
Apache License 2.0
17 stars 63 forks source link

kmods fact produces null byte in parameters datastructure #98

Closed phihos closed 1 year ago

phihos commented 1 year ago

Affected Puppet, Ruby, OS and module versions/distributions

How to reproduce (e.g Puppet code you use)

In my case Puppet ran in a libvirt VM where the parameter "virtio_mmio/device" has apparently no value. Running puppet facts produced:

[...]
      "virtio_mmio": {
        "parameters": {
          "device": "\u0000"
        },
        "used_by": [

        ]
      },
[...]

This produced the following error when uploading the fact to Foreman:

'Action failed' error (ArgumentError): string contains null byte

I know Foreman could handle such input more gracefully, but \u0000 does not look like a correct value either.

What are you seeing

See above.

What behaviour did you expect instead

An empty string like this:

[...]
      "virtio_mmio": {
        "parameters": {
          "device": ""
        },
        "used_by": [

        ]
      },
[...]

Output log

See above.

Any additional information you'd like to impart

saz commented 1 year ago

I'm seeing a different error message since yesterday:

[2023-08-23 22:36:26.420436 ] ERROR Facter - Fact resolution fact='kmods', resolution='<anonymous>' resolved to an invalid value: String "\u0000" contains a null byte reference; unsupported for all facts.

It's also virtio_mmio in my case.

Puppet: 7.26.0 Ruby: 2.7.8p225 Distribution: OSS Module version: 3.2.0

saz commented 1 year ago

Changing the line

kmod[directory]['parameters'][param] = File.read("/sys/module/#{directory}/parameters/#{param}").chomp

to

kmod[directory]['parameters'][param] = File.read("/sys/module/#{directory}/parameters/#{param}").chomp.delete("\u0000")

fixes the issue, but I'm not sure, if that's the best way.

elfranne commented 1 year ago

The new error seems to be caused by new version of facter: works fine with facter 4.4.1 but not 4.4.2. The new version is included in Puppet agent 7.26

waipeng commented 1 year ago

@elfranne thanks for finding this. This is very helpful, as Puppet release notes does not say about the Facter bump, nor Facter release notes have a note about this behaviour.

waipeng commented 1 year ago

@saz can confirm that patch is good. You should send a PR.

smortex commented 1 year ago

The new error seems to be caused by new version of facter: works fine with facter 4.4.1 but not 4.4.2. The new version is included in Puppet agent 7.26

It looks like the check was added in https://github.com/puppetlabs/facter/pull/2590.

So basically, if the problem was already there, it was not reported before.

Looking at https://elixir.bootlin.com/linux/latest/source/drivers/virtio/virtio_mmio.c#L784 it seems that this is supposed to be an empty string or multiple lines of text. What is your kernel version?

waipeng commented 1 year ago

maybe I can answer

# cat /etc/issue
Ubuntu 20.04.6 LTS \n \l
# uname -a
Linux monitoring-melbourne-qh2 5.4.0-156-generic #173-Ubuntu SMP Tue Jul 11 07:25:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
irb(main):001:0> File.read('/sys/module/virtio_mmio/parameters/device')
=> "\u0000"
saz commented 1 year ago

@smortex this issue has been reported two months ago, maybe the error message was different due to the usage of Foreman?

I'm seeing this error message on Ubuntu 20.04 and Ubuntu jammy. As the kernel version is still the same, as two days ago, it seems unrelated.

I'll check, when facter has been upgraded on those affected systems.

saz commented 1 year ago

The Debian package release for Ubuntu Focal and Jammy happened on 2023-08-23. This perfectly matches the first time I've seen this issue. As @waipeng pointed out, there's no information about that change in any of the release notes.

@smortex I don't understand why, but the error isn't shown on Debian Bookworm with kernel 6.1. Looks like it's only happening for me on Ubuntu systems with kernel 5.x If you don't see a better way to fix this, I'll send in a PR, as this issue is quite annoying :smiley:

smortex commented 1 year ago

All Debian nodes I have access to have no /sys/module/virtio_mmio/parameters/device (Debian 10.13, 11.7, 12.1). All Ubuntu nodes I have access to have a /sys/module/virtio_mmio/parameters/device with the unexpected content \0 (Ubuntu 18.04, 20.04. 22.04).

All these nodes are the same kind of nodes provided by the same service provider, so the Ubuntu kernel seems to expose some crap that the Debian kernel does not expose. Ubuntu being a Debian derivative, it's maybe worth looking for the root cause in the Ubuntu patches for the kernel?

It's not clear what break when you have this error. Is it just some spam in the log (in which case I would expect Puppet not to hide it because it is indeed wrong)? Does it break things (in which case we can detect if a value read has null bytes and emit a warning and discard that value if so)?

waipeng commented 1 year ago

@smortex what are the kernel versions you tested with? I suspect the difference may be due to modules that gets loaded with the vendor default kernels.

The impact is that there is an error emitted when running Puppet. E.g.

root@example:~# puppet agent -t
Info: Using environment 'XXX'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Facter: Fact resolution fact='kmods', resolution='<anonymous>' resolved to an invalid value: String "\u0000" contains a null byte reference; unsupported for all facts.
Info: Caching catalog for example.example.com
Info: Applying configuration version '12345678'
Notice: Applied catalog in 4.80 seconds

Puppet compilation still succeeds. However, this error makes it into /opt/puppetlabs/puppet/cache/state/last_run_report.yaml, triggering a Nagios alert for us (we monitor errors from puppet run).

smortex commented 1 year ago

@smortex what are the kernel versions you tested with?

Debian:

Ubuntu:

The Debian nodes have a lot of virtio modules loaded (the machines with a cloud kernel, the dedicated servers have no virtio modules):

$ facter kernelrelease
5.10.0-23-cloud-amd64
$ lsmod | grep virtio
virtio_console         40960  1
virtio_balloon         24576  0
virtio_net             61440  0
net_failover           24576  1 virtio_net
virtio_scsi            24576  2
scsi_mod              258048  4 virtio_scsi,sd_mod,libata,sg
virtio_pci             28672  0
virtio_ring            36864  5 virtio_console,virtio_balloon,virtio_scsi,virtio_pci,virtio_net
virtio                 16384  5 virtio_console,virtio_balloon,virtio_scsi,virtio_pci,virtio_net

The Ubuntu VMs have less modules, but these modules are loaded on Debian where we don't have this issue:

$ facter kernelrelease
4.15.0-213-generic
$ lsmod | grep virtio
virtio_scsi            20480  2
virtio_net             49152  0
waipeng commented 1 year ago

@smortex more info

On Ubuntu the virtio_mmio appears to be compiled into kernel

root@ubuntu:~# uname -a
Linux ubuntu 5.4.0-159-generic #176-Ubuntu SMP Mon Aug 14 12:04:20 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
root@ubuntu:~# grep virtio_mmio /lib/modules/5.4.0-159-generic/modules.builtin
kernel/drivers/virtio/virtio_mmio.ko

On Debian 11, it is a module

root@debian:~# uname -a
Linux debian 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux
root@debian:~# modprobe virtio_mmio
root@debian:~# lsmod | grep virtio_mmio
virtio_mmio            16384  0
virtio_ring            36864  7 virtio_rng,virtio_mmio,virtio_console,virtio_balloon,virtio_pci,virtio_blk,virtio_net
virtio                 16384  7 virtio_rng,virtio_mmio,virtio_console,virtio_balloon,virtio_pci,virtio_blk,virtio_net

After loading the module, it appears that the structure of /sys is different

root@debian:~# ls /sys/module/virtio_mmio/
coresize  drivers  holders  initsize  initstate  notes  refcnt  sections  taint  uevent

It doesn't have parameters/device, and hence no null value.

Whether this is difference in kernel version or other compilation options, I have no idea.

sid3windr commented 1 year ago

The impact is that there is an error emitted when running Puppet. E.g. (...)

Puppet compilation still succeeds. However, this error makes it into /opt/puppetlabs/puppet/cache/state/last_run_report.yaml, triggering a Nagios alert for us (we monitor errors from puppet run).

This is not the only impact, the kmods fact is empty, making any code relying on it useless.

I have this error on "Debian 11" machines as well but they're actually running a Proxmox VE kernel (which is newer than the Debian ones, and likely comes from the same source as Ubuntu's kernels).

saz commented 1 year ago

As the kmods fact is currently broken, I guess we should just strip the null value and that's it.