puppetlabs / facter

Collect and display system facts
https://puppet.com/open-source/#osp
Apache License 2.0

provide a way to block all volatile facts #2718

Open bastelfreak opened 1 month ago

bastelfreak commented 1 month ago

Use Case

Facts like load_averages and system_uptime don't make much sense to me. They change on each Puppet run and produce a lot of I/O on PuppetDB and PostgreSQL. Monitoring the system load should be done by a monitoring system, and instead of reporting the uptime, facter should only report the boot time.

Describe the Solution You Would Like

Fewer facts that change on each run


gcoxmoz commented 1 month ago

load_averages and system_uptime are mentioned here. memory.system and mountpoints.*.(available|used) all change 'every second'; are those on the chopping block? processors.speed steps around a fair bit on most of my servers too.

I personally don't have a use for these facts, but it's cheaper for facter to keep providing this data than it is to have folks reinvent the wheel after these OS-reporting pieces are taken away.

If the reasoning is to reduce db load, what about a facter --no-volatile-facts to skip facts like these, or a similar ingestion filter on the puppetdb side to default-ignore them on intake?

bastelfreak commented 1 month ago

it's cheaper for facter to keep providing these data than it is to have folks reinvent a wheel after these OS-reporting pieces are taken away.

I think it is a tradeoff between performance vs. how many people use the facts. And if their usage is almost zero, it makes sense to remove them from core facter.

ekohl commented 1 month ago

If the reasoning is to reduce db load, what about a facter --no-volatile-facts to skip facts like these, or a similar ingestion filter on the puppetdb side to default-ignore them on intake?

Foreman already implements something like this, because it creates too much noise in the database. But it would be good if this was more natively implemented. Perhaps the custom fact API should be extended to support marking it as volatile? Then fact upload to PuppetDB and Foreman could only send the non-volatile facts (hard facts?).
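
To sketch the idea (this is not an existing Facter API, just one way a volatile marker could be layered on top of Facter.add today; the VolatileFacts module and the example_load fact name are made up):

module VolatileFacts
  # Registry of fact names an uploader could filter out before sending
  # facts to PuppetDB / Foreman. Purely illustrative; facter has no such API.
  NAMES = []

  def self.add(name, options = {}, &block)
    NAMES << name.to_s if options.delete(:volatile)
    Facter.add(name, options, &block)
  end
end

VolatileFacts.add(:example_load, volatile: true) do
  confine kernel: 'Linux'
  setcode { File.read('/proc/loadavg').split.first.to_f }
end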

github-actions[bot] commented 1 month ago

Migrated issue to FACT-3470

joshcooper commented 1 month ago

We currently have two ways to block facts from ending up in puppetdb. Facts can be blocked via facter.conf so they are not sent and they can be blocked in the puppetdb terminus. Are there specific reasons why those approaches don't work?
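
For reference, the terminus-side blocking is the facts_blacklist setting in puppetdb.conf on the compiler, roughly like this (the server URL and fact names are examples, and the setting depends on the PuppetDB version):

# /etc/puppetlabs/puppet/puppetdb.conf
[main]
server_urls = https://puppetdb.example.com:8081
facts_blacklist = load_averages, system_uptime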

As for marking facts volatile: it would be easy to update the Facter.add API to accept metadata, but that doesn't work for external facts (executables, INI files, etc.). There would need to be some other way to configure those facts. But at that point you might as well block the fact?
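
For example, an executable external fact is just a program that prints key=value lines, so there is no place in that contract for per-fact metadata (the path and fact name below are made up):

#!/usr/bin/env ruby
# /etc/puppetlabs/facter/facts.d/example_uptime.rb (example external fact)
# The only interface is key=value on stdout, so a 'volatile' marker would
# have to live somewhere else, e.g. in configuration.
puts "example_uptime_seconds=#{File.read('/proc/uptime').split.first.to_i}"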

I know facter has lots of volatile facts that seem dubious, but we literally get bug reports about the processor speed being incorrect, the amount of free space being incorrect on AIX, etc. In other words, I'm fairly sure there is a non-zero number of people relying on facts that most would consider not-useful.

ekohl commented 1 month ago

I'll note that instead of uptime you can also calculate the boot time. That would make the fact stable.
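
Something along these lines (a sketch; boot_time is not a fact facter ships, the name is made up):

require 'time'

Facter.add(:boot_time) do
  confine kernel: 'Linux'
  setcode do
    # Derive a stable value: boot time = now minus uptime. Unlike uptime,
    # this stays the same between runs (give or take a second of rounding)
    # until the machine actually reboots.
    uptime_seconds = File.read('/proc/uptime').split.first.to_f
    (Time.now - uptime_seconds).utc.iso8601
  end
end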

bastelfreak commented 4 weeks ago

@joshcooper I think the facter.conf approach is limited to certain groups / top-level facts? I think a config option like no-volatile-facts would be way easier for users.

joshcooper commented 3 weeks ago

facter has a number of builtin fact groups (see facter --list-block-groups) including ones for load_average and uptime. If you configure facter to block those groups, then you'll get the desired behavior:

❯ cat facter.conf 
facts : {
  blocklist : [ "load_average", "uptime" ],
}
❯ bx facter load_averages                       
{
  15m => 2.05,
  1m => 2.38,
  5m => 2.31
}
❯ bx facter load_averages --config ./facter.conf   
❯ bx facter system_uptime                       
{
  days => 37,
  hours => 911,
  seconds => 3281746,
  uptime => "37 days"
}
❯ bx facter system_uptime --config ./facter.conf

I'd prefer not to add a third way to block facts, given the existing ones already work. Perhaps we could add a predefined volatile fact group and allow those facts to be blocked that way.
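
i.e. something like this, if such a group existed ("volatile" is hypothetical, not a group facter defines today):

facts : {
  blocklist : [ "volatile" ],
}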

ekohl commented 3 weeks ago

That sounds reasonable, but the goal is to easily drop them from central fact stores (PuppetDB, Foreman), so I wonder if Puppet could default to blocking those when sending. That may be more of a Puppet 9 change, but I can live with a long-term plan.

bastelfreak commented 3 weeks ago

@joshcooper can we already block mountpoints.*.(available|used)*?

joshcooper commented 3 weeks ago

can we already block mountpoints.*.(available|used)*?

Not currently, see https://github.com/puppetlabs/facter/issues/2658. The mountpoints fact returns a hash value, with one key for each mount: https://github.com/puppetlabs/facter/blob/43111c9b6524ba14bed2893580b0a67aec10051b/lib/facter/facts/linux/mountpoints.rb#L17

Blocking (and caching) works at the per-fact-class layer, so you can only block the entire mountpoints fact, not specific mounts or attributes within each mount.

Confusingly, you can block some fact hash values like load_averages and os.distro.release, because there is a specific fact implementation for each of them: https://github.com/puppetlabs/facter/blob/43111c9b6524ba14bed2893580b0a67aec10051b/lib/facter/facts/linux/load_averages.rb#L10 https://github.com/puppetlabs/facter/blob/43111c9b6524ba14bed2893580b0a67aec10051b/lib/facter/facts/linux/os/distro/release.rb#L22

The part that's missing is the ability to exclude/reject fact values that were collected, but not needed, like mountpoints.*.available.
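
To make that concrete, the missing step would be a post-collection filter along these lines, applied after the facts are resolved and before they are sent anywhere (a sketch of the requested behavior, not an existing facter option; the key list is illustrative):

# Drop the per-mount attributes that change constantly, keep the rest
# (device, filesystem, options, size, ...). Nothing in facter does this today.
VOLATILE_MOUNT_KEYS = %w[available available_bytes used used_bytes capacity].freeze

def strip_volatile_mount_keys(mountpoints)
  mountpoints.transform_values do |attrs|
    attrs.reject { |key, _value| VOLATILE_MOUNT_KEYS.include?(key) }
  end
end

filtered = strip_volatile_mount_keys(Facter.value(:mountpoints) || {})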

pikrzysztof commented 17 hours ago

I'd also be interested in blocking some mountpoint-related facts. I'm not interested in any mountpoints under the /var/lib/docker directory, but I am interested in /var/lib/docker itself. On big machines, /var/lib/docker has quite a lot of mounts and stresses the fact subsystem without good reason.

Mountpoints managed by container runtimes are pretty volatile, so I think this issue is a good place for this feature request.
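
For what it's worth, the filtering I have in mind is roughly this (again only a sketch of the requested behavior, not something facter supports today):

mountpoints = Facter.value(:mountpoints) || {}

# Keep /var/lib/docker itself, drop the container-runtime mounts below it.
kept = mountpoints.reject { |path, _attrs| path.start_with?('/var/lib/docker/') }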