saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0
14.09k stars 5.47k forks source link

[BUG] `pkg.installed` is very slow with custom pkg provider #66875

Open yangskyboxlabs opened 1 week ago

yangskyboxlabs commented 1 week ago

Description When using a custom pkg provider, pkg.installed state function spends a very long time executing code outside of pkg.list_pkgs, pkg.latest_version et al.

Setup (We are building internal tooling on top of salt, so the setup is pretty involved, and may not be practically reproducible in whole. I've isolated the subsystems involved as much as I can for reproduction, but may not be able to easily provide what we have produced locally. All details in Steps to Reproduce section.)

Please be as specific as possible and give set-up details.

Steps to Reproduce the behavior We are attempting to integrate winget with salt for some internal tooling, and discovered that it's just really slow.

On what should be a trivial state.highstate test=True check for a package already installed, pkg.installed takes upwards of 5 or 6 additional seconds beyond the execution of winget itself:

Ninja Build:
  pkg.installed:
    - name: Ninja-build.Ninja
      provider: winget

image

In attempting to isolate the issue, I've discovered that salt is loading all functions from all modules due to constructs like:

def _resolve_capabilities(pkgs, refresh=False, **kwargs):
    ....
    if not pkgs or "pkg.resolve_capabilities" not in __salt__:
        return pkgs, refresh

Stubbing out the small set of functions checked for in this way almost completely eliminates this "overhead".

def resolve_capabilities(pkgs: Sequence[str], **_) -> Sequence[str]:
    return pkgs

def hold():
    raise NotImplementedError

(This is not like-for-like, as some tracing info has been since removed, but performance difference is obvious.) image

Now that I've seen the underlying implementation, this is most likely affecting all pkg modules even when provider: is not used, but the impact is limited only to the first invocation. I've not debugged further to determine to what extent this is true.

Versions Report

salt --versions-report (Provided by running salt --versions-report. Please also mention any differences in master/minion versions.) ```yaml local: Salt Version: Salt: 3007.1 Python Version: Python: 3.12.5 (tags/v3.12.5:ff3bc82, Aug 6 2024, 20:45:27) [MSC v.1940 64 bit (AMD64)] Dependency Versions: cffi: 1.16.0 cherrypy: 18.8.0 dateutil: 2.8.2 docker-py: Not Installed gitdb: 4.0.10 gitpython: 3.1.43 Jinja2: 3.1.4 libgit2: 1.7.2 looseversion: 1.3.0 M2Crypto: Not Installed Mako: Not Installed msgpack: 1.0.7 msgpack-pure: Not Installed mysql-python: Not Installed packaging: 23.1 pycparser: 2.21 pycrypto: Not Installed pycryptodome: 3.19.1 pygit2: 1.14.1 python-gnupg: 0.5.2 PyYAML: 6.0.1 PyZMQ: 25.1.2 relenv: Not Installed smmap: 5.0.1 timelib: 0.3.0 Tornado: 6.3.3 ZMQ: 4.3.4 Salt Package Information: Package Type: Not Installed System Versions: dist: locale: utf-8 machine: AMD64 release: 10 system: Windows version: 10 10.0.19045 SP0 Multiprocessor Free ```

Additional context Add any other context about the problem here.

yangskyboxlabs commented 1 week ago

I see that LazyDict provides a _missing() method to short-circuit cases where symbols are known to never exist, but LazyLoader does not attempt to do anything special with it. I was looking for a way to inject this metadata into __salt__ somehow, but that doesn't seem feasible without monkeypatching it at runtime.

max-arnold commented 1 week ago

What is likely happening here is the pkg.installed state tries to call some execution module function that is not implemented in your custom provider. At that moment the loader is trying to find the function using more and more aggressive search methods and eventually resorts to trying all available modules (because to know whether a module actually has the function, the loader needs to load it and check its virtualname).

It seems you already found a solution - implement the missing functions that are expected from a provider API contract, even if they are no-op.

Another thing to try on Windows is set multiprocessing: false