voxpupuli / puppet-python

Puppet module for installing and managing Python, pip, virtualenvs and Gunicorn virtual hosts.
https://forge.puppetlabs.com/puppet/python
Apache License 2.0
200 stars 374 forks source link

Don't call pip freeze to check package version (performance cost in general, permission problems in specific case) #291

Open Mekk opened 8 years ago

Mekk commented 8 years ago

Summary

I faced permission problems which in the end turned out to be caused by python::pip calling pip freeze calling hg showconfig on modules unrelated to the installation in charge. While problems as such are specific to my somewhat specific way of using puppet, they can be easily resolved, and fixes make things working much faster in the general case.

Appetizer

$ time pip freeze | grep -e '^SQLAlchemy'
SQLAlchemy==0.8.4
(…)
real    0m5.488s
time pip show SQLAlchemy
(…)
Version: 0.8.4
(…)
real    0m0.198s

Permission problem

I use some Puppet manifest to support managing my development environment. It's light case, without any puppet master, with Puppet serving as a helper just to ensure various packages are installed. So I do things like

sudo puppet apply manifests/devel.pp --modulepath=./modules

I agree that this is not perfectly safe way of doing things, but that's also not a production…

With python::pip installed packages it used to work ……… until I performed some manual pip install --user --edit (on completely unrelated modules). Since then, Puppet manifests started to ignore python::pip modules, and debugging output showed up info like:

Debug: Exec[pip_install_cram](provider=posix): Executing check 'pip freeze | grep -i -e ^cram=='
Debug: Executing 'pip freeze | grep -i -e ^cram=='
(…)
Debug: /Stage[main]/Python_local/Python::Pip[cram]/Exec[pip_install_cram]/unless: not trusting file /home/marcink/DEV_hg/mercurial/extension_utils/.hg/hgrc from untrusted user marcink, group marcink
(…)
Debug: /Stage[main]/Python_local/Python::Pip[cram]/Exec[pip_install_cram]/unless: Error when trying to get requirement for VCS system Command /usr/bin/hg showconfig paths.default failed with error code 1 in /home/marcink/DEV_hg/mercurial/extension_utils, falling back to uneditable format
Debug: /Stage[main]/Python_local/Python::Pip[cram]/Exec[pip_install_cram]/unless: Could not determine repository location of /home/marcink/DEV_hg/mercurial/extension_utils
(… similar things repeated for all locally installed repositories …)
Debug: /Stage[main]/Python_local/Python::Pip[cram]/Exec[pip_install_cram]/unless: cram==0.6

(here I am python::pip-ing cram which is in no way related to locally installed extension_utils)

In the end it turns out that things works so:

Suggested fix

  1. Simpler but less effective version: swapping pip freeze to pip list and patching $grep_regexp to accomodate different syntax (pip list emits things like cram (0.6) instead of cram==0.6 emitted by pip freeze resolves my permission issue but does not offer noticeable performance gains (btw, for locally installed dirs the output is much more useful, for example mercurial-extension-utils (1.0.0, /home/marcink/DEV_hg/mercurial/extension_utils)). To get that fix I simply replaced freeze with list and edited the regexp this way

    if $ensure =~ /^((19|20)[0-9][0-9]-(0[1-9]|1[1-2])-([0-2][1-9]|3[0-1])|[0-9]+\.[0-9]+(\.[0-9]+)?)$/ {
     $grep_regex =  "^" + $pkgname  + " [\\(]" + $ensure + '[\\)]$'
    } else {
     $grep_regex = "^${pkgname} "
    }
  2. Much better idea would be to wrap pip show PKGNAME instead, see appetizer above for explanation why.
  3. I am not 100% sure whether whole unless is needed at all. Blindly calling pip install PkgName==Version may be faster than checking whether this call is needed. For example, when package cram is alredy installed at version 0.6, I have:
$ time pip freeze | grep cram
cram==0.6
real    0m5.535s
$ time pip show cram

---
Name: cram
Version: 0.6
(…)

real    0m0.199s
(…)
$ time pip install cram==0.6
Requirement already satisfied (use --upgrade to upgrade): cram==0.6 in
(…)
real    0m0.197s

so, as we can see the actual cost of checking whether pip install is needed is exactly equal to just calling it.

Deus-Vult commented 8 years ago

On performance; We were facing performance issues on one of our weaker nodes with a fair amount of Python packages which essentially was caused solely because of pip freeze being called repeatedly to check for present installations and so on for each package to install.

In order to resolve this I wrote a small wrapper around the stankevich-python and a custom fact.

Important to note is that this was for a masterless Puppet installation (pretty much like this I guess https://www.digitalocean.com/community/tutorials/how-to-set-up-a-masterless-puppet-environment-on-ubuntu-14-04), I have no clue if this has any impact on whether or not the solution is worthwhile for other installations.

Fact:

# Extracts python modules and puts them into a hash
# containing the name and version

modules = Hash.new

if Facter::Util::Resolution.which('pip')
  pip_output = `pip freeze`
  python_modules = pip_output.split("\n")
  python_modules.each { |package|
    parts = package.split("==")
    modules[parts[0].downcase] = parts[1]
  }

  Facter.add('python_modules') do
    setcode do
      modules
    end
  end
end

Note that this doesn't execute if it can't find pip.

Here's a small sample from the resulting manifest, showing how the fact was used:

define install_python_module ($ensure) {
  if $python_modules {
    case $ensure {
      'present': {
        unless $title in $python_modules {
          python::pip { $title: ensure => 'present' }
        }
      }

      'latest': {
        python::pip { $title: ensure => 'latest' }
      }

      'absent': {
        if $title in $python_modules {
          python::pip { $title: ensure => 'absent' }
        }
      }

      default: {
        # Deal with versions here
        #
        # I think the following snippet should work but I've not tested it
        #
        # if $title in $python_modules and $ensure != $python_modules[$title] {
        #   python::pip { $title: ensure => $ensure }
        # }
        notice("Version handling is not implemented yet")
      }
    }
  }

  else {
    python::pip { $title: ensure => $ensure }
  }
}

$present_packages = [ 'requests', 'pyyaml','validate-email', 'pycrypto', 'mock','iso8601']

install_python_module { $present_packages:
  ensure => "present"
}

Note the check for whether or not the python_modules variable is not undef due to the above mentioned dependency on pip to set up the fact, not the most glorious solution I've come up with but it works.

Maybe this is something the module could continue building upon. Perhaps it works the current way because of cross-compatibility issues or something similar. In my specific use case on Ubuntu it did however lower the execution time of repeated puppet apply down to a few seconds down from one or two minutes. Obviously first time setups are still slow but that doesn't affect our production environment.

fnoop commented 7 years ago

+1 I'm running puppet masterless on low-performance embedded systems and these pip freezes take up well over 50% of a run where nothing needs to change.

fnoop commented 7 years ago

To follow up on @Deus-Vult workaround above, here's a python external structured fact that you can put in a module (eg. \<module>/facts.d/pip.py):

#!/usr/bin/env python
import subprocess
modules = {}
pipoutput = subprocess.Popen(["pip", "freeze"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
pips = pipoutput.communicate()[0].split()
for pip in pips:
    [package,version] = pip.split("==")
    modules[package)] = version
print "python_modules="+str(modules)

Note older versions of puppet need to set this in puppet.conf [main]: stringify_facts = false

peimanja commented 7 years ago

+1