puppetlabs / r10k

Smarter Puppet deployment
799 stars · 354 forks

R10k 2.5.0 10 times slower as compared to 2.3.0 #696

Closed daveseff closed 3 years ago

daveseff commented 7 years ago

Performing an 'r10k deploy environment -p' on 2.3.0 (Puppet Enterprise 2016.2) took approx. 50 seconds with 30 environments:

# /usr/local/bin/r10k deploy environment -p
...
real    0m50.000s
user    0m39.407s
sys 0m5.827s

Since upgrading to 2.5.0 ( puppet enterprise 2016.5 ):

# /usr/local/bin/r10k deploy environment -p
...
real    10m27.468s
user    3m19.093s
sys 0m58.130s

After downgrading back to 2.3.0, r10k returns to sub-minute times.

scotje commented 7 years ago

@daveseff could you provide a little more context about your environment? How many puppet environments are being deployed, any customization you have done to r10k config or anything like that? Also seeing the full output with at least -v debug would be useful.

daveseff commented 7 years ago

I am deploying 30 environments with approx. 55 modules each. There has not been any modification (other than anything Puppet Enterprise does to it). I've attached the debug output, although there isn't anything useful in it.

r10k.debug.txt

sedunne commented 7 years ago

I also ran into this issue today when trying to upgrade an old 2.1.1 install. After testing a bunch of versions, it looks like starting in 2.5.3 something causes an extreme increase in deploy time:

2.5.2
real    3m49.001s
user    0m48.636s
sys 3m23.621s

2.5.3
real    64m23.950s
user    4m42.240s
sys     10m23.414s

My debug output is also "normal"; however, I did notice that in 2.5.2 the modules section completes almost instantly, whereas in 2.5.3 modules from the Forge seem to process one at a time with a noticeable delay between them. Otherwise there are no errors or anything different in the output between the versions. This is on Puppet FOSS, so it doesn't seem to be limited to PE.

Based on just the real-time output, my guess is that something is happening with modules from the Forge that causes them either to take longer to process or to always be re-fetched. Modules from git sources that are pinned to refs or tags seem to be fine.

hameno commented 7 years ago

I have the same problem. 2.5.2 is many times faster than 2.5.3+.

haus commented 7 years ago

If anyone who has seen this issue has a control repo that is public or that they'd be willing to share (we can arrange a private upload of some sort) that would help a lot in diagnosing and reproducing the issue so we can try to fix it.

ajroetker commented 7 years ago

@sedunne When you ran your test, were you deploying cleanly? I.e. the cache was discarded before the deploy and you were installing all the modules in the Puppetfile for the first time in both cases?

regonzalo commented 7 years ago

I have the same problem: a stack of four Puppet masters, three of which have 2.5.0 and one 2.5.4, and the last one runs r10k much more slowly.

dMopp commented 7 years ago

Same here. Downgraded from r10k 2.5.4 to r10k 2.5.2 and it's fast again.

(Puppetmaster 3.7.2-4 (Debian8) + Foreman 1.13.4-1)

geoffr commented 7 years ago

We have between 11 and 20 branches being deployed with around 97 modules each (mix of forge and git based). The 2.5.4 version of the gem takes at least 10-20 minutes, the 2.5.2 version takes just over 2 minutes. Rolling the gem forward and backwards shows consistent behaviour (speed increases then slows). These would be cached environments.

haus commented 7 years ago

@geoffr @dMopp @regonzalo @hameno @sedunne @daveseff are any of you able to share a Puppetfile/control repo with us either publicly or privately to help narrow down what is causing this slowdown? Thanks!

sedunne commented 7 years ago

I have a Puppetfile I can share that I've pruned private sources from. LMK the best way to share (even if just gist) and I can get it.

haus commented 7 years ago

@sedunne gist would work fine

sedunne commented 7 years ago

gist here. There's only 3-4 modules that I removed, and all were from git sources (which didn't seem to have any delay when watching debug output).

haus commented 7 years ago

@sedunne awesome. thanks so much!

sedunne commented 7 years ago

no problems! let me know if there's anything else you'd like to check or have me test.

haus commented 7 years ago

@sedunne sounds good. will do.

andersonmills commented 7 years ago

@sedunne When redeploying the environment, are you using the -p (--puppetfile) option?

sedunne commented 7 years ago

Yeah, the full command that is run is r10k deploy environment -p.

andersonmills commented 7 years ago

@sedunne Got it. We're seeing the behavior on this end now. I think I know what the problem is.

On redeploy w/ -p

  roquefort:~/puppet/src/r10k ((2.5.2))
  $ time be r10k deploy environment slowdown_test -p -v DEBUG1 --color
  real    0m3.389s
  user    0m0.773s
  sys     0m0.510s

  roquefort:~/puppet/src/r10k ((2.5.3))
  $ time be r10k deploy environment slowdown_test -p -v DEBUG1 --color
  real    1m19.357s
  user    0m2.274s
  sys     0m0.648s
andersonmills commented 7 years ago

The problem is that, starting with v2.5.3, we started checking for deprecated modules. This means that for each forge module listed in the Puppetfile, r10k calls out to the Forge to see if the module's been deprecated. This is what is adding the extra time.

Now, the question becomes how best to fix this situation.
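
A rough cost model (illustrative only; the 0.5-second round-trip figure is an assumption, not a measured value) shows why one Forge call per module per environment adds minutes at the original reporter's scale:

```ruby
# Illustrative cost model, not r10k code: from v2.5.3 every Forge module
# in every environment triggers one extra Forge API call during sync.
environments  = 30   # from the original report
forge_modules = 55   # approx. modules per environment, per the report
rtt_seconds   = 0.5  # assumed average round trip to forgeapi.puppetlabs.com

calls   = environments * forge_modules
minutes = (calls * rtt_seconds / 60.0).round(1)
puts "#{calls} extra API calls, roughly #{minutes} extra minutes"
# => 1650 extra API calls, roughly 13.8 extra minutes
```

That is in the same ballpark as the roughly ten-minute regression reported at the top of this thread.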

dMopp commented 7 years ago

Is that because the Forge is slow, or...? I mean, in our case that would mean ~20 checks for 20 modules, which should take 10-20 seconds (which sounds reasonable).

As a hotfix, depending on the effort, a config flag to turn this behavior off would be enough for me.

andersonmills commented 7 years ago

@dMopp What kind of speeds were you seeing for deploying 20 modules in one environment? The deprecation check is for every module in every environment if one runs r10k deploy environment -p.

Also, speeds return to normal without the -p flag, if that helps.

  on redeploy w/o -p

  roquefort:~/puppet/src/r10k ((2.5.2))
  $ time be r10k deploy environment slowdown_test -v DEBUG1 --color
  real    0m1.634s
  user    0m0.584s
  sys     0m0.094s

  roquefort:~/puppet/src/r10k ((2.5.3))
  $ time be r10k deploy environment slowdown_test -v DEBUG1 --color
  real    0m1.516s
  user    0m0.599s
  sys     0m0.093s
dMopp commented 7 years ago

Ah, OK, so with every env it goes up to 120 already. I can't test it now; I need to wait until Monday.

But in that case it might be an idea to cache the result and check the cache first. Say ntp is not in the cache: r10k would ask the Forge and put the result in the cache with a TTL of 24h (make it configurable). This should speed up the check a lot for environments with similar setups (like ours).
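
A minimal sketch of that caching idea (hypothetical, not r10k code; the class and method names are made up for illustration):

```ruby
# Hypothetical TTL cache for per-module deprecation lookups: the first
# fetch runs the block (e.g. a real Forge API query); later fetches
# within the TTL return the cached result without touching the network.
class DeprecationCache
  def initialize(ttl: 24 * 60 * 60) # default TTL of 24h, configurable
    @ttl = ttl
    @entries = {}
  end

  def fetch(name)
    entry = @entries[name]
    return entry[:value] if entry && Time.now - entry[:at] < @ttl
    value = yield
    @entries[name] = { value: value, at: Time.now }
    value
  end
end

cache = DeprecationCache.new
cache.fetch('puppetlabs-ntp') { false } # first call would hit the Forge
cache.fetch('puppetlabs-ntp') { raise 'not reached: served from cache' }
```

With 30 environments sharing mostly the same Puppetfile, this would collapse ~1,650 lookups into ~55.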


andersonmills commented 7 years ago

TL;DR - Downgrade to r10k v2.5.2 until r10k v2.5.5 comes out with the fix.

Sorry that I've been slow to follow up with this. Here's what came out of the investigation. The deprecation check was intended to happen only on initial module install, when the deprecation information would still be in memory, but it got added to the module synchronize method, which runs on every redeploy.

The fix in v2.5.5 is to move the deprecation check to the initial install.

To downgrade, you'll need to...

/opt/puppetlabs/puppet/bin/gem uninstall r10k
/opt/puppetlabs/puppet/bin/gem install r10k -v 2.5.2
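If you are scripting the downgrade across several masters, a quick version gate can decide whether a host needs it (a generic sketch; the affected range 2.5.3 ≤ v < 2.5.5 comes from the comments above, though note a later comment reports a separate slowdown starting in 2.4):

```ruby
# Check whether an installed r10k version falls in the slow range
# (2.5.3 introduced the per-sync deprecation check; 2.5.5 moves it
# back to initial install).
require 'rubygems'

def affected?(version)
  v = Gem::Version.new(version)
  v >= Gem::Version.new('2.5.3') && v < Gem::Version.new('2.5.5')
end

puts affected?('2.5.2') # false: safe to stay
puts affected?('2.5.4') # true: downgrade, or wait for 2.5.5
```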
andrewm3 commented 7 years ago

I'm seeing the slowdown in 2.4 as well - not just > 2.5.2. Our workflow is a little different so we actually have very few modules (only 2) but around 300 environments. Running benchmarks with a subset (100 branches), I got the following runtimes (ran them a few times to get an average):

version runtime
2.3.0 0m11.630s
2.4.0 2m29.694s
2.5.2 2m33.969s
2.5.4 2m39.140s

All runs were with the '-p' option. Both modules are sourced from an internal git server, one tracked by tag and the other just pointing to the master branch.

RameshBahavathi commented 6 years ago

I'm currently using r10k 2.3.0 for cloning Puppet modules. It's taking 1 minute to download a single Puppet module. Any suggestions to reduce the download time?

dMopp commented 6 years ago

Downgrade


RameshBahavathi commented 6 years ago

It works properly after downgrading r10k from 2.3.0 to 2.2.2. Thank you.

andersonmills commented 6 years ago

@andrewm3 @RameshBahavathi I have been looking into these slowdown issues, and they do not seem to be the same as the original poster's issue. I'm happy to try to pursue this further, but I'll need more information.

Also, @andrewm3, we've added forge module caching to the 2.5.x branch, which means one download for the first install of a module; the cached copy is used after that.

Sebor commented 6 years ago

We use version 2.6.2 but it is still slow:

  # time /opt/puppetlabs/puppet/bin/r10k deploy environment -ptv debug2 --color
  real    0m7.668s
  user    0m2.373s
  sys     0m2.417s

Without -p:

  # time /opt/puppetlabs/puppet/bin/r10k deploy environment -tv debug2 --color
  real    0m1.725s
  user    0m0.872s
  sys     0m0.418s

And there were no changes in the code. With updates in the code we wait about 3 minutes (with -p). I think that is too long for our number of modules. The Puppetfile is in the attachment.

Puppetfile.txt

daveseff commented 6 years ago

With -p it is going to be slower, I think: -p tells r10k to update all external modules, so it goes out to check for updates.

sammcj commented 6 years ago

This is really painful.

I'm not sure I like the idea of it going off to the forge for every module from a privacy point of view.

Does it also mean that if the internet is unavailable (say, during a DR situation), all your Puppet deployments will fail because the Forge is unreachable?

hunkeelin commented 6 years ago

https://github.com/hunkeelin/r9000k

There's also g10k; the one I wrote is a simpler version.

sammcj commented 6 years ago

Has anyone figured out whether there's a way to prevent r10k from calling out to the Forge to check for deprecations each time it checks out a branch?

Slow r10k is killing us (slowly!)

sflttc commented 6 years ago

@sammcj, you can try my r9000k. I package it as .rpm or .deb for easy install. You have to specify your configuration in /etc/r9000k/r9000k.conf and follow the syntax in the README.

sammcj commented 6 years ago

@sflttc tempting - have you tried it with Puppet Enterprise?

hunkeelin commented 6 years ago

You can install the program and set it to a different dir for testing purposes.


sammcj commented 6 years ago

Easy. Might give it a shot and report back :)


sammcj commented 6 years ago

Haven't had time to test out yours yet, @sflttc. I kind of wish @puppetlabs would come to the party and fix this, especially since it affects their paid product (and yes, I've mentioned it in support tickets).

hunkeelin commented 6 years ago

@sammcj I took it down from my repo because of some complications, but if you still want it, let me know. I don't think Puppet is going to fix this; they don't have the talent to do it. The best bet for any performance tweak is to hire a capable engineer to fix it, or fix it yourself like I did.

Raskil commented 5 years ago

Hi,

this problem has bothered me for a year now. Today I took the time to evaluate what is slow about the Forge call. As it turns out (at least in my environment), r10k is doing DNS lookups of the forgeapi.puppetlabs.com host, even though it uses a proxy to fetch the modules. In my environment it is not possible to do external DNS lookups, so they timed out. r10k does not report that error but silently ignores it, and then fetches the information and the modules via the proxy, which works well because the proxy does the lookups.

So I added a hosts entry on my system: 52.10.130.237 forgeapi.puppetlabs.com

The IP doesn't really matter, because it's actually never used by r10k, as it uses the proxy to connect to the API. But you should use one that is not inside your proxy exception list :)

After that, r10k is as fast as it was on old versions.

Regards

Raskil

sammcj commented 5 years ago

@Raskil

So I added a hostentry on my system: 52.10.130.237 forgeapi.puppetlabs.com After that, r10k is as fast as it was on old versions.

We just tried this and it made no noticeable difference for us.

Raskil commented 5 years ago

@sammcj I used ruby-prof to profile r10k. It gave me a good overview of where most of the time during an r10k run was spent. As noted in this thread, I saw that from an r10k perspective most time was spent in the method checking for deprecated modules. But that only seems to be the case because it's the first part that actually connects to the Forge. From there, I could see that most of the time was actually spent in the faraday gem, in the method proxy_from_env (the one responsible for doing the DNS lookups to match the request hostname against IP addresses in the no_proxy list).

Maybe you can use the profiler as well to see exactly where your r10k run is spending the most time?

Regards

Raskil
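
For a coarse first pass without extra gems like ruby-prof, Ruby's stdlib Benchmark can bracket suspected phases of a run (a sketch with stand-in workloads; the phase names are hypothetical, not r10k internals):

```ruby
require 'benchmark'

# Stand-in workloads: replace each lambda with the real call you suspect,
# e.g. the proxy/DNS resolution step vs. the module fetch itself. The
# sleeps here only simulate work so the example runs standalone.
phases = {
  'resolve proxy settings' => -> { sleep 0.01 },
  'fetch module metadata'  => -> { sleep 0.02 },
}

# Prints a wall-clock/CPU-time row per phase, making a stalled DNS
# lookup (high real, low user/sys) easy to spot.
Benchmark.bm(24) do |bm|
  phases.each { |name, work| bm.report(name) { work.call } }
end
```

A phase that shows large real time with almost no user/sys time, like the regressions reported above, usually means waiting on the network.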

github-actions[bot] commented 3 years ago

This issue has been marked stale because it has had no activity for 60 days. The Puppet Team is actively prioritizing existing bugs and new features; if this issue is still important to you, please comment and we will add this to our backlog to complete. Otherwise, it will be closed in 7 days.