xorpaul / g10k

my r10k fork in Go
Apache License 2.0

Optionally don't check modules for updates if the Puppetfile hasn't changed #138

Open kasimon opened 5 years ago

kasimon commented 5 years ago

Hi Adrian,

bear with me, yet another wishlist issue: We always have dozens of environments with lots of modules, and one thing that takes up a lot of time is g10k scanning every module's git repo for updates even when the Puppetfile hasn't changed since the last checkout. While I can see a point for that in situations where someone is tracking the latest version of a module, in cases where exact versions or commits are pinned in the Puppetfile, making this check optional would result in a huge performance increase. Currently g10k takes close to a minute to check more than 500 repos on every run, whereas running git ls-remote against the puppet repo(s) should only take seconds.

BR Karsten

PS: I just pinged you on xing, in case you want to connect there :)

xorpaul commented 5 years ago

Check out https://github.com/xorpaul/g10k/releases/tag/v0.7.1

Now I'm saving the hash sum of the Puppetfile inside a file .g10k-deploy.json

{
  "name": "benchmark",
  "signature": "c140b0800399550395eb9640bb9cf227956abd01",
  "started_at": "2019-08-27T17:39:25.837150578+02:00",
  "finished_at": "2019-08-27T17:39:32.908326876+02:00",
  "deploy_success": true,
  "puppetfile_checksum": "b0de4967264974eeb2b3381e90c2a4aefb91f5f2b4cbb70eb863d5fbbcfd6e1c"
}

and then check if the Puppetfile in the environment has changed. If not then g10k outputs this:

Skipping Puppetfile sync of branch example_benchmark because /tmp/example/example_benchmark/Puppetfile did not change
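The check described above could look roughly like the following Go sketch. It is not g10k's actual code: the helper names `checksumHex` and `puppetfileUnchanged` are made up for illustration, and SHA-256 is an assumption inferred from the 64 hex characters of `puppetfile_checksum` in the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// deployResult mirrors only the field of .g10k-deploy.json needed here.
type deployResult struct {
	PuppetfileChecksum string `json:"puppetfile_checksum"`
}

// checksumHex returns the hex-encoded SHA-256 digest of data.
// Assumption: SHA-256 is inferred from the 64-character checksum above.
func checksumHex(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// puppetfileUnchanged reports whether the Puppetfile content still matches
// the checksum recorded by the previous deploy.
func puppetfileUnchanged(deployJSON, puppetfile []byte) (bool, error) {
	var prev deployResult
	if err := json.Unmarshal(deployJSON, &prev); err != nil {
		return false, err
	}
	if prev.PuppetfileChecksum == "" {
		return false, nil // no recorded checksum: always sync
	}
	return prev.PuppetfileChecksum == checksumHex(puppetfile), nil
}

func main() {
	// Hypothetical Puppetfile content and a matching deploy record.
	puppetfile := []byte("mod 'example', :git => 'https://example.com/example.git', :commit => 'abc'\n")
	meta := []byte(`{"puppetfile_checksum":"` + checksumHex(puppetfile) + `"}`)

	unchanged, err := puppetfileUnchanged(meta, puppetfile)
	fmt.Println(unchanged, err)
}
```

A missing or empty checksum is treated as "changed", so a first deploy or a deploy record written by an older version still triggers a full sync.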

Please try it out.

xorpaul commented 5 years ago

I think I have to remove this feature, because after thinking about it a bit I noticed the following problem.

What if you have a module inside your Puppetfile, which is simply tracking a branch and a commit to that branch was done?

The Puppetfile itself hasn't changed from the previous g10k run, so g10k wouldn't even bother to check if the latest commit to the tracked branch is still the same and never pick up the committed changes.

The only thing I can think of to speed up the 500 git modules in your Puppetfile is this: when a commit hash has been specified for a module, check whether this exact commit hash is already deployed before even trying to check for updates to that module's git repository.

But this would mean that you would have to track and specify the commit hash in the Puppetfile.
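As a rough Go sketch of that idea (not taken from g10k): the `gitModule` type, the `needsRemoteCheck` helper, and the `deployed` map of currently checked-out commits are all hypothetical, and how g10k would actually learn the deployed commit is left open.

```go
package main

import "fmt"

// gitModule is a hypothetical, simplified view of a git module entry.
type gitModule struct {
	Name   string
	Commit string // full commit hash pinned in the Puppetfile, empty if none
}

// needsRemoteCheck reports whether the module's repository must be contacted.
// deployed maps module names to the commit hash currently checked out.
func needsRemoteCheck(m gitModule, deployed map[string]string) bool {
	if m.Commit == "" {
		return true // branch or unpinned: the remote may have moved
	}
	// Pinned: only contact the remote if the pinned commit is not deployed yet.
	return deployed[m.Name] != m.Commit
}

func main() {
	deployed := map[string]string{
		"pinned": "c140b0800399550395eb9640bb9cf227956abd01",
	}
	mods := []gitModule{
		{Name: "pinned", Commit: "c140b0800399550395eb9640bb9cf227956abd01"},
		{Name: "tracking"}, // follows a branch, so it is always checked
	}
	for _, m := range mods {
		fmt.Println(m.Name, needsRemoteCheck(m, deployed))
	}
}
```

Modules that track a branch still hit the remote on every run, so the speed-up only applies to entries with an explicit commit hash.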

kasimon commented 5 years ago

> While I can see a point for that in situations where someone is tracking the latest version of a module, in cases where exact versions or commits are pinned in the Puppetfile making this check optional would result in a huge performance increase.

I already mentioned that problem :)

> The only thing I can think of to speed up your 500 git modules inside your Puppetfile is to check when a commit hash has been specified for a module if this exact commit hash is already deployed before even trying to check for updates to that module's git repository.

Yes, that would be possible. Actually, in the meantime we changed our setup even further: we now build each environment only once into a tarfile named after the hash of its newest commit in our puppet repo and then ship these tarballs to our compile master. This has the additional advantage that if two environments are on the exact same code version, their code is built only once. It also allows us to move the g10k (now only for the Puppetfile) and puppet parser generate calls off the compile masters, reducing their load even more. This setup shares the same problem that we must make sure the Puppetfile contains no 'moving' targets, but that policy is okay for us.

So we currently don't need that feature anymore, but I still think some intelligent performance optimization would be a good idea.

justinstoller commented 3 years ago

FWIW, we recently added an --incremental flag to r10k. It loads the Puppetfile of an existing environment (if there is one), then syncs the environment, and when loading the updated Puppetfile checks whether a) a module's version "floats", or b) it is "static" but has changed between the two Puppetfile versions. We then only sync the modules that pass either of those tests.
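For a g10k-side reading of that selection rule, here is a minimal Go sketch. It is not r10k's implementation (r10k is written in Ruby); `moduleSpec` and `modulesToSync` are hypothetical names, and whether an entry counts as static is simply taken as given.

```go
package main

import (
	"fmt"
	"sort"
)

// moduleSpec is a hypothetical, simplified Puppetfile entry.
type moduleSpec struct {
	Version string // version, tag, commit or branch as written in the Puppetfile
	Static  bool   // true if Version is pinned, false if it "floats"
}

// modulesToSync returns the names of modules in next that either float or
// are static but differ from (or are missing in) the previous Puppetfile.
func modulesToSync(prev, next map[string]moduleSpec) []string {
	var names []string
	for name, spec := range next {
		old, existed := prev[name]
		if !spec.Static || !existed || old != spec {
			names = append(names, name)
		}
	}
	sort.Strings(names) // map iteration order is random; sort for stable output
	return names
}

func main() {
	prev := map[string]moduleSpec{
		"stdlib": {Version: "8.0.0", Static: true},
		"apache": {Version: "main", Static: false},
		"ntp":    {Version: "9.0.0", Static: true},
	}
	next := map[string]moduleSpec{
		"stdlib": {Version: "8.0.0", Static: true}, // unchanged and pinned: skipped
		"apache": {Version: "main", Static: false}, // floats: always synced
		"ntp":    {Version: "9.1.0", Static: true},  // pinned but changed: synced
		"concat": {Version: "7.0.0", Static: true},  // new entry: synced
	}
	fmt.Println(modulesToSync(prev, next)) // prints [apache concat ntp]
}
```

Modules removed from the Puppetfile are not handled here; purging them would be a separate step.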

If you implement similar functionality, it might be good to share the terminology of that kind of deploy being "incremental"?

xorpaul commented 3 years ago

@justinstoller Thanks for thinking of me/g10k :smiley:

What do you mean by a module version floats or is static? Is static simply a pinned version, like 1.0.1, and floats something like latest? (What about a tag or branch?)

Can you post a link to the r10k code changes? Maybe that's easier :smirk:

justinstoller commented 3 years ago

You got the idea! A static version is any explicit version given to a forge module (not :latest or left off). For git modules, it's any declaration that specifies :commit, :tag, or :ref (but only if the value given to :ref matches a 40-character SHA).
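Translated into Go for g10k's benefit, that definition might be sketched as below. This is an illustration of the rule as described, not r10k's code: the `gitDecl` type and both helper names are hypothetical, the string "latest" stands in for Ruby's :latest symbol, and the regular expression is one assumed way to express "matches a 40-character SHA".

```go
package main

import (
	"fmt"
	"regexp"
)

// fullSHA matches a full 40-character hexadecimal commit hash.
// Assumption: this is one way to express "matches a 40 character sha".
var fullSHA = regexp.MustCompile(`^[0-9a-fA-F]{40}$`)

// gitDecl is a hypothetical view of a git module's Puppetfile options.
type gitDecl struct {
	Commit, Tag, Ref, Branch string
}

// forgeIsStatic reports whether a forge module has an explicit version.
// An empty version means the version was left off.
func forgeIsStatic(version string) bool {
	return version != "" && version != "latest"
}

// gitIsStatic reports whether a git module is pinned by :commit, :tag,
// or a :ref that is a full commit hash. A :branch alone floats.
func gitIsStatic(d gitDecl) bool {
	return d.Commit != "" || d.Tag != "" || fullSHA.MatchString(d.Ref)
}

func main() {
	fmt.Println(forgeIsStatic("1.0.1"))                                               // pinned forge version
	fmt.Println(forgeIsStatic("latest"))                                              // floats
	fmt.Println(gitIsStatic(gitDecl{Tag: "v1.2.3"}))                                  // tag is static
	fmt.Println(gitIsStatic(gitDecl{Ref: "c140b0800399550395eb9640bb9cf227956abd01"})) // full hash in :ref
	fmt.Println(gitIsStatic(gitDecl{Ref: "main"}))                                    // branch-like :ref floats
}
```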

The code is implemented on each module type class, like R10K::Module::Git#L20-L28, but I don't know how helpful that is, partly because Reid's been working on a new YAML specification format that gives every module an explicit type and version (so we have to check for type in that code and then treat version the same as we otherwise treat a ref).

The test code may be more helpful since it gives examples: see spec/unit/module/git_spec.rb#L14-L39 for git, and spec/unit/module_loader/puppetfile_spec.rb#L357-L386 for testing resolving behavior against this Puppetfile: spec/fixtures/unit/puppetfile/various-modules/Puppetfile.

Hope that helps!