ncerny / chef_stack

4 stars 6 forks source link

Fix chef_user check for existing users #13

Open patcon opened 7 years ago

patcon commented 7 years ago

I'm getting an error when using chef_user resource after a second user is created. It then starts failing for the first one, saying it already exists. So it's trying to add the existing user. Digging in, i think I've found the issue

chef-users seems to get saved in run-state as things like this:

node.run_state['chef-users'] = "delivery\n"
node.run_state['chef-users'] = "\n"
node.run_state['chef-users'] = "delivery\nsomeother\n"
node.run_state['chef-users'] = "someother\ndelivery\n"

The current guard for that is:

not_if { node.run_state['chef-users'].index(/^#{new_resource.username}$/) }

This doesn't seem right, as the code produces the following results:

# comment shows return
node.run_state['chef-users'] = "delivery\n" # 0
node.run_state['chef-users'] = "\n" # nil
node.run_state['chef-users'] = "delivery\nsomeother\n" # 0
node.run_state['chef-users'] = "someother\ndelivery\n" # 10

I suspect this isn't what we want?

Created a fix commit here: https://github.com/patcon/chef_stack/commits/blendive/develop

Let me know if you recognize this as an issue, and happy to create a PR

cc: @mrjcleaver

patcon commented 7 years ago

Found the same issue in chef_org resource. Working on that too

patcon commented 7 years ago

Not sure how to reason about this bit of chef_org and chef_user: https://github.com/ncerny/chef_stack/blob/master/resources/chef_org.rb#L33

(this was helpful context: https://github.com/chef/chef-rfc/blob/master/rfc056-load-and-converge.md#non-existence)

Seems like it perhaps just worked by happenstance before? Could you advise what it was trying to guard against, so I know how to treat it?

Thanks!

ncerny commented 7 years ago

Hey @patcon! Thanks for taking an interest in chef_stack!

I believe the code as written is correct:

chef (12.12.15)> node.run_state['chef-orgs'] = <<-EOF
chef"> delivery
chef"> chef
chef"> test123
chef"> EOF
 => "delivery\nchef\ntest123\n"
chef (12.12.15)>
chef > if node.run_state['chef-orgs'].index(/^delivery$/)
chef ?>   p 'exists'
chef ?> else
chef >   p 'does not exist'
chef ?> end
"exists"
 => "exists"
chef (12.12.15)> if node.run_state['chef-orgs'].index(/^chef$/)
chef ?>   p 'exists'
chef ?> else
chef >   p 'does not exist'
chef ?> end
"exists"
 => "exists"
chef (12.12.15)> if node.run_state['chef-orgs'].index(/^franklin$/)
chef ?>   p 'exists'
chef ?> else
chef >   p 'does not exist'
chef ?> end
"does not exist"
 => "does not exist"

Thus, the resource block to create the user/org will only execute if that index returns a falsey (such as 'false' or 'nil'). Any other return (such as '0' or '10') will be truey, and prevent the resource from executing.

The load_current_value is saying "If the user/org is not found in this list of users/orgs, then the current value (the object that describes the state of the system prior to convergence) does not exist - IE, the User/Org does not exist prior to converge. Chef uses this to determine if it needs to converge a resource or not.

Was there a specific problem you were seeing that made you look at this code?

ncerny commented 7 years ago

I see you said the issue was it tries to recreate the first user after you add a second. Do you have the output from a run that we can take a look at?

patcon commented 7 years ago

Hm. Odd. But yep, save a partial log for our internal ticket: https://gist.github.com/patcon/2703540e68984252a4be13bcde1c620b

You're right, forgot that zero is truey in ruby-land. Perhaps that's a reason to use something more intuitive like match, but I digress :)

The issue was pretty persistent for us, although there was a small amount of inconsistency on first few runs, but then settled into consistent failure while reconverging. Making the chancg resolved it.

(Fwiw, our script knife bootstraps with our top-level run-list item on each run, in case that confuses the run_state logic somehow.)

If nothing sticks out, I'll have to revert away from our fork and try to figure out the real reason later. Any insight is appreciated!

ncerny commented 7 years ago

Good point. I don't see a reason not to use match!

The only time I've seen this behavior is when the server isn't responding to the user-list/org-list. Which we should handle more gracefully. Is it possible that the server is restarting while the recipe is running? Or is it possible that high load is causing the issue?

As for your change - I'm happy to merge it in; however, I don't like \b\b. I prefer ^$ because it's easier to read from a regex perspective. We should also change to using match instead of index. But if that fixes your problem, then there doesn't seem to be a reason not to do it. If you want to make a PR with those changes, we'll get a review going.

patcon commented 7 years ago

Hm. We were using a system with much lower specs than recommended, so perhaps that's slowed things down and brought up an edgecase. We've moved to manual install for the next little while in the interest of speedy demos, but I'll investigate when we come back to using chef-services cookbook :)

Thanks for being a great sounding board @ncerny!