shortdudey123 / chef-gluster

Chef cookbook for deploying Gluster
https://supermarket.chef.io/cookbooks/gluster
Apache License 2.0

After successful initial run, subsequent runs blow up at server_extend #55

Open donovanmuller opened 8 years ago

donovanmuller commented 8 years ago

Initial run of gluster::server is successful. Volume created and started. When gluster::server runs again, the following gets vomited out:

NoMethodError
-------------
private method `select' called for nil:NilClass

...

Relevant File Content:
----------------------
/var/chef/cache/cookbooks/gluster/recipes/server_extend.rb:

   17:        next
   18:      end
   19:
   20:      unless node.default['gluster']['server']['volumes'][volume_name].attribute?('bricks_waiting_to_join')
   21:        node.default['gluster']['server']['volumes'][volume_name]['bricks_waiting_to_join'] = ''
   22:      end
   23:
   24>>     peer_bricks = chef_node['gluster']['server']['volumes'][volume_name]['bricks'].select { |brick| brick.include? volume_name }
   25:      brick_count += (peer_bricks.count || 0)
   26:      peer_bricks.each do |brick|
   27:        Chef::Log.info("Checking #{peer}:#{brick}")
   28:        unless brick_in_volume?(peer, brick, volume_name)
   29:          node.default['gluster']['server']['volumes'][volume_name]['bricks_waiting_to_join'] << " #{peer}:#{brick}"
   30:        end
   31:      end
   32:    end
   33:
donovanmuller commented 8 years ago

node.set['gluster']['server']['volumes'][volume_name]['bricks'] does not seem to be set:

[Screenshot: chef-gluster-55]

shortdudey123 commented 8 years ago

It is set here: https://github.com/shortdudey123/chef-gluster/blob/master/recipes/server_setup.rb#L52

Can you verify node['gluster']['server']['volumes']['ose3-vol']['peers'] contains the FQDN or hostname of the node?

donovanmuller commented 8 years ago

It does. I left it unexpanded in the screenshot, but it was definitely populated.

shortdudey123 commented 8 years ago

Can you post the context of the failed run? (not just the exception)

andyrepton commented 8 years ago

Hi @donovanmuller!

Sorry to hear you are having issues. It appears that your node is trying to load another chef-client node that doesn't have that attribute set. Could you please confirm that the same cookbook was run on all nodes in your peer list, and that the Chef node name is the same as the peer name that Gluster is using? (When the Chef node name is an FQDN and not a hostname, or vice versa, it can cause a problem like this.)

What would really help is the output of your node['gluster']['server']['volumes'] entry in your cookbook attributes file, and the attribute node['gluster']['server']['volumes']['ose3-vol'] from each of your peers.

Thanks in advance!

Andy
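To make the FQDN-versus-hostname check concrete, here is a minimal plain-Ruby sketch. The helpers `short_name` and `peer_listed?` are hypothetical and not part of the cookbook, and comparing short hostnames is only an assumption about how one might test for a mismatch by hand; the cookbook's own matching logic may differ.

```ruby
# Hypothetical helpers, not part of the gluster cookbook.
# Reduce a name to its short hostname: "master01.bison.pi.b" -> "master01".
def short_name(name)
  name.to_s.downcase.split('.').first
end

# True when node_name refers to the same host as any entry in peers,
# regardless of whether either side is written as an FQDN or a hostname.
def peer_listed?(node_name, peers)
  wanted = short_name(node_name)
  return false if wanted.nil?

  Array(peers).any? { |peer| short_name(peer) == wanted }
end

peers = ['master01.bison.pi.b', 'node01.bison.pi.b']
puts peer_listed?('master01', peers)           # hostname checked against FQDN peers
puts peer_listed?('other.example.com', peers)  # a host that is not in the peer list
```

If a node only matches after being reduced to its short name, the peer list and the Chef node name are written in different forms, which is the situation described above.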

donovanmuller commented 8 years ago

@Seth-Karlo Below are my complete gluster attributes:

default['gluster']['version'] = '3.7'
default['gluster']['server']['brick_mount_path'] = '/data'
default['gluster']['server']['disks'] = []
default['gluster']['server']['volumes'] = {
  'ose3' => {
    'peers' => ['master01.bison.pi.b','node01.bison.pi.b'],
    'replica_count' => 2,
    'volume_type' => 'replicated',
    'disks' => ['/dev/sda4'],
    'size' => '10G'
  }
}

[Screenshot: master01-attr]

[Screenshot: node02-attr]

Is there anything else you need?

andyrepton commented 8 years ago

Thank you for your report, I apologise for taking so long to respond. I'll see if I can reproduce at this end and get back to you.

alez007 commented 8 years ago

Any news about this? I'm experiencing the same problem on OpsWorks.

shortdudey123 commented 8 years ago

@alez007 can you verify the cookbook version you are using so that we make sure we are looking at the same thing?

andyrepton commented 8 years ago

I'm pretty confident this is caused by chef_node not being set. I've been a bit distracted lately, but I'll try and look into this.

laurencepettitt commented 8 years ago

I am using OpsWorks and experiencing this problem. I am wondering if it could be OpsWorks' fault: does the way it updates the cookbooks on each node wipe the node's attributes every time the "custom cookbooks" are updated?

shortdudey123 commented 8 years ago

@LorenzoPetite Possibly? I don't use OpsWorks and am not too familiar with it. @Seth-Karlo, do you use OpsWorks at all, and might you be able to shed light here?

andyrepton commented 8 years ago

@shortdudey123 @LorenzoPetite Sorry, no, I've never used OpsWorks before. We could possibly test this by adding some echo statements to the cookbook to print out those attributes during compile time. If they report as empty, we can then start looking into whether or not they are set properly.
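One way to make such echo statements more informative is to report where along the attribute path the lookup stops. The sketch below is plain Ruby over ordinary hashes; `trace_path` is a hypothetical helper, not something the cookbook provides, and the sample data is made up for illustration. Inside a recipe the same idea would be applied to the node attributes and written with Chef::Log.info, as the recipe already does elsewhere.

```ruby
# Hypothetical debugging helper, not part of the gluster cookbook.
# Walks `path` through nested hashes and reports which keys resolved and
# which key was the first to be missing (or nil).
def trace_path(attrs, path)
  found = []
  current = attrs
  path.each do |key|
    current = current.is_a?(Hash) ? current[key] : nil
    return { found: found, missing: key } if current.nil?

    found << key
  end
  { found: found, missing: nil, value: current }
end

# Made-up attributes for a peer whose 'bricks' entry was never saved.
peer_attrs = {
  'gluster' => { 'server' => { 'volumes' => { 'ose3' => { 'peers' => ['master01.bison.pi.b'] } } } }
}
path = ['gluster', 'server', 'volumes', 'ose3', 'bricks']

result = trace_path(peer_attrs, path)
puts "resolved: #{result[:found].join(' > ')}; first missing key: #{result[:missing].inspect}"
```

Knowing whether the lookup stops at 'gluster' or only at 'bricks' helps tell an entirely unsaved node apart from one that is missing just the brick list.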

laurencepettitt commented 8 years ago

Following @Seth-Karlo's suggestion, I tested with some echo statements. In the server_setup recipe, I found that node['gluster']['server']['volumes'][volume_name]['bricks'] produces ["/gluster/servu/brick"].

However, in the server_extend recipe, chef_node['gluster']['server']['volumes'][volume_name]['bricks'] fails with the error undefined method '[]' for nil:NilClass because chef_node['gluster'] is somehow nil. Strangely, echoing chef_node produces node[gluster1].

I realise now this is actually a slightly different error than @donovanmuller's, but in both cases there seems to be a problem with attribute persistence.

How could this be possible?

theundefined commented 7 years ago

The chef_node loop iterates over all nodes in the cluster, so it fails as soon as bricks is empty on any one of them. I have the same problem in one of my test environments. I'm not sure, but I think it may be connected to a Chef error during cluster setup, after which the bricks aren't propagated to the Chef server.
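For illustration, a nil-tolerant version of the lookup that fails at line 24 of server_extend.rb could look like the sketch below. It is plain Ruby over ordinary hashes rather than real Chef node objects; `peer_bricks_for` is a hypothetical name, the sample data is invented, and Hash#dig assumes Ruby 2.3 or newer. This is not the cookbook's actual fix, only one way the guard might be written.

```ruby
# Hypothetical helper, not part of the gluster cookbook.
# Returns the bricks a peer reports for volume_name, or [] when any level of
# the attribute path is missing (for example, a peer whose run never saved).
def peer_bricks_for(peer_attrs, volume_name)
  bricks = (peer_attrs || {}).dig('gluster', 'server', 'volumes', volume_name, 'bricks')
  Array(bricks).select { |brick| brick.include?(volume_name) }
end

# Invented sample data: one healthy peer, one without bricks, one with nothing saved.
healthy = { 'gluster' => { 'server' => { 'volumes' => { 'ose3' => { 'bricks' => ['/data/ose3/brick'] } } } } }
unsaved = { 'gluster' => { 'server' => { 'volumes' => { 'ose3' => {} } } } }
blank   = {}

[healthy, unsaved, blank].each do |attrs|
  puts peer_bricks_for(attrs, 'ose3').inspect
end
```

Returning an empty list keeps the run from aborting, but it also hides the fact that a peer has no saved bricks, so logging a warning in that case would probably be wise.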

wndhydrnt commented 7 years ago

I stumbled upon this today too. On the initial run of the chef-client, the cookbook failed due to an error in the configuration on my side. The chef-client was able to create the volume on the first run though. Executing knife node show <NODE NAME> -a gluster confirmed that ['gluster']['server']['volumes']['myvolume']['bricks'] was empty. Subsequent runs of chef-client failed with the error stated in the first comment of this issue. As far as I know, a chef-client persists its attributes on the Chef server only after a successful run. No run of the chef-client completed successfully, so the bricks attribute could never be saved.

My workaround was to set ['gluster']['server']['server_extend_enabled'] to false, trigger a run of the chef-client (which succeeded) and set ['gluster']['server']['server_extend_enabled'] back to true.
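As a rough sketch of the first step of that workaround, the toggle could be written as an attributes-file fragment like the one below. The attribute name comes from the comment above; placing it at the default level in an attributes file is an assumption, and a role, environment, or wrapper cookbook override may fit better depending on where the rest of the gluster attributes live.

```ruby
# Attributes-file fragment (assumed placement; adjust to where your
# gluster attributes are defined).
# Step 1: disable server_extend so that a chef-client run can complete
# and the node's attributes, including bricks, get saved.
default['gluster']['server']['server_extend_enabled'] = false

# Step 2 (after one successful run): set the value back to true.
```

After the successful run, knife node show <NODE NAME> -a gluster should show whether bricks is now populated before the setting is switched back.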