nii-cloud / dodai-deploy

Deployment Tool for OpenStack(Nova, Glance and Swift) and Hadoop using Puppet
https://github.com/nii-cloud/dodai-deploy/wiki

Nova compute installation failed on multi-node environment #30

Closed JianlanWang closed 12 years ago

JianlanWang commented 12 years ago

I ran into a strange problem. When I try to install the nova-compute component on more than 4 nodes, some of the nodes intermittently report a Puppet error: "Error 400 on SERVER: Could not find class ...". Here is a log segment from /var/log/syslog for your reference:

Oct 24 09:33:11 puppetcilent2 puppet-agent[41721]: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class nova_e::nova_compute::install for puppetcilent2.dodai.com on node puppetcilent2.dodai.com
Oct 24 09:33:11 puppetcilent2 puppet-agent[41721]: Not using cache on failed catalog
Oct 24 09:33:11 puppetcilent2 puppet-agent[41721]: Could not retrieve catalog; skipping run
Oct 24 09:35:05 puppetcilent2 puppet-agent[42131]: Caching catalog for puppetcilent2.dodai.com
Oct 24 09:35:05 puppetcilent2 puppet-agent[42131]: Applying configuration version '1351042243'
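For anyone comparing several nodes, a small script can pull the "Could not find class" failures out of syslog and count them per node. This is only a sketch: the regular expressions are written against the log lines quoted above, and `missing_classes` and `sample` are hypothetical names introduced here, not part of dodai-deploy or Puppet.

```python
import re
from collections import Counter

# Matches syslog lines written by puppet-agent, in the shape quoted above.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"puppet-agent\[(?P<pid>\d+)\]:\s(?P<msg>.*)$"
)
# Extracts the class name and node from the catalog error message.
MISSING_CLASS_RE = re.compile(
    r"Could not find class (?P<cls>\S+) for (?P<node>\S+)"
)


def missing_classes(lines):
    """Count 'Could not find class' errors per (node, class) pair."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a puppet-agent line
        found = MISSING_CLASS_RE.search(match.group("msg"))
        if found:
            hits[(found.group("node"), found.group("cls"))] += 1
    return hits


# Sample lines copied from the log segment in this issue.
sample = [
    "Oct 24 09:33:11 puppetcilent2 puppet-agent[41721]: Could not retrieve "
    "catalog from remote server: Error 400 on SERVER: Could not find class "
    "nova_e::nova_compute::install for puppetcilent2.dodai.com on node "
    "puppetcilent2.dodai.com",
    "Oct 24 09:33:11 puppetcilent2 puppet-agent[41721]: Not using cache on "
    "failed catalog",
    "Oct 24 09:35:05 puppetcilent2 puppet-agent[42131]: Caching catalog for "
    "puppetcilent2.dodai.com",
]

for (node, cls), count in missing_classes(sample).items():
    print(node, cls, count)
```

To run it against a real log, pass an open file instead of `sample`, for example `missing_classes(open("/var/log/syslog"))`.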

guanxiaohua2k6 commented 12 years ago

Can you confirm whether the file /etc/puppet/modules/nova_e/nova_compute/install.pp exists on the dodai-deploy server?
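A quick way to check this for any class name is sketched below. It assumes Puppet's conventional module layout, where a class `a::b::c` is autoloaded from `<modulepath>/a/manifests/b/c.pp`. Note that the `manifests` directory comes from that convention and does not appear in the path quoted above, so adjust the helper if your layout differs. `manifest_path` is a hypothetical helper name, and `/etc/puppet/modules` is taken from the path in this thread.

```python
import os
import posixpath


def manifest_path(class_name, module_dir="/etc/puppet/modules"):
    """Map a Puppet class name to the manifest file the autoloader
    would look for, assuming the conventional module layout."""
    parts = class_name.split("::")
    module = parts[0]
    # A bare module name (no '::') conventionally lives in init.pp.
    rest = parts[1:] or ["init"]
    return posixpath.join(module_dir, module, "manifests", *rest) + ".pp"


path = manifest_path("nova_e::nova_compute::install")
print(path)
print("exists" if os.path.isfile(path) else "missing")
```

Run it on the dodai-deploy server itself, since that is where the puppet master resolves classes.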

JianlanWang commented 12 years ago

Sure. I can install nova-compute on 2 nodes successfully. But with 4 or more nodes, this error appears on some nodes (not all): some nodes install nova-compute fine, while others fail. It looks like a Puppet defect.

guanxiaohua2k6 commented 12 years ago

OK, thank you. I will confirm the problem.

aimonb commented 12 years ago

Hi, the error makes me think it's a server issue. Look at your puppetmaster setup. Are you reverse proxying it? Do the workers match the balancers? If it's just WEBrick, do you have enough listeners? If you can share some of the puppet-master setup, I may be able to give more directed help...

JianlanWang commented 12 years ago

What kind of info on puppet master do you need?

guanxiaohua2k6 commented 12 years ago

I am trying to repeat the error in my environment. If I need some info later, I will ask you. Thanks in advance.

aimonb commented 12 years ago

Info about the web server: which one you are using, etc., if you can provide it.

JianlanWang commented 12 years ago

I just set up the web server using setup.sh in dodai-deploy, with no changes to it. So it only uses WEBrick? Please tell me how I can get the info you need. Thanks :)

guanxiaohua2k6 commented 12 years ago

Yes, your puppet server is using the default web server, WEBrick. If you want to use another web server, you can refer to this page: http://projects.puppetlabs.com/projects/puppet/wiki/Using_Mongrel_On_Debian

BTW, I tried in my environment, but couldn't reproduce the error.

JianlanWang commented 12 years ago

All right. Maybe I need to run more tests. Searching Google, I found that others have hit the same Puppet error, but I still couldn't find the root cause or a resolution.

aimonb commented 12 years ago

In general, WEBrick can only handle one connection at a time. Most likely there is more to it. I'll take a look at the script and see if I can lend a hand.
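To picture what one connection at a time means when several agents check in together, here is a toy queueing sketch. It is a simplified model, not a measurement of WEBrick or Puppet: `completion_times`, the 10-second service time, and the assumption that all requests arrive at once are made up for illustration. It also does not by itself explain the "Could not find class" message; it only shows how requests pile up behind a single worker.

```python
def completion_times(num_agents, service_seconds, workers=1):
    """Return the finish time of each request, assuming every agent
    asks for its catalog at t=0 and each request takes the same time."""
    free_at = [0.0] * workers  # time at which each worker becomes idle
    finished = []
    for _ in range(num_agents):
        # Hand the request to whichever worker frees up first.
        idx = min(range(workers), key=lambda i: free_at[i])
        free_at[idx] += service_seconds
        finished.append(free_at[idx])
    return finished


# Four agents, hypothetical 10 s per catalog request.
print(completion_times(4, 10.0, workers=1))  # [10.0, 20.0, 30.0, 40.0]
print(completion_times(4, 10.0, workers=4))  # [10.0, 10.0, 10.0, 10.0]
```

With a single worker the last agent waits four times as long as the first, which is the kind of contention a multi-worker front end is meant to avoid.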

JianlanWang commented 12 years ago

I'm planning to use another web server for the puppet master. If there is any update, I'll let you know. Thanks a lot. BTW, what's your plan for the OpenStack Folsom release?

guanxiaohua2k6 commented 12 years ago

I've added support for Folsom (except Swift). You can see the options on the "new proposal" page if you have updated your dodai-deploy server. It has not been tested enough yet, though.

aimonb commented 12 years ago

We successfully use Apache w/ Mod_Passenger as a web server for puppet. It seems to scale well. If you are on Ubuntu the package setup is VERY easy.

habuka036 commented 12 years ago

aimon, thank you for your advice. :)

JianlanWang commented 12 years ago

Hi Xiaohua, great news: I replaced WEBrick with Mongrel + Apache2, and that resolved this issue. Thanks so much. I think you can close the issue.

guanxiaohua2k6 commented 12 years ago

Thank you for the news.