tomwhite / whirr-cm


Combine scmserver, scmnode, cdhclient on one host? #5

Open misterbeebee opened 12 years ago

misterbeebee commented 12 years ago

I am doing some preliminary testing of a whirr-scm-derived cluster setup.

To minimize AWS spend, I'd like to run scmserver+scmnode+cdhclient on 1 host, so I'm not paying m1.large rental rates for separate scmserver and cdhclient instances.

Is this a usable configuration?
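For reference, Whirr co-locates roles on one instance by joining them with `+` inside a single entry of the `whirr.instance-templates` property, with entries separated by commas; the setup described here is presumably something like `whirr.instance-templates=1 scmserver+scmnode+cdhclient`. The Java sketch below is a hypothetical helper, not Whirr's own parser: it splits such a value and counts the instances it would request, which is where the saving comes from (the combined template asks for one instance, the fallback layout "1 scmserver+scmnode, 1 cdhclient" asks for two).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT Whirr's own parser: splits a Whirr-style
// "whirr.instance-templates" value into (instance count, role list) entries.
public class InstanceTemplates {

    // One template: how many instances to launch and which roles each one runs.
    public record Template(int count, List<String> roles) {}

    public static List<Template> parse(String value) {
        List<Template> templates = new ArrayList<>();
        for (String part : value.split(",")) {
            // e.g. "1 scmserver+scmnode" -> ["1", "scmserver+scmnode"]
            String[] fields = part.trim().split("\\s+", 2);
            int count = Integer.parseInt(fields[0]);
            List<String> roles = List.of(fields[1].trim().split("\\+"));
            templates.add(new Template(count, roles));
        }
        return templates;
    }

    // Total number of instances the templates would request.
    public static int instanceCount(String value) {
        return parse(value).stream().mapToInt(Template::count).sum();
    }

    public static void main(String[] args) {
        String combined = "1 scmserver+scmnode+cdhclient";
        String split = "1 scmserver+scmnode, 1 cdhclient";
        System.out.println(combined + " -> " + instanceCount(combined) + " instance(s)");
        System.out.println(split + " -> " + instanceCount(split) + " instance(s)");
    }
}
```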

I tried to launch a cluster with this config, but whirr seems to hang after this step (from whirr.log):

Exception in thread "main" java.lang.IllegalStateException: The permission '207.170.241.2/32-1-7180-7180' has already been authorized on the specified group
...
Caused by: org.jclouds.http.HttpResponseException: command: POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 failed with response: HTTP/1.1 400 Bad Request; content: [The permission '207.170.241.2/32-1-7180-7180' has already been authorized on the specified group]
        at org.jclouds.aws.handlers.ParseAWSErrorFromXmlContent.handleError(ParseAWSErrorFromXmlContent.java:74)
        ... 9 more

2011-12-06 16:41:38,390 INFO  [org.apache.whirr.service.FirewallManager] (main) 
Authorizing firewall ingress to [Instance{roles=[scmserver, cdhclient, scmnode], 
publicIp=184.73.13.6, privateIp=10.82.251.30, id=us-east-1/i-e7999684, nodeMetadata=[id=us-east-1/i-e7999684, providerId=i-e7999684, group=qa_demo_cloudera_cluster, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1, iso3166Codes=[US-VA], metadata={}], uri=null, imageId=us-east-1/ami-ccb35ea5, os=[name=null, family=centos, version=5.4, arch=paravirtual, is64Bit=true, description=rightscale-us-east/CentOS_5.4_x64_v4.4.10.manifest.xml], state=RUNNING, loginPort=22, privateAddresses=[10.82.251.30], publicAddresses=[184.73.13.6], hardware=[id=m1.large, providerId=m1.large, name=null, processors=[[cores=2.0, speed=2.0]], ram=7680, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false]], supportsImage=is64Bit()], loginUser=root, userMetadata={}]}] on ports [7180] for [207.170.241.2/32]

(That exception is possibly a red herring.)
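One plausible reading of that exception, offered as an assumption rather than a diagnosis: with several roles on one instance, more than one role may request the same ingress rule (here 207.170.241.2/32 on port 7180), and EC2 rejects the second identical authorization with 400 Bad Request. The general remedy is to collapse all requests into a set of distinct rules before calling the cloud API, as in the Java sketch below. The role-to-port map is invented for illustration (only the CIDR and port 7180 come from the log), and `authorize` is a hypothetical stand-in, not a jclouds or Whirr method.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FirewallRuleDedup {

    // One ingress rule: source CIDR plus a single port.
    public record Rule(String cidr, int port) {}

    // Collapse per-role port requests into distinct rules, keeping first-seen order.
    public static Set<Rule> distinctRules(String cidr, Map<String, List<Integer>> portsByRole) {
        Set<Rule> rules = new LinkedHashSet<>();
        for (List<Integer> ports : portsByRole.values()) {
            for (int port : ports) {
                rules.add(new Rule(cidr, port)); // an identical rule is dropped, not re-added
            }
        }
        return rules;
    }

    // Hypothetical stand-in for the real cloud call; here it only logs.
    static void authorize(Rule rule) {
        System.out.println("authorize " + rule.cidr() + " port " + rule.port());
    }

    public static void main(String[] args) {
        // INVENTED mapping: two co-located roles that both ask for port 7180.
        Map<String, List<Integer>> portsByRole = Map.of(
                "scmserver", List.of(7180),
                "scmnode", List.of(7180));
        for (Rule rule : distinctRules("207.170.241.2/32", portsByRole)) {
            authorize(rule); // runs once for port 7180, not once per role
        }
    }
}
```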

I've found that whirr is very sensitive to how I call it (calling it under sudo/su or with output redirection sometimes leads to hangs), so I'm never quite sure whether a hang is due to a problem in whirr or a problem with how I called it.

Has anyone tried and succeeded or failed to set up a cluster with a config like this?

If "1 scmserver+scmnode+cdhclient" causes a conflict, I can also try "1 scmserver+scmnode, 1 cdhclient" to get half the benefit...

misterbeebee commented 12 years ago

"1 scmserver+scmnode" fails similarly.

I also noticed an error message caused by useradd being called twice (once per service, I presume), but I suspect that's not fatal.

If anyone has advice on how to trap the hang and see exactly which step is stuck, I'd be happy to investigate further.
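A general way to see which step a hung JVM is stuck in is a thread dump: for the command-line `whirr` process, the JDK's `jstack <pid>` prints one, as does `kill -3 <pid>` (the dump goes to the process's own stdout, so check the redirect target if output is redirected). If the launch is driven from Java code instead, a watchdog along the lines of the sketch below can print the worker thread's stack once it overruns a timeout. `simulatedStuckStep` is a hypothetical placeholder for the real launch call, and the 200 ms timeout is arbitrary.

```java
public class HangWatchdog {

    // Runs the task on a named thread; if it is still running after timeoutMillis,
    // returns that thread's current stack trace as text, otherwise returns "".
    public static String runWithWatchdog(Runnable task, long timeoutMillis) {
        Thread worker = new Thread(task, "launch-worker");
        worker.setDaemon(true); // don't keep the JVM alive if the task never returns
        worker.start();
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep interrupt status, still report
        }
        if (!worker.isAlive()) {
            return "";
        }
        StringBuilder dump = new StringBuilder("Thread " + worker.getName() + " still running:\n");
        for (StackTraceElement frame : worker.getStackTrace()) {
            dump.append("    at ").append(frame).append('\n');
        }
        return dump.toString();
    }

    // Hypothetical placeholder for a launch step that never finishes.
    public static void simulatedStuckStep() {
        try {
            Thread.sleep(60_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        String dump = runWithWatchdog(HangWatchdog::simulatedStuckStep, 200);
        System.out.print(dump.isEmpty() ? "finished in time\n" : dump);
    }
}
```

The frame naming the stuck method (here `simulatedStuckStep`) is the line to look for; in a real hang it would point at whichever SSH, script, or API call never returned.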

I guess https://issues.apache.org/jira/browse/WHIRR-397 (support for 32-bit images in whirr) is a better solution than cramming multiple services on one node.