k8sp / hadoop


RM can not find the host of hadoop-slave-xxx #1

Closed by Yancey1989 8 years ago

Yancey1989 commented 8 years ago

I'm trying to run the WordCount example, but the application state is always ACCEPTED. The RM log shows:

16/05/08 15:56:36 ERROR scheduler.SchedulerApplicationAttempt: Error trying to assign container token and NM token to an allocated container container_1462716844418_0001_01_000001
java.lang.IllegalArgumentException: java.net.UnknownHostException: hadoop-slave-qjzbu
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
    at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:256)
    at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:220)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.pullNewlyAllocatedContainersAndNMTokens(SchedulerApplicationAttempt.java:448)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:921)

It looks like the RM node cannot resolve the hostname of an NM pod, such as hadoop-slave-xxxx, so it cannot connect to it.

So my question is:

I can use the DNS service to resolve a component name such as hadoop-master, but when hadoop-master wants to connect to a specific pod, such as hadoop-slave-xxx, it cannot find the IP.
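To confirm that this is a name-resolution problem rather than a network one, a quick check can be run from inside the RM container. This is only a minimal sketch: the `resolves` helper is a hypothetical name, and it assumes Python is available in the container image.

```python
import socket


def resolves(hostname, resolver=socket.gethostbyname):
    """Return True if `hostname` resolves to an address, False otherwise."""
    try:
        resolver(hostname)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


if __name__ == "__main__":
    # Hostnames taken from this issue; the pod suffix differs per cluster.
    for name in ("hadoop-master", "hadoop-slave-qjzbu"):
        status = "resolves" if resolves(name) else "does NOT resolve"
        print(name, status)
```

If the service name resolves but the pod name does not, that matches the UnknownHostException in the RM log.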

Yancey1989 commented 8 years ago

I tried two approaches.

1. Change the hostname

1.1 Idea

Change the hostname of the NM so that the RM connects to the NM through a specific hostname. The hostname is the service name, so the RM reaches the NM through the service proxy.

1.2 Problem

1.3 Conclusion

Changing the hostname is not a good idea.

2. Change the /etc/hosts of RM

2.1 Idea

10.0.0.24 hadoop-slave-demo-1-xxxx 

Here 10.0.0.24 is the CLUSTER-IP of the hadoop-slave service, and hadoop-slave-demo-1-xxxx is the pod name (hostname) of the hadoop-slave pod.
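Adding such a line by hand is easy to get wrong or to duplicate, so here is a rough sketch of doing it idempotently. The function name `add_hosts_entry` is hypothetical, and the code works on the file contents as a string; reading and writing /etc/hosts on the RM is left to the caller.

```python
def add_hosts_entry(hosts_text, ip, hostname):
    """Return hosts_text with an `ip hostname` line appended,
    unless hostname is already mapped by an existing line."""
    for line in hosts_text.splitlines():
        # Ignore comments, then split into the address and its names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and hostname in fields[1:]:
            return hosts_text  # already present; leave the text untouched
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + f"{ip} {hostname}\n"


if __name__ == "__main__":
    current = "127.0.0.1 localhost\n"
    print(add_hosts_entry(current, "10.0.0.24", "hadoop-slave-demo-1-xxxx"), end="")
```

Calling it twice with the same hostname leaves the text unchanged, so it can be re-run safely.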

2.2 Problem

2.3 Conclusion

This approach works, but it is a bit ugly.

wangkuiyi commented 8 years ago

Could you write a bash script that reproduces the steps you took and post it here, so that people who want to help can look at it together?

Yancey1989 commented 8 years ago

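I submitted an example that solves this with the second approach. The drawback is that when the nodes are scaled out, the hosts entries on the RM have to be updated manually.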
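One way to reduce the manual work when scaling is to regenerate all slave entries in one pass instead of editing them one by one. The sketch below is not part of the submitted example: the marker comments and the `sync_slave_entries` name are assumptions, and the list of (ip, hostname) pairs would have to be obtained from the cluster by some other means.

```python
# Hypothetical marker comments that delimit the generated block.
BEGIN = "# BEGIN hadoop-slave entries"
END = "# END hadoop-slave entries"


def sync_slave_entries(hosts_text, slaves):
    """Replace the marked block in hosts_text with one line per
    (ip, hostname) pair; lines outside the block are kept as they are."""
    kept, inside = [], False
    for line in hosts_text.splitlines():
        if line.strip() == BEGIN:
            inside = True
        elif line.strip() == END:
            inside = False
        elif not inside:
            kept.append(line)
    entries = [f"{ip} {name}" for ip, name in sorted(slaves, key=lambda p: p[1])]
    return "\n".join(kept + [BEGIN] + entries + [END]) + "\n"


if __name__ == "__main__":
    hosts = "127.0.0.1 localhost\n"
    # Pod names here are placeholders; real names carry a generated suffix.
    pods = [("10.0.0.24", "hadoop-slave-demo-1-xxxx")]
    print(sync_slave_entries(hosts, pods), end="")
```

Re-running it with the current pod list after each scale-out replaces the old block, so stale pod names do not accumulate.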