kiwenlau / hadoop-cluster-docker

Run Hadoop Cluster within Docker Containers
Apache License 2.0
1.8k stars 863 forks

multi-host hadoop cluster #20

Open kartikjoshi07 opened 8 years ago

kartikjoshi07 commented 8 years ago

This project only works on a single host. How can I use it across multiple hosts? I have been trying different implementations found on the web, but none of them is straightforward.

kiwenlau commented 8 years ago

My project is designed for a single node, but it can be adapted to multiple hosts:

  1. Run the Hadoop containers with "--net=host", so that each container shares the IP address of its host node.
  2. Use IP addresses for Hadoop communication.
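
The two steps above can be sketched roughly as follows. This is only an illustration, not the project's own tooling: the helper names, the image tag `kiwenlau/hadoop:1.0`, and the idea of listing workers by IP in a plain text file are assumptions. The helpers only build the command line and the file contents; nothing is executed.

```python
def host_network_run_command(name, image="kiwenlau/hadoop:1.0"):
    """Build a `docker run` argv that shares the host's network stack.

    The image tag is an assumed default; substitute whatever image you built.
    """
    return ["docker", "run", "-itd", "--net=host", "--name", name, image]


def render_workers_file(worker_ips):
    """Render a Hadoop workers/slaves file that lists nodes by IP address.

    One address per line, so Hadoop does not depend on container hostnames.
    """
    return "\n".join(worker_ips) + "\n"


if __name__ == "__main__":
    # Hypothetical addresses for two worker hosts.
    print(" ".join(host_network_run_command("hadoop-master")))
    print(render_workers_file(["192.168.1.11", "192.168.1.12"]), end="")
```

Each host would run one such container, and the Hadoop configuration on every node would refer to the hosts' IP addresses rather than to container names.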

kartikjoshi07 commented 8 years ago

Thanks Kai. The problem with this approach is that I can't SSH from hduser in a container on host1 to hduser in a container on host2, where hduser is a user created inside the container and the containers are started with --net=host. Because the container and the host share the same IP address, SSH cannot tell whether the user we are trying to reach belongs to the host or to the container.

kartikjoshi07 commented 8 years ago

We can't even SSH from one container to another, because the host and the container share the same IP address.

kiwenlau commented 8 years ago

Sorry, I didn't expect this problem.

This question from Stack Overflow may be helpful for you:

SSH into a docker container from another container on a different host

nicornk commented 8 years ago

You could use the new swarm features built into Docker 1.12-rc1: "Multi-host networking: You can specify an overlay network for your services. The swarm manager automatically assigns addresses to the containers on the overlay network when it initializes or updates the application." Or you could use Docker Swarm, which also allows you to create a software-defined network.
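
A rough sketch of what the overlay approach could look like, assuming a swarm has already been initialized across the hosts (`docker swarm init` on the manager, `docker swarm join` on the others). The network name, service names, helper name, and image tag below are hypothetical, and whether the image runs unmodified as a swarm service is untested. The helper only builds the command lines; it does not run them.

```python
def overlay_network_commands(network, services, image="kiwenlau/hadoop:1.0"):
    """Build argv lists that create an overlay network and attach services to it.

    Containers on the same overlay network get their own addresses, so they
    no longer share an IP with the host as they do under --net=host.
    """
    commands = [["docker", "network", "create", "--driver", "overlay", network]]
    for service in services:
        commands.append(
            ["docker", "service", "create",
             "--name", service, "--network", network, image]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical names for one master and two workers.
    names = ["hadoop-master", "hadoop-slave1", "hadoop-slave2"]
    for argv in overlay_network_commands("hadoop-net", names):
        print(" ".join(argv))
```

Since every container gets its own address on the overlay network, SSH between containers on different hosts would not collide with the hosts' own SSH daemons.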