kiwenlau / hadoop-cluster-docker

Run Hadoop Cluster within Docker Containers
Apache License 2.0

multi-host hadoop cluster #20

Open · kartikjoshi07 opened 8 years ago

kartikjoshi07 commented 8 years ago

This project works only on a single host. How can it be used across multiple hosts? I have been trying different implementations found on the web, but nothing is straightforward.

kiwenlau commented 8 years ago

My project is designed for a single host, but it can be adapted to multiple hosts:

  1. Run the Hadoop containers with "--net=host", so that each container shares the IP address of its host node (see the sketch after this list).
  2. Use IP addresses for Hadoop communication.
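
A minimal sketch of that setup, assuming the image name kiwenlau/hadoop:1.0 from this project and two hosts called host1 and host2 (all of these names are placeholders; adjust to your environment):

```sh
# On host1: start the master on the host's network stack.
docker run -itd --net=host --name hadoop-master kiwenlau/hadoop:1.0

# On host2: start a slave the same way.
docker run -itd --net=host --name hadoop-slave1 kiwenlau/hadoop:1.0

# With --net=host each container is reachable at its host's IP address,
# so the Hadoop configuration should list host IPs instead of container hostnames.
```
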
kartikjoshi07 commented 8 years ago

Thanks Kai. But the problem with this is that I can't SSH from hduser of a container on host1 to hduser of a container on host2, where hduser is a user created inside the container and the containers run with --net=host. Since the container and the host share the same IP address, there is no way to tell whether the user we are trying to SSH to belongs to the host or to a container.

kartikjoshi07 commented 8 years ago

We also can't SSH from one container to another, as the host and the container share the IP address.
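
One possible workaround, sketched under the assumption that each container runs its own sshd: bind the container's sshd to a non-default port (2222 here, an arbitrary choice) so it no longer collides with the host's sshd on port 22, and the port number then disambiguates the SSH target:

```sh
# Inside the container (or baked into the image): move sshd off port 22.
sed -i 's/^#\?Port 22$/Port 2222/' /etc/ssh/sshd_config
service ssh restart

# From a container on host1, the port now selects the target:
ssh -p 2222 hduser@<host2-ip>    # reaches the container's sshd on host2
ssh -p 22 someuser@<host2-ip>    # would reach host2 itself
```
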

kiwenlau commented 8 years ago

Sorry, I didn't anticipate this problem.

This question from Stack Overflow may be helpful for you:

SSH into a docker container from another container on a different host

nicornk commented 8 years ago

You could use the new swarm mode features built into Docker 1.12-rc1: "Multi-host networking: You can specify an overlay network for your services. The swarm manager automatically assigns addresses to the containers on the overlay network when it initializes or updates the application." https://docs.docker.com/engine/swarm/

Alternatively, you could use the standalone Docker Swarm, which also lets you create a software-defined network.
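
A sketch of the swarm-mode route (the network name hadoop-net, the image tag, and the placeholder addresses and token are all assumptions):

```sh
# On the manager node: initialize swarm mode.
docker swarm init --advertise-addr <manager-ip>

# On each worker node: join using the token printed by the init command.
docker swarm join --token <worker-token> <manager-ip>:2377

# Back on the manager: create an overlay network that spans all nodes.
docker network create --driver overlay hadoop-net

# Services attached to this network get addresses on the overlay,
# so containers on different hosts can reach each other directly.
docker service create --name hadoop-master --network hadoop-net kiwenlau/hadoop:1.0
```
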