kiwenlau / hadoop-cluster-docker

Run Hadoop Custer within Docker Containers
Apache License 2.0
1.8k stars 864 forks source link

Run Hadoop Cluster within Docker Containers

alt tag

3 Nodes Hadoop Cluster

1. pull docker image
sudo docker pull kiwenlau/hadoop:1.0
2. clone github repository
git clone https://github.com/kiwenlau/hadoop-cluster-docker
3. create hadoop network
sudo docker network create --driver=bridge hadoop
4. start container
cd hadoop-cluster-docker
sudo ./start-container.sh

output:

start hadoop-master container...
start hadoop-slave1 container...
start hadoop-slave2 container...
root@hadoop-master:~# 
5. start hadoop
./start-hadoop.sh
6. run wordcount
./run-wordcount.sh

output

input file1.txt:
Hello Hadoop

input file2.txt:
Hello Docker

wordcount output:
Docker    1
Hadoop    1
Hello    2

Arbitrary size Hadoop cluster

1. pull docker images and clone github repository

do 1~3 like section A

2. rebuild docker image
sudo ./resize-cluster.sh 5
3. start container
sudo ./start-container.sh 5
4. run hadoop cluster

do 5~6 like section A