johnsonfoo / terraform-ansible-hadoop-starter

Learning about using Terraform and Ansible to setup a Hadoop cluster on AWS

Yarn and MapReduce memory configuration on t2.micro explained #1

Open johnsonfoo opened 2 years ago

johnsonfoo commented 2 years ago

Error on Hadoop-Worker

When only yarn.nodemanager.resource.detect-hardware-capabilities=true is provided and the other memory settings in yarn-site.xml keep their default values, we receive the following error on Hadoop-Worker when trying to start the Yarn cluster from Hadoop-Master with start-yarn.sh.

[screenshot: NodeManager log on Hadoop-Worker showing the startup error]

In the picture, yarn.nodemanager.resource.memory-mb is automatically set to 399MB because yarn.nodemanager.resource.detect-hardware-capabilities=true. This means the total amount of physical memory (MB) on the Hadoop-Worker that can be allocated for containers is 399MB.

However, yarn.scheduler.minimum-allocation-mb (the minimum allocation for every container request) defaults to 1024MB, which is what causes the error in the picture above. The explanation below is quoted from the linked reference.

If the Container memory request minimum (yarn.scheduler.minimum-allocation-mb) is larger than the memory available per node (yarn.nodemanager.resource.memory-mb), then it would be impossible for YARN to fulfill that request. A similar argument can be made for the Container vcore request minimum.

Ref link
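For reference, here is a rough sketch (in Python, not the actual Hadoop code) of the comparison that fails, using the 399MB figure from the screenshot above and the 1024MB Hadoop default for yarn.scheduler.minimum-allocation-mb:

```python
# Rough sketch of the check that fails on the t2.micro worker (not actual Hadoop code).
node_memory_mb = 399       # yarn.nodemanager.resource.memory-mb, auto-detected on Hadoop-Worker
min_allocation_mb = 1024   # yarn.scheduler.minimum-allocation-mb, Hadoop default

if min_allocation_mb > node_memory_mb:
    # No container request can ever be satisfied on this node,
    # so Yarn refuses to start up on the worker.
    raise ValueError(
        f"minimum allocation {min_allocation_mb}MB > node memory {node_memory_mb}MB"
    )
```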

Checking physical memory on Hadoop-Worker

[screenshot: checking available physical memory on Hadoop-Worker]

However, when investigating the Hadoop-Worker, we can see that more than 399MB could be allocated: yarn.nodemanager.resource.detect-hardware-capabilities=true does not simply hand all remaining unused memory to Yarn containers. The detected amount is limited by the following configuration.

name: yarn.nodemanager.resource.system-reserved-memory-mb
value: -1
description: Amount of physical memory, in MB, that is reserved for non-YARN processes. This configuration is only used if yarn.nodemanager.resource.detect-hardware-capabilities is set to true and yarn.nodemanager.resource.memory-mb is -1. If set to -1, this amount is calculated as 20% of (system memory - 2*HADOOP_HEAPSIZE)

Ref link
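To see where the 399MB figure could come from, here is a back-of-envelope calculation based on the description above. The two input values are assumptions for a 1 GiB t2.micro (total memory reported by the OS, and the JVM default heap standing in for HADOOP_HEAPSIZE), not numbers read off the cluster:

```python
# Back-of-envelope reconstruction of the auto-detected container memory.
# Both inputs are assumptions for a t2.micro, not measured values.
system_memory_mb = 983      # assumed memory reported by a 1 GiB t2.micro
hadoop_heapsize_mb = 241    # assumed JVM default max heap (~1/4 of RAM)

usable_mb = system_memory_mb - 2 * hadoop_heapsize_mb   # memory left after 2*HADOOP_HEAPSIZE
reserved_mb = 0.2 * usable_mb                            # 20% reserved for non-YARN processes
container_memory_mb = usable_mb - reserved_mb

print(int(container_memory_mb))   # ~400MB, in line with the 399MB seen in the screenshot
```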

johnsonfoo commented 2 years ago

The case study below is for a Hadoop cluster with 1 t2.micro (1 vCPU & 1 GiB RAM) worker.

Based on the Hortonworks guide for configuring Yarn and MapReduce memory settings (link), Settings V1 is as follows.

# Settings V1

# The amount of memory in container for map task
mapreduce.map.memory.mb=256
# The amount of memory to set aside as JVM heap size in container for map task
mapreduce.map.java.opts=-Xmx204m
# The amount of memory in container for reduce task
mapreduce.reduce.memory.mb=512
# The amount of memory to set aside as JVM heap size in container for reduce task
mapreduce.reduce.java.opts=-Xmx409m
# The amount of memory in container with MapReduce ApplicationMaster
yarn.app.mapreduce.am.resource.mb=512
# The amount of memory to set aside as JVM heap size in container with MapReduce ApplicationMaster
yarn.app.mapreduce.am.command-opts=-Xmx409m

# Total amount of memory available in host that can be allocated for containers
yarn.nodemanager.resource.memory-mb=512
# Minimum amount of memory in container that can be allocated from ResourceManager
yarn.scheduler.minimum-allocation-mb=256
# Maximum amount of memory in container that can be allocated from ResourceManager
yarn.scheduler.maximum-allocation-mb=512
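
As an aside, the -Xmx values above appear to follow the usual recommendation of giving the JVM heap roughly 80% of its container's size: 0.8 × 256MB ≈ 204MB and 0.8 × 512MB ≈ 409MB.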

Settings V1 produces the following error because, after the MapReduce ApplicationMaster (AM) container is allocated, there is no memory left to allocate for map or reduce containers (512MB available memory - 512MB AM container memory = 0MB). The job is stuck waiting for more resources to create map and reduce containers.

[screenshot 2022-03-29: job stuck waiting for resources under Settings V1]
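
A quick back-of-envelope check of the Settings V1 arithmetic; the helper below is purely illustrative and not part of Yarn:

```python
# Illustrative helper: can a container of `request_mb` still fit on the node
# once `allocated_mb` of memory is already taken by running containers?
def fits(node_memory_mb, allocated_mb, request_mb):
    return node_memory_mb - allocated_mb >= request_mb

node_mb = 512   # yarn.nodemanager.resource.memory-mb

# Settings V1: the 512MB AM container consumes the entire node
am_mb = 512
print(fits(node_mb, am_mb, 256))   # False -> no room for a 256MB map container
print(fits(node_mb, am_mb, 512))   # False -> no room for a 512MB reduce container
# The AM runs but can never obtain map/reduce containers, so the job hangs.
```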

johnsonfoo commented 2 years ago

Settings V2 changes the memory of the container running the MapReduce ApplicationMaster.

# Settings V2

# The amount of memory in container with MapReduce ApplicationMaster
- yarn.app.mapreduce.am.resource.mb=512
+ yarn.app.mapreduce.am.resource.mb=256
# The amount of memory to set aside as JVM heap size in container with MapReduce ApplicationMaster
- yarn.app.mapreduce.am.command-opts=-Xmx409m
+ yarn.app.mapreduce.am.command-opts=-Xmx204m

Even though there is memory left after the AM container is allocated (512MB available memory - 256MB AM container memory = 256MB), the following error still occurs. New map containers, which require 256MB each, can be created, but new reduce containers require 512MB and can never fit. The job is stuck waiting for more resources to create reduce containers.

[screenshot: job stuck waiting for resources to create reduce containers under Settings V2]
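
Applying the same illustrative fits helper from the Settings V1 sketch to the V2 values:

```python
# Settings V2: the AM container now takes 256MB, leaving 256MB on the node
am_mb = 256
print(fits(node_mb, am_mb, 256))   # True  -> a 256MB map container fits
print(fits(node_mb, am_mb, 512))   # False -> a 512MB reduce container never fits
# Map tasks can run, but the job still stalls waiting for a reduce container.
```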

johnsonfoo commented 2 years ago

Settings V3 changes both the memory of the container running the MapReduce ApplicationMaster and the memory of the container for the reduce task.

# Settings V3

# The amount of memory in container with MapReduce ApplicationMaster
- yarn.app.mapreduce.am.resource.mb=512
+ yarn.app.mapreduce.am.resource.mb=256
# The amount of memory to set aside as JVM heap size in container with MapReduce ApplicationMaster
- yarn.app.mapreduce.am.command-opts=-Xmx409m
+ yarn.app.mapreduce.am.command-opts=-Xmx204m
# The amount of memory in container for reduce task
- mapreduce.reduce.memory.mb=512
+ mapreduce.reduce.memory.mb=256
# The amount of memory to set aside as JVM heap size in container for reduce task
- mapreduce.reduce.java.opts=-Xmx409m
+ mapreduce.reduce.java.opts=-Xmx204m

Now there is no more error. After the AM container is allocated, there is 256MB of available memory left (512MB available memory - 256MB AM container memory = 256MB), which is enough to create the map and reduce containers, each sized 256MB. Settings V3 LGTM!

[screenshot: job completing successfully under Settings V3]
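
For completeness, the same illustrative check with the V3 values shows why the job can now finish:

```python
# Settings V3: the AM, map and reduce containers are all 256MB
am_mb = 256
print(fits(node_mb, am_mb, 256))   # True -> a 256MB map container fits
print(fits(node_mb, am_mb, 256))   # True -> a 256MB reduce container fits as well
# With 256MB left next to the AM, map and reduce containers can be scheduled
# (one at a time), so the job runs to completion.
```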