Copyright by edX. LinuxFoundationX: LFS103x Introduction to Apache Hadoop
Knowledge Check
Select the correct answer. HDFS manages all the data as:
A. Key-Value pairs
B. Tables
C. Views
D. Files and directories (correct)
What is the NameNode service in HDFS responsible for? Select all answers that apply.
A. Sending/receiving data to/from the filesystem client
B. Managing metadata (file permissions, ownership, etc.)
C. Performing data replication
D. Keeping track of block maps for each file in the filesystem
E. Keeping track of health and availability of DataNodes
F. Knowing where to find each DataNode and connecting to it
What is the DataNode service in HDFS responsible for? Select all answers that apply.
A. Sending/receiving data to/from the filesystem client
B. Writing replicas to other DataNodes
C. Maintaining peer-to-peer connections between each other
D. Storing file metadata (file names, permissions, etc.)
What does each Hadoop cluster have? Select all answers that apply.
A. Not more than one NameNode
B. As many NameNodes as you have spare servers
C. Exactly one or two NameNodes
D. As many DataNode services as there are NameNodes
E. One or two NodeManager services
F. Exactly one or two ResourceManagers
G. As many DataNode services as you have data servers in the cluster
In a Hadoop cluster, each server node is either a __. Select all answers that apply.
A. Master Node
B. Coordinator Node
C. Worker Node
D. Client Node
E. Gateway Node
F. Utility Node
HDFS has a single point of failure that you need to protect. Select the correct answer.
A. True: NameNode is HDFS's single point of failure
B. True: DataNode is HDFS's single point of failure
C. False: HDFS can be configured with High Availability using two NameNodes (correct)
Which YARN container resource type is the driver for most resource requests? Select the correct answer.
A. Disk
B. Network
C. Memory (correct)
ApplicationMasters execute on master nodes. True or False?
A. True
B. False (correct)
What component is responsible for dealing with a container failure? Select the correct answer.
A. NameNode
B. ApplicationMaster (correct)
C. DataNode
D. ResourceManager
E. NodeManager
Capacity Scheduler queues are aligned with specific Worker Nodes. True or False?
A. True (incorrect)
B. False
Key Points To Remember
The main ideas we discussed in this chapter are summarized below:
HDFS is a distributed, fault-tolerant filesystem service implemented in Java.
HDFS breaks files into blocks and replicates them, both for reliability and so that processing can run close to the data (data locality).
The primary components are the master NameNode service and the worker DataNode service.
The NameNode is a memory-hungry service because it keeps the filesystem metadata and block maps in memory.
The NameNode automatically takes care of recovering missing and corrupted blocks by having them re-replicated from healthy replicas.
Clients ask the NameNode which DataNodes to use for each block, then read or write the data directly with those DataNodes.
YARN is an HDFS-aware resource scheduler service implemented in Java.
YARN enables multiple workloads to execute simultaneously in the cluster.
The ResourceManager is the master process responsible for fulfilling resource requests. The NodeManager resides on the worker nodes, along with the actual Containers that fulfill job functions.
The ApplicationMaster resides within a Container and is the process responsible for running a job (batch or long-lived service) and making appropriate resource requests.
The Capacity Scheduler allows for resource sharing that enables SLA-enabled multi-tenancy.
Both YARN and HDFS can be configured as highly available services, preventing cluster downtime if a single daemon goes down.
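The point about HDFS breaking files into blocks and replicating them can be illustrated with a small toy model. The sketch below is not HDFS code: the function names `split_into_blocks` and `place_replicas` are hypothetical, the 128 MiB block size and replication factor of 3 are assumed values (they match common Hadoop defaults but are not stated in this chapter), and the round-robin placement is a simplification; the real NameNode uses a rack-aware placement policy.

```python
# Toy model of HDFS block splitting and replica placement.
# Assumptions (not from the chapter): 128 MiB blocks, replication factor 3,
# and a naive round-robin placement instead of HDFS's rack-aware policy.

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes
REPLICATION = 3                 # assumed replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the length in bytes of each block for a file of file_size bytes."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last block only occupies as much space as the data it holds.
        sizes.append(remainder)
    return sizes


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Build a block map: block index -> list of DataNodes holding a replica."""
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    block_map = {}
    for block in range(num_blocks):
        # Pick `replication` distinct DataNodes, rotating the start per block.
        block_map[block] = [
            datanodes[(block + offset) % len(datanodes)]
            for offset in range(replication)
        ]
    return block_map


if __name__ == "__main__":
    sizes = split_into_blocks(300 * 1024 * 1024)  # a 300 MiB file
    block_map = place_replicas(len(sizes), ["dn1", "dn2", "dn3", "dn4"])
    for block, nodes in block_map.items():
        print(block, sizes[block], nodes)
```

In this model a 300 MiB file becomes three blocks (two full blocks and one 44 MiB block), and each block is held by three different DataNodes, so losing any single DataNode still leaves two replicas of every block. The dictionary returned by `place_replicas` plays the role of the block map that the NameNode tracks and hands to clients.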