10. Spark Overview: How Spark works
#170 · Open · tonykang22 opened 1 year ago
How Spark works
A user application creates RDDs, transforms them, and runs actions.
This results in a DAG of operators.
The DAG is compiled into stages.
Each stage is executed as a series of Tasks.
One Task is created for each Partition.
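The lazy-transformation idea above can be sketched with a toy model (this is not Spark's API or implementation, just an illustration): transformations only record a node in a lineage DAG, and nothing is computed until an action is called.

```python
# Toy model (not Spark's actual API): a lazy "RDD" that records
# transformations as a lineage chain and only computes on an action.
class ToyRDD:
    def __init__(self, data=None, parent=None, fn=None, op="source"):
        self.data, self.parent, self.fn, self.op = data, parent, fn, op

    def map(self, fn):        # transformation: lazy, just adds a DAG node
        return ToyRDD(parent=self, fn=fn, op="map")

    def filter(self, fn):     # transformation: lazy
        return ToyRDD(parent=self, fn=fn, op="filter")

    def lineage(self):        # the recorded DAG (here: a simple chain)
        node, ops = self, []
        while node:
            ops.append(node.op)
            node = node.parent
        return list(reversed(ops))

    def collect(self):        # action: walks the DAG and actually computes
        if self.op == "source":
            return list(self.data)
        parent = self.parent.collect()
        if self.op == "map":
            return [self.fn(x) for x in parent]
        return [x for x in parent if self.fn(x)]

rdd = ToyRDD(data=[1, 2, 3, 4]).map(lambda x: x * 10).filter(lambda x: x > 15)
print(rdd.lineage())   # ['source', 'map', 'filter']
print(rdd.collect())   # [20, 30, 40]
```

Until `collect()` runs, nothing is computed; the object only holds the operator graph, which is what the DAG Scheduler later compiles into stages.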
DAG Scheduler
General task graphs
Automatically pipelines functions
Data locality aware
Partitioning aware to avoid shuffles
Execution Plan
Stages
are sequences of RDDs that don't have a Shuffle in between.
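The stage-cutting rule can be sketched as a tiny function (a simplification of what the DAG Scheduler does, assuming a linear operator chain rather than a general DAG): walk the operators and start a new stage at every shuffle boundary.

```python
# Toy stage compiler (assumption: a linear plan, not Spark's real
# DAGScheduler): split an operator chain into stages at shuffle boundaries.
def split_into_stages(ops):
    """ops: list of (name, needs_shuffle) pairs; a shuffle starts a new stage."""
    stages, current = [], []
    for name, needs_shuffle in ops:
        if needs_shuffle and current:
            stages.append(current)    # close the stage before the shuffle
            current = []
        current.append(name)
    if current:
        stages.append(current)
    return stages

plan = [("textFile", False), ("flatMap", False), ("map", False),
        ("reduceByKey", True),       # wide dependency -> shuffle boundary
        ("saveAsTextFile", False)]
print(split_into_stages(plan))
# [['textFile', 'flatMap', 'map'], ['reduceByKey', 'saveAsTextFile']]
```

Everything inside one stage has only narrow dependencies, so its functions can be pipelined over each partition without moving data.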
Stage Execution
Read HDFS split
Apply both the maps
Start partial reduce
Write shuffle data
Create a task for each Partition in the new RDD.
Serialize the Task.
Read HDFS > Maps > Partial Reduce > Write Shuffle data
Schedule and ship Tasks to Executors.
This all happens internally; you don't need to do anything.
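The "one task per partition, serialize, then ship" steps can be modeled in a few lines (a single-machine toy, not Spark's TaskScheduler: `pickle` stands in for Spark's task serialization, and `run_task` stands in for what an executor does on receipt):

```python
import pickle

# Toy sketch: a task bundles the pipelined function with one partition's
# data; it is serialized before being "shipped" to an executor.
def make_tasks(partitions, fn):
    return [pickle.dumps((fn, part)) for part in partitions]  # one task per partition

def run_task(blob):            # what an executor does with a shipped task
    fn, part = pickle.loads(blob)
    return [fn(x) for x in part]

def double(x):                 # top-level so pickle can serialize a reference to it
    return x * 2

partitions = [[1, 2], [3, 4], [5, 6]]   # 3 partitions -> 3 tasks
tasks = make_tasks(partitions, double)
print([run_task(t) for t in tasks])     # [[2, 4], [6, 8], [10, 12]]
```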
Executor, Task
Example
An Executor (a JVM process) with 3 Cores
executes 3 Tasks concurrently,
running them in parallel.
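The 3-core executor can be modeled with a pool of 3 workers (an assumption-laden sketch: threads stand in for task slots; a real executor is a JVM process whose slot count comes from its core configuration):

```python
from concurrent.futures import ThreadPoolExecutor

# Toy model of one executor with 3 cores: a pool with 3 workers runs
# at most 3 tasks concurrently; extra tasks wait for a free slot.
def run_executor(tasks, cores=3):
    with ThreadPoolExecutor(max_workers=cores) as pool:
        return list(pool.map(lambda t: t(), tasks))

# 6 tasks but only 3 slots: they run 3 at a time, results keep task order.
tasks = [lambda i=i: sum(range(i * 10)) for i in range(6)]
print(run_executor(tasks))   # [0, 45, 190, 435, 780, 1225]
```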
Summary
How Spark works
DAG Scheduler
Execution Plan
Stages
are sequences of RDDs that don't have a Shuffle in between.
Stage Execution
This all happens internally; you don't need to do anything.
Executor, Task
Summary