10. Spark Overview: How Spark works
#170 · Open · tonykang22 opened 1 year ago
How Spark works
A user application creates RDDs, transforms them, and runs actions.
This results in a DAG of operators.
The DAG is compiled into stages.
Each stage is executed as a series of Tasks.
One Task is created for each Partition.
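The lazy-transformation idea above can be sketched with a toy model (this is not Spark's API or implementation, just an illustration): transformations only record a node in a lineage DAG, and nothing is computed until an action is called.

```python
# Toy model (not Spark's actual API): a lazy "RDD" that records
# transformations as a lineage chain and only computes on an action.
class ToyRDD:
    def __init__(self, data=None, parent=None, fn=None, op="source"):
        self.data, self.parent, self.fn, self.op = data, parent, fn, op

    def map(self, fn):        # transformation: lazy, just adds a DAG node
        return ToyRDD(parent=self, fn=fn, op="map")

    def filter(self, fn):     # transformation: lazy
        return ToyRDD(parent=self, fn=fn, op="filter")

    def lineage(self):        # the recorded DAG (here: a simple chain)
        node, ops = self, []
        while node:
            ops.append(node.op)
            node = node.parent
        return list(reversed(ops))

    def collect(self):        # action: walks the DAG and actually computes
        if self.op == "source":
            return list(self.data)
        parent = self.parent.collect()
        if self.op == "map":
            return [self.fn(x) for x in parent]
        return [x for x in parent if self.fn(x)]

rdd = ToyRDD(data=[1, 2, 3, 4]).map(lambda x: x * 10).filter(lambda x: x > 15)
print(rdd.lineage())   # ['source', 'map', 'filter']
print(rdd.collect())   # [20, 30, 40]
```

Until `collect()` runs, nothing is computed; the object only holds the operator graph, which is what the DAG Scheduler later compiles into stages.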
DAG Scheduler
General task graphs
Automatically pipelines functions
Data locality aware
Partitioning aware to avoid shuffles
Execution Plan
Stages
are sequences of RDDs that don't have a Shuffle in between.
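The stage-cutting rule can be sketched as a tiny function (a simplification of what the DAG Scheduler does, assuming a linear operator chain rather than a general DAG): walk the operators and start a new stage at every shuffle boundary.

```python
# Toy stage compiler (assumption: a linear plan, not Spark's real
# DAGScheduler): split an operator chain into stages at shuffle boundaries.
def split_into_stages(ops):
    """ops: list of (name, needs_shuffle) pairs; a shuffle starts a new stage."""
    stages, current = [], []
    for name, needs_shuffle in ops:
        if needs_shuffle and current:
            stages.append(current)    # close the stage before the shuffle
            current = []
        current.append(name)
    if current:
        stages.append(current)
    return stages

plan = [("textFile", False), ("flatMap", False), ("map", False),
        ("reduceByKey", True),       # wide dependency -> shuffle boundary
        ("saveAsTextFile", False)]
print(split_into_stages(plan))
# [['textFile', 'flatMap', 'map'], ['reduceByKey', 'saveAsTextFile']]
```

Everything inside one stage has only narrow dependencies, so its functions can be pipelined over each partition without moving data.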
Stage Execution
Read HDFS split
Apply both the maps
Start partial reduce
Write shuffle data
Create a task for each Partition in the new RDD.
Serialize the Task.
Read HDFS > Maps > Partial Reduce > Write Shuffle data
Schedule and ship Tasks to Executors.
This all happens internally; you don't need to do anything.
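The "one task per partition, serialize, then ship" steps can be modeled in a few lines (a single-machine toy, not Spark's TaskScheduler: `pickle` stands in for Spark's task serialization, and `run_task` stands in for what an executor does on receipt):

```python
import pickle

# Toy sketch: a task bundles the pipelined function with one partition's
# data; it is serialized before being "shipped" to an executor.
def make_tasks(partitions, fn):
    return [pickle.dumps((fn, part)) for part in partitions]  # one task per partition

def run_task(blob):            # what an executor does with a shipped task
    fn, part = pickle.loads(blob)
    return [fn(x) for x in part]

def double(x):                 # top-level so pickle can serialize a reference to it
    return x * 2

partitions = [[1, 2], [3, 4], [5, 6]]   # 3 partitions -> 3 tasks
tasks = make_tasks(partitions, double)
print([run_task(t) for t in tasks])     # [[2, 4], [6, 8], [10, 12]]
```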
Executor, Task
Example
An Executor (a JVM process) with 3 Cores
executes 3 Tasks concurrently,
running them in parallel.
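The 3-core executor can be modeled with a pool of 3 workers (an assumption-laden sketch: threads stand in for task slots; a real executor is a JVM process whose slot count comes from its core configuration):

```python
from concurrent.futures import ThreadPoolExecutor

# Toy model of one executor with 3 cores: a pool with 3 workers runs
# at most 3 tasks concurrently; extra tasks wait for a free slot.
def run_executor(tasks, cores=3):
    with ThreadPoolExecutor(max_workers=cores) as pool:
        return list(pool.map(lambda t: t(), tasks))

# 6 tasks but only 3 slots: they run 3 at a time, results keep task order.
tasks = [lambda i=i: sum(range(i * 10)) for i in range(6)]
print(run_executor(tasks))   # [0, 45, 190, 435, 780, 1225]
```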
Summary
How Spark works
DAG Scheduler
Execution Plan
Stages
are sequences of RDDs that don't have a Shuffle in between.
Stage Execution
This all happens internally; you don't need to do anything.
Executor, Task
Summary