zjplab / EPFL-Big-Data-Analysis-with-Scala-and-Spark-Notes

Personal notes on this spark course
https://www.coursera.org/learn/scala-spark-big-data

Spark Basics #2

zjplab opened 5 years ago

zjplab commented 5 years ago

Why Scala? Why Spark?

Spark is more:

zjplab commented 5 years ago

Data-Parallel to Distributed Data-Parallel

Distributed Data-Parallelism:
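A minimal sketch of the idea in plain Scala, with no Spark dependency. A hypothetical `Partitioned` class splits a local collection into chunks that stand in for partitions living on different nodes, maps each chunk in its own `Future` (a local stand-in for one task per worker), and reduces each chunk before combining the partial results. In Spark itself the corresponding operations would be `sc.parallelize(data, numSlices)`, `RDD.map` and `RDD.reduce`; the class below only mimics their shape and does no real distribution.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-in for a distributed collection: each inner Vector
// plays the role of a partition that would live on a separate node.
final case class Partitioned[A](partitions: Vector[Vector[A]]) {
  // Apply f to every partition independently, one Future per partition
  // (simulating one task per worker), then wait for all of them.
  def map[B](f: A => B): Partitioned[B] = {
    val tasks = partitions.map(p => Future(p.map(f)))
    Partitioned(Await.result(Future.sequence(tasks), 10.seconds))
  }

  // Reduce inside each partition first, then combine the partial results,
  // so only one value per partition would need to cross the network.
  // Like Spark's RDD.reduce, this throws on an empty collection.
  def reduce(op: (A, A) => A): A =
    partitions.filter(_.nonEmpty).map(_.reduce(op)).reduce(op)

  // Gather all elements back into a single local Vector.
  def collect(): Vector[A] = partitions.flatten
}

object Partitioned {
  // Split a local collection into (at most) n roughly equal chunks.
  def from[A](data: Seq[A], n: Int): Partitioned[A] = {
    val size = math.max(1, math.ceil(data.size.toDouble / n).toInt)
    Partitioned(data.grouped(size).map(_.toVector).toVector)
  }
}

object DataParallelDemo {
  def main(args: Array[String]): Unit = {
    val data = 1 to 10
    // Shared-memory data parallelism: one collection on one machine.
    val local = data.map(_ * 2).sum
    // "Distributed" data parallelism: same operation, applied per partition.
    val distributed = Partitioned.from(data, 3).map(_ * 2).reduce(_ + _)
    println(s"local=$local distributed=$distributed") // local=110 distributed=110
  }
}
```

The program's logic stays the same as in the shared-memory version; only where the chunks live changes. Reducing within each partition before combining mirrors how Spark's `reduce` aggregates per partition before merging results on the driver.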

zjplab commented 5 years ago

Latency

Distribution

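Distribution introduces important concerns beyond what we had to worry about when dealing with parallelism in the shared-memory case:

Partial failure: crash failures of a subset of the machines involved in a distributed computation.
Latency: certain operations have a much higher latency than other operations due to network communication.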

Latency cannot be masked completely; it will be an important aspect that also impacts the programming model.
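To make the latency point concrete, here is a back-of-the-envelope cost model in Scala. The two constants are assumptions, taken as order-of-magnitude values from the commonly cited "Latency Numbers Every Programmer Should Know" list (about 100 ns for a main-memory reference, about 500 microseconds for a round trip within one data center). `oneByOne` and `batched` are hypothetical helpers, and bandwidth and serialization costs are deliberately ignored.

```scala
object LatencyCostModel {
  // Illustrative order-of-magnitude latencies in nanoseconds (assumed, not measured).
  val memoryAccessNs: Long = 100L         // main-memory reference
  val networkRoundTripNs: Long = 500000L  // round trip within one data center

  // Fetch n records with one network round trip per record.
  def oneByOne(n: Int): Long = n.toLong * networkRoundTripNs

  // Fetch all n records in a single round trip, then process them in memory.
  def batched(n: Int): Long = networkRoundTripNs + n.toLong * memoryAccessNs

  def main(args: Array[String]): Unit = {
    val n = 1000
    println(s"one-by-one: ${oneByOne(n) / 1000000} ms") // one-by-one: 500 ms
    println(s"batched: ${batched(n) / 1000000.0} ms")    // batched: 0.6 ms
  }
}
```

Even with these rough numbers, 1,000 separate round trips cost nearly three orders of magnitude more than one batched round trip followed by in-memory work. That gap is why latency shapes the programming model, and it is one reason distributed frameworks such as Spark try to keep data in memory and limit network communication.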
