01. Spark Overview #167

Open tonykang22 opened 1 year ago

tonykang22 commented 1 year ago

Spark Overview

Apache Spark


What is Apache Spark?


Features



RDD(Resilient Distributed Dataset)



RDD : Creation > Transformation > Action
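
The creation > transformation > action flow can be sketched in plain Python. This is only a toy model under stated assumptions: `ToyRDD` and `tag` are hypothetical names invented for illustration and are not part of Spark. In PySpark the corresponding real calls would be `sc.parallelize(...)`, `filter`, `map`, `count` and `collect`. The sketch shows that transformations only record what to do, and that nothing runs until an action is called.

```python
from typing import Callable, Iterable, List


class ToyRDD:
    """Hypothetical stand-in for an RDD (not Spark's implementation)."""

    def __init__(self, source: Callable[[], Iterable]):
        # Keep a recipe that produces the data, not the data itself.
        self._source = source

    # 1) Creation: build a dataset from an in-memory collection.
    @staticmethod
    def parallelize(data: List) -> "ToyRDD":
        return ToyRDD(lambda: iter(data))

    # 2) Transformations: lazy, each one returns a new ToyRDD.
    def map(self, fn: Callable) -> "ToyRDD":
        return ToyRDD(lambda: (fn(x) for x in self._source()))

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(lambda: (x for x in self._source() if pred(x)))

    # 3) Actions: these actually pull data through the recorded steps.
    def collect(self) -> List:
        return list(self._source())

    def count(self) -> int:
        return sum(1 for _ in self._source())


log_lines = ["INFO start", "ERROR disk full", "WARN slow", "ERROR timeout"]

calls = []  # records every element the map function really processes


def tag(line: str) -> str:
    calls.append(line)
    return line.upper()


rdd = ToyRDD.parallelize(log_lines)                  # creation
errors = rdd.filter(lambda line: "ERROR" in line)    # transformation (lazy)
upper = errors.map(tag)                              # transformation (lazy)

calls_before_action = len(calls)   # still 0: no action has run yet
error_count = errors.count()       # action on the filtered dataset
upper_errors = upper.collect()     # action that finally runs tag()
```

Each action re-runs the recorded steps from the source, which is why real Spark offers caching for datasets that are reused; the toy model leaves that out.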

tonykang22 commented 1 year ago

Spark Overview (Supported Languages, Interactive Shell)

Spark Language Support


// Scala: count the log lines that contain "ERROR"
val lines = sc.textFile("/logs/*.log")
lines.filter(x => x.contains("ERROR")).count()


// Java: count the log lines that contain "ERROR"
JavaRDD<String> lines = sc.textFile("/logs/*.log");
lines.filter(x -> x.contains("ERROR")).count();


# Python: count the log lines that contain "ERROR"
lines = sc.textFile("/logs/*.log")
lines.filter(lambda x: "ERROR" in x).count()



Interactive Shell

Interactive Analysis

tonykang22 commented 1 year ago

03. Spark Overview (Web Notebook, Zeppelin/Jupyter/RStudio)

Web Notebook

tonykang22 commented 1 year ago

04. Spark Overview (Web UI, Driver/Cluster Manager)

Administrative Web UIs



Driver (Spark Application) Web UI



History Server Web UI



Spark Standalone Web UI



Hadoop YARN Web UI

tonykang22 commented 1 year ago

05. Spark Overview (Spark vs. MapReduce)

Hadoop - On Disk : Limitations


Spark - In Memory + DAG : Solutions?
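
As a rough illustration of the in-memory half of this heading, the sketch below runs the same three-stage pipeline twice: once writing every stage's output to a file and reading it back (the way chained MapReduce jobs hand data to each other through storage), and once passing the intermediate list along in memory. The function names, the stages and the JSON temp files are all assumptions made for this example. It does not model Spark's DAG scheduler and makes no timing claims; it only counts the extra disk round-trips.

```python
import json
import os
import tempfile
from typing import Callable, List, Tuple

Stage = Callable[[List[int]], List[int]]


def run_with_disk_between_stages(
    data: List[int], stages: List[Stage], workdir: str
) -> Tuple[List[int], int]:
    """Materialize every stage's output to a file before the next stage."""
    writes = 0
    current = data
    for i, stage in enumerate(stages):
        out = stage(current)
        path = os.path.join(workdir, f"stage_{i}.json")
        with open(path, "w") as f:
            json.dump(out, f)          # intermediate result goes to disk
        writes += 1
        with open(path) as f:
            current = json.load(f)     # next stage reads it back
    return current, writes


def run_in_memory(data: List[int], stages: List[Stage]) -> List[int]:
    """Hand each stage's output directly to the next stage."""
    current = data
    for stage in stages:
        current = stage(current)
    return current


# Hypothetical three-stage job: double, keep multiples of 3, add 1.
stages: List[Stage] = [
    lambda xs: [x * 2 for x in xs],
    lambda xs: [x for x in xs if x % 3 == 0],
    lambda xs: [x + 1 for x in xs],
]
data = list(range(10))

with tempfile.TemporaryDirectory() as workdir:
    disk_result, disk_writes = run_with_disk_between_stages(data, stages, workdir)

memory_result = run_in_memory(data, stages)
```

Both runs produce the same result; the disk-backed run pays one write and one read per stage, and that per-stage cost is what keeping intermediate data in memory avoids.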


Spark vs. Hadoop : Speed


Spark vs. Hadoop : Ease of Use
