vvksh / learning_stuff


look into ray distributed computing framework #12

Open vvksh opened 3 years ago

vvksh commented 3 years ago

Ray came out of RISELab at UC Berkeley; it's meant to be a general-purpose distributed computing framework.

arxiv paper here

notes:

"""relative inflexibility of the BSP model, the high per-task overhead, and the lack of an actor abstraction led us to develop a new system"""

fault tolerance helps save money since it allows us to run on cheap resources like spot instances on AWS

extra systems discussed:

Qs:

tasks:

vvksh commented 3 years ago

also this Medium blog gives a nice intro going from Hadoop -> Spark -> Ray; it mainly focuses on why Spark is not well suited for async tasks
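A rough way to see the async point, as a sketch using only the Python standard library (this is not how Spark or Ray works internally, just an illustration of the scheduling difference). The task durations and the names `work`, `barrier_style`, and `async_style` are made up: with a stage barrier, nothing downstream starts until the slowest task finishes, while with async futures each result can be consumed as soon as it is ready.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical task durations in seconds, keyed by task id.
DURATIONS = {0: 0.4, 1: 0.1, 2: 0.2}

def work(task_id, seconds):
    """Pretend to compute something for `seconds`, then return the task id."""
    time.sleep(seconds)
    return task_id

def barrier_style(pool):
    # BSP-like: submit the whole stage, then wait for every task
    # before anything downstream can use the results.
    futures = [pool.submit(work, t, s) for t, s in DURATIONS.items()]
    return [f.result() for f in futures]  # results in submission order

def async_style(pool):
    # Async: handle each result as soon as its task completes.
    futures = [pool.submit(work, t, s) for t, s in DURATIONS.items()]
    return [f.result() for f in as_completed(futures)]

with ThreadPoolExecutor(max_workers=3) as pool:
    barrier_order = barrier_style(pool)
    completion_order = async_style(pool)

print("barrier order:   ", barrier_order)
print("completion order:", completion_order)
```

In the barrier version the fast tasks sit idle behind the 0.4 s straggler; in the async version their results are available after 0.1 s and 0.2 s.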

there's also this report by O'Reilly

vvksh commented 3 years ago

this blog shows a benchmark comparing Ray vs Spark vs Dask on a given task; might be interesting to replicate it on an rpi cluster or even on Azure.

vvksh commented 3 years ago

look into ray cluster and play with autoscaling: https://docs.ray.io/en/master/cluster/index.html