hpc-carpentry / hpc-python

HPC Python lesson materials
https://hpc-carpentry.github.io/hpc-python/

hpc-python lesson should cover Dask #12

Open rbavery opened 5 years ago

rbavery commented 5 years ago

This repository and the Dask documentation already have some really good examples that could serve as a jumping-off point: https://github.com/sdsc/sdsc-summer-institute-2019/tree/master/hpc0_python_hpc

An overview of the "why" of Dask: https://notamonadtutorial.com/interview-with-dasks-creator-scale-your-python-from-one-computer-to-a-thousand-b4483376f200

psteinb commented 5 years ago

I second this entirely! Dask is very close to HPC, which is also why I included it in hpc-in-a-day. There it serves as an example of more big-data-style APIs. I like it because it puts productivity front and center. Where would you suggest this could go inside this repo?

rbavery commented 5 years ago

I think the introduction to parallel computing section could replace multiprocessing with an introduction to dask.delayed for parallelizing custom workflows and/or Dask Arrays. These HPC lessons could serve as a template: https://github.com/sdsc/sdsc-summer-institute-2019/tree/master/hpc0_python_hpc
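For illustration, here is a minimal sketch of what a dask.delayed version of a custom workflow might look like. The `simulate` function and its inputs are hypothetical stand-ins, not taken from the lesson or the linked materials, and the sketch assumes Dask is installed.

```python
from dask import delayed


def simulate(seed):
    """Hypothetical stand-in for one independent, expensive task."""
    return sum((seed * i) % 7 for i in range(1000))


# Wrapping each call in delayed() only records it in a task graph;
# nothing is executed at this point.
tasks = [delayed(simulate)(seed) for seed in range(8)]

# Combine the lazy results with another delayed call.
total = delayed(sum)(tasks)

# compute() runs the graph. The thread-based scheduler is requested
# explicitly here to keep the sketch simple.
result = total.compute(scheduler="threads")
print(result)
```

For a pure-Python, CPU-bound function like this one, threads are limited by the GIL, so passing `scheduler="processes"` or using a `dask.distributed` client would be the closer analogue to multiprocessing. The dashboard is part of the distributed scheduler.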

While both multiprocessing and Dask could get the job done, I think Dask's performance dashboard is a big advantage for profiling big data workflows.