Open rbavery opened 5 years ago
I second this entirely! dask is very near to HPC. This is also the reason why I included it in hpc-in-a-day. There it serves as an example of more big-data-style APIs. I like it because it puts productivity front and center. Where would you suggest this could go inside this repo?
I think the introduction to parallel computing section could replace multiprocessing with an introduction to dask.delayed for parallelizing custom workflows and/or Dask Arrays. These HPC lessons could serve as a template: https://github.com/sdsc/sdsc-summer-institute-2019/tree/master/hpc0_python_hpc
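To make the suggestion concrete, here is a rough sketch of what a dask.delayed version of a small custom workflow might look like. The functions `load`, `process`, and `combine` are hypothetical placeholders I made up for illustration, not anything from the existing lesson, and the actual lesson material would likely use a more realistic workload.

```python
from dask import delayed


def load(i):
    # Hypothetical stand-in for reading one chunk of data.
    return list(range(i * 3, i * 3 + 3))


def process(chunk):
    # Hypothetical per-chunk computation: sum of squares.
    return sum(x * x for x in chunk)


def combine(partials):
    # Reduce the per-chunk results to a single number.
    return sum(partials)


# Wrapping a call in delayed() records it in a task graph instead of running it.
partials = [delayed(process)(delayed(load)(i)) for i in range(4)]
total = delayed(combine)(partials)

# Nothing has executed yet; compute() runs the graph, in parallel where the
# dependencies between tasks allow it.
result = total.compute()
print(result)
```

The four load/process chains are independent of each other, so the scheduler is free to run them concurrently, while `combine` waits for all of them. Compared with multiprocessing, learners would not have to manage a pool or split the work up by hand.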
While both multiprocessing and dask could get the job done, I think that Dask's performance dashboard is huge for being able to profile big data workflows.
This and the dask documentation already have some really good examples that could serve as a jumping-off point: https://github.com/sdsc/sdsc-summer-institute-2019/tree/master/hpc0_python_hpc
An overview of the "why" of dask: https://notamonadtutorial.com/interview-with-dasks-creator-scale-your-python-from-one-computer-to-a-thousand-b4483376f200