bkmgit opened 3 years ago
It may not be realistic to assume that most clusters will allow setting up a web server for viewing the scheduler dashboard.
For sure - whether this lesson can be delivered depends on the site you use. Typically, custom ports higher than 1024 are open for traffic on the landing pad (login nodes).
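Where the dashboard port is not reachable directly, SSH port forwarding is a common workaround. A minimal sketch (hostname and username are placeholders; 8787 is the default port of the Dask distributed dashboard):

```shell
# Forward the remote dashboard port to the local machine,
# then open http://localhost:8787 in a local browser.
ssh -L 8787:localhost:8787 user@login.cluster.example.org
```

This assumes the scheduler runs on the login node itself; if it runs on a compute node, the tunnel target has to point at that node instead.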
I'm not sure if things changed. When I wrote the `dask` part, `dask.jobqueue` was an additional package to install. I decided not to use it in order to reduce dependencies. Maybe that needs to be revisited.
`ray` is an interesting library. The main consideration for showcasing `dask` is to illustrate a paradigm shift that came into the parallel computing community around the 2000s, i.e. the client-server architecture which can potentially auto-parallelize code. And I must admit, this point was often hard to bring across. In general, this lesson bears the potential to fall victim to feature envy, i.e. constantly updating the content with whatever new library is fashionable. I think the focus should remain on concepts.
I have used dask-jobqueue a lot and have organised some tutorials on it. To me, it is a great way to introduce interactive supercomputing. It really is best used through JupyterHub though, where you have really nice visualisations. This can also be made to work well with remote systems. There are great lessons out there in this respect, but of course they use Jupyter notebooks, not a Carpentries template; for example see https://github.com/ExaESM-WP4/workshop-Dask-Jobqueue-cecam-2021-02
There are solutions for that which could still allow us to stick (mostly) to the Carpentries template: https://jekyllnb.readthedocs.io/en/latest/ used within a GitHub Action could do this.
```python
import argparse
import sys

import numpy as np
import dask.array as da
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

# Request workers from the batch system: 4 cores, 4 GB memory,
# 10 minutes walltime per worker job.
cluster = SLURMCluster(cores=4, processes=1, memory="4GB",
                       walltime="00:10:00")

np.random.seed(2021)
da.random.seed(2021)


def inside_circle(total_count, chunk_size=-1):
    # Draw points in the unit square and count those inside
    # the unit circle.
    x = da.random.uniform(size=(total_count), chunks=(chunk_size))
    y = da.random.uniform(size=(total_count), chunks=(chunk_size))
    radii = da.sqrt(x * x + y * y)
    filtered = da.where(radii <= 1.0)
    indices = np.array(filtered[0])
    count = len(radii[indices])
    return count


def estimate_pi(total_count, chunk_size):
    count = inside_circle(total_count, chunk_size)
    return (4.0 * count / total_count)


def main():
    parser = argparse.ArgumentParser(
        description='Estimate Pi using a Monte Carlo method.')
    parser.add_argument('n_samples', metavar='N', type=int, nargs=1,
                        default=10000,
                        help='number of times to draw a random number')
    parser.add_argument('chunk_size', metavar='N', type=int, nargs=1,
                        default=1000, help='chunk size')
    args = parser.parse_args()

    n_samples = args.n_samples[0]
    chunk_size = args.chunk_size[0]
    client = Client(cluster)
    my_pi = estimate_pi(n_samples, chunk_size)

    print("[dask version] pi is %f from %i samples with chunk size %i"
          % (my_pi, n_samples, chunk_size))
    sys.exit(0)


if __name__ == '__main__':
    main()
```
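Assuming the script above is saved as `dask-pi.py` (the filename is my choice), it would be launched from a login node; `SLURMCluster` then submits the worker jobs to the batch system itself, so no `sbatch` wrapper is needed for the script:

```shell
# Run on the login node; dask-jobqueue submits the SLURM worker
# jobs on our behalf. Arguments: total samples, chunk size.
python dask-pi.py 10000000 100000
```

Note that the cluster starts with zero workers; the script would normally also call `cluster.scale(...)` (or `cluster.adapt(...)`) before computing, otherwise the tasks wait for workers indefinitely.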