robertdfrench / toast

The OLCF Asset Staging Tool
GNU General Public License v2.0

toast

The OLCF Asset Staging Tool. https://github.com/robertdfrench/toast/archive/v0.2.0.tar.gz

Toast loads assets onto node-local filesystems (such as a ramdisk or SSD) before an application launches. This can reduce stress on parallel filesystems and shorten the load time of leadership-scale applications.

Usage

Toast takes two arguments: the path of the asset to stage and the node-local path to write it to.

For example, to stage the file libxml_viz.so to ramdisk on 1200 nodes of a modern Cray system:

$ aprun -n 1200 -N 1 ./toast libxml_viz.so /tmp/scratch/libxml_viz.so
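
Since each invocation stages a single file, a job that needs several assets has to repeat the command once per asset. The sketch below generates those command lines and does not run them. The helper `staging_commands` and the asset name `plugins/filters.so` are hypothetical; only the `aprun -n <nodes> -N 1 ./toast <source> <destination>` form is taken from the example above.

```python
import posixpath
import shlex

def staging_commands(assets, nodes, dest_dir, toast="./toast"):
    """Return one aprun command line per asset, one toast rank per node."""
    commands = []
    for asset in assets:
        # Keep only the file name so every asset lands directly in dest_dir.
        dest = posixpath.join(dest_dir, posixpath.basename(asset))
        argv = ["aprun", "-n", str(nodes), "-N", "1", toast, asset, dest]
        commands.append(shlex.join(argv))
    return commands

# Hypothetical asset list; the first entry reproduces the example above.
for line in staging_commands(["libxml_viz.so", "plugins/filters.so"], 1200, "/tmp/scratch"):
    print(line)
```

The printed lines can be pasted into a batch script ahead of the application's own launch command.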

What are Assets?

An asset is any small-ish file that an application process (in MPI terms, a "rank") might need in order to boot correctly. This could be a dynamic library, an auxiliary program, or even a collection of Python modules.

Why stage?

It is usually best to embed assets statically in your application, but that is not always convenient and may require changes to your code. In those situations, toast can push assets to each node for you; the application can then load them locally, which reduces congestion on the shared filesystem.
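
On the application side, loading locally can be as simple as preferring the staged copy when it exists. The sketch below is one possible way to do that, not something toast provides: `resolve_asset` is a hypothetical helper, and the two directories in the demo are temporary stand-ins for the node-local and shared filesystems.

```python
import os
import tempfile

def resolve_asset(name, local_dir, shared_dir):
    """Prefer the staged node-local copy; fall back to the shared filesystem."""
    local_path = os.path.join(local_dir, name)
    if os.path.isfile(local_path):
        return local_path
    return os.path.join(shared_dir, name)

# Demo: temporary directories stand in for the two filesystems.
with tempfile.TemporaryDirectory() as local_dir, tempfile.TemporaryDirectory() as shared_dir:
    open(os.path.join(shared_dir, "libxml_viz.so"), "wb").close()
    print(resolve_asset("libxml_viz.so", local_dir, shared_dir))  # not staged yet: shared path
    open(os.path.join(local_dir, "libxml_viz.so"), "wb").close()
    print(resolve_asset("libxml_viz.so", local_dir, shared_dir))  # staged: local path
```

The fallback keeps the application working on nodes where staging was skipped, at the cost of hitting the shared filesystem there.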

How does it work?

Before launching your application, run one rank of toast on each node in your job. Toast balances itself so that only a few ranks load your assets from the shared filesystem. Once the assets are loaded, toast broadcasts them to all the other nodes allocated to your application and writes them to whatever local path you specify.
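
The pattern described above can be illustrated with a single-process simulation. This is not toast's implementation and it uses no MPI: each directory in `node_dirs` stands in for one node's local filesystem, handing bytes from a reader to another rank stands in for the broadcast, and the choice of readers (the first `num_readers` ranks) and the `rank % num_readers` assignment are assumptions made for the sketch.

```python
import os
import tempfile

def simulate_staging(shared_path, node_dirs, dest_name, num_readers=2):
    """Copy one asset into every node directory; return the number of shared reads."""
    num_readers = max(1, min(num_readers, len(node_dirs)))
    shared_reads = 0
    payloads = {}

    # Step 1: only the first few "ranks" read from the shared filesystem.
    for rank in range(num_readers):
        with open(shared_path, "rb") as f:
            payloads[rank] = f.read()
        shared_reads += 1

    # Step 2: every rank gets the bytes from one reader and persists them locally.
    for rank, node_dir in enumerate(node_dirs):
        data = payloads[rank % num_readers]
        with open(os.path.join(node_dir, dest_name), "wb") as f:
            f.write(data)

    return shared_reads

# Demo: 8 simulated nodes, 2 readers.
with tempfile.TemporaryDirectory() as root:
    shared = os.path.join(root, "libxml_viz.so")
    with open(shared, "wb") as f:
        f.write(b"\x7fELF placeholder bytes")
    nodes = []
    for i in range(8):
        node_dir = os.path.join(root, f"node{i}")
        os.mkdir(node_dir)
        nodes.append(node_dir)
    reads = simulate_staging(shared, nodes, "libxml_viz.so", num_readers=2)
    print(f"{reads} shared reads for {len(nodes)} nodes")
```

In the simulation the number of shared-filesystem reads depends only on `num_readers`, not on the node count, which is the property the staging step is meant to provide.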

What about caching?

Caching can occur at several levels, for instance in the operating system on each node and in the shared filesystem itself. However, a file is typically not available from a cache until it has been fully delivered once, so requests that begin before then may not be able to use the cached copy. Broadcasting assets within the compute network reduces demand on the storage network regardless of cache state.
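
A rough calculation shows the scale of the difference. It assumes the worst case described above, in which no request is served from a cache, so every reader pulls the whole file from shared storage. The 50 MiB asset size and the count of 4 readers are made-up illustrative values; the 1200 nodes come from the usage example.

```python
def shared_fs_bytes(nodes, asset_bytes, readers=None):
    """Bytes requested from the shared filesystem for one asset, assuming no cache hits."""
    if readers is None:
        readers = nodes  # without staging, every node reads the file itself
    return min(readers, nodes) * asset_bytes

asset = 50 * 1024**2  # hypothetical 50 MiB library
nodes = 1200          # node count from the usage example

direct = shared_fs_bytes(nodes, asset)             # every node reads from shared storage
staged = shared_fs_bytes(nodes, asset, readers=4)  # assumed: 4 ranks read, the rest receive a broadcast

print(f"direct: {direct / 1024**3:.1f} GiB, staged: {staged / 1024**3:.2f} GiB")
print(f"reduction factor: {direct // staged}")
```

Under these assumptions the storage network carries 1200 / 4 = 300 times less data; the remaining transfers move to the compute network.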