Parallelizing - Githubissues

Parallelization in ISiCLE can occur in two ways. The first is parallelization implemented within a given piece of software, e.g. NWChem, ORCA, or xTB. For these cases, we've exposed the processes flag in the relevant function calls. The value is passed through to the underlying tool, which parallelizes either through its own internal implementation (true for MOBCAL, RDKit, xTB) or through OpenMPI (true for NWChem, ORCA). As a user, you won't see much of a difference, as this occurs behind the scenes. (Note: ORCA support is recent; you must install ORCA manually, along with the version of OpenMPI it was built against; see here, "Downloads" section.)

In the above code snippet, processes=4 is specified, so the calculation will scale up to four processes accordingly. You could certainly push this value higher to reflect the number of cores on your system.
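If you're unsure how many cores are available, a quick check with the Python standard library can guide the choice. This is only a minimal sketch: `pick_processes` is a hypothetical helper, not part of ISiCLE, and the result is simply what you would then pass as the processes value.

```python
import os


def pick_processes(requested=None):
    """Hypothetical helper (not an ISiCLE function): clamp a requested
    process count to the number of logical cores reported by the OS."""
    # os.cpu_count() can return None on some platforms; fall back to 1.
    available = os.cpu_count() or 1
    if requested is None:
        return available
    # Never go below 1 or above the number of available cores.
    return max(1, min(requested, available))


if __name__ == "__main__":
    print(pick_processes(4))
```

Note that `os.cpu_count()` reports logical cores, so on a shared machine or inside a scheduler allocation the usable number may be lower.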

The second type of parallelization is "trivially" or "embarrassingly" parallel. This involves running multiple independent instances of a given process across cores (or nodes, in the case of HPC or cloud). For example, in the for-loop above, one could launch single-core DFT simulations concurrently, up to the number of available cores.
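As a rough illustration of the embarrassingly parallel pattern on a single machine, the sketch below dispatches independent single-core jobs with the Python standard library. It does not use any ISiCLE API: `run_single_core_job` is a hypothetical stand-in that starts a trivial child process where a real single-core simulation would be launched, and the job names are made up.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_single_core_job(name):
    """Hypothetical stand-in for one single-core calculation (not an ISiCLE call).

    A real job would invoke the external program here; this placeholder
    starts a separate Python process that just echoes the job name.
    """
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", name],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def run_all(names, max_workers=None):
    """Run one independent job per name, at most max_workers at a time."""
    if max_workers is None:
        max_workers = os.cpu_count() or 1
    # The threads only dispatch and wait; the work happens in child processes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the inputs.
        return list(pool.map(run_single_core_job, names))


if __name__ == "__main__":
    print(run_all(["conformer_0", "conformer_1", "conformer_2"], max_workers=2))
```

Because the jobs never communicate with each other, the only tuning knob is `max_workers`; keeping it at or below the core count avoids oversubscribing the machine.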

For this type of parallelization, we recommend using HPC or cloud resources, as you'll see diminishing returns on a personal workstation. To manage resources in these instances, we use the Snakemake and/or Nextflow workflow management systems. Example ISiCLE workflows can be found here.

Hope that helps!

pnnl / isicle

Parallelizing #28