Closed benjha closed 4 years ago
I am not sure what this ticket is about. Should we add this information to our documentation? Was the answer satisfactory? Was the yml file updated, and is it usable?
@mturilli Yes, please, add it to the documentation.
@lee212 If 168 CPUs is equivalent to 1 node, then the smt4 flag must be enabled somewhere, right? I am also wondering about OMP_NUM_THREADS: is it set as well?
For example:
https://jsrunvisualizer.olcf.ornl.gov/?s4f1o14n1c1g0r11d1b21l0=
I noticed there is a pre_exec label in the workflow configuration file for exporting environment variables.
smt4 is the default, according to this line: https://github.com/radical-cybertools/radical.saga/blob/db772b9feac2405de151490d22f8539ee4056ee9/src/radical/saga/adaptors/lsf/lsfjob.py#L38
OMP_NUM_THREADS is defined by the user at the EnTK Task level, e.g.: https://github.com/radical-collaboration/MDFF-Error/blob/736ebea7ade65705dc62f7a2bb84ddc49c73bb77/simple_mdff.summit.py#L74
Generally, we see 4 for both SMT and OMP_NUM_THREADS.
BTW, thanks for the visualizer link, it definitely helps to see how the params change the layout.
Yes, pre_exec is a special variable that holds a list of commands to run before the main executable (in this case, NAMD for the simulation). As you can see, export and cp/mv/cd commands are placed in pre_exec as part of the preparation.
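To illustrate the idea, here is a minimal sketch of assembling such a list of shell commands in Python. The helper name `build_pre_exec` and the directory `run_dir` are hypothetical and not part of EnTK or the MDFF-Error repository; the sketch only assumes that pre_exec is a list of command strings, as described above.

```python
import shlex


def build_pre_exec(env, commands=()):
    """Return shell command strings: exports first, then any other commands.

    Hypothetical helper for illustration only; it is not an EnTK function.
    """
    exports = [
        f"export {name}={shlex.quote(str(value))}"
        for name, value in env.items()
    ]
    return exports + list(commands)


# OMP_NUM_THREADS=4 matches the value mentioned in this thread;
# "cd run_dir" is a placeholder preparation step.
pre_exec = build_pre_exec({"OMP_NUM_THREADS": 4}, ["cd run_dir"])
print(pre_exec)
```

The resulting list would then be assigned to the task's pre_exec in the workflow script or written into the configuration file.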
The README has been updated to explain how to specify the number of nodes in the yaml file; see here: https://github.com/radical-collaboration/MDFF-Error#faq This can be included in the main docs later: https://radicalentk.readthedocs.io/
Is this resolved, and can it be closed?
Yes, thanks
Q:
Does an argument exist to specify the number of nodes in https://github.com/radical-collaboration/MDFF-Error/blob/master/simple_mdff_cfg.yml?
A:
There is no argument for node counts, only for CPU counts. The file you pointed to describes resources at the task level, which are bounded by the resource description given to the job scheduler, e.g. Slurm or LSF. See the other file below. Both files use CPU counts instead of node counts.
https://github.com/radical-collaboration/MDFF-Error/blob/master/resource_cfg.yml
For example, on Bridges you request 56 CPUs to get 2 nodes (28 CPUs each). On Summit, 168 CPUs correspond to 1 node (2 sockets × 21 CPUs × 4 hardware threads).
For 4, 8, 16, … nodes, multiply the CPU counts in those files accordingly.
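The arithmetic can be sketched in a few lines of Python. The per-node figures are the ones quoted in this answer; the dictionary and the function name `cpus_for_nodes` are hypothetical, and the sketch makes no assumption about the key names used in the yml files.

```python
# CPUs per node as quoted above; adjust if the machine configuration differs.
CPUS_PER_NODE = {
    "bridges": 28,          # 28 CPUs per node
    "summit": 2 * 21 * 4,   # 2 sockets x 21 CPUs x 4 hardware threads = 168
}


def cpus_for_nodes(resource, nodes):
    """Return the CPU count to request in order to get `nodes` full nodes."""
    if nodes < 1:
        raise ValueError("nodes must be a positive integer")
    return CPUS_PER_NODE[resource] * nodes


# Print the CPU count to put in the config files for a few node counts.
for nodes in (1, 2, 4, 8, 16):
    print(nodes, cpus_for_nodes("summit", nodes))
```

The value returned for the desired node count is what would be entered as the CPU count in both configuration files.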