If you take the `rdmc_cpp.ymmsl` file, set `micro` and `macro` both to a multiplicity of 8, with 4 threads for `macro` and 2 for `micro`, and then run `muscle3 resources --cores-per-node 16 --verbose rdmc_cpp.ymmsl rd_implementations.ymmsl`, you get:

It's putting `micro[1]` on the same cores as `macro[0]`, but those two can compute at the same time and should therefore be on separate cores. Expected output:

While we're at it, we should print the core numbers sorted numerically rather than alphabetically, and collapse runs of consecutive numbers so you get `0-127` instead of `0,1,2, ..., 126, 127` :smile:
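The sorting and collapsing described above could look roughly like this. It's a minimal sketch assuming the core numbers are available as integers or numeric strings; `format_core_ranges` is a hypothetical helper for illustration, not an existing muscle3 function.

```python
from typing import Iterable, List, Union


def format_core_ranges(cores: Iterable[Union[int, str]]) -> str:
    """Sort core numbers numerically and collapse consecutive runs.

    Hypothetical helper, not part of the muscle3 API.
    """
    # Converting to int makes sorted() compare numerically, so 10 comes
    # after 9 (string sorting would put "10" before "9"). set() drops
    # any duplicate core numbers.
    ordered = sorted(set(int(c) for c in cores))
    parts: List[str] = []
    i = 0
    while i < len(ordered):
        start = end = ordered[i]
        # Extend the current run while the next core is exactly one higher.
        while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
            i += 1
            end = ordered[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(parts)


if __name__ == "__main__":
    print(format_core_ranges(range(128)))                        # 0-127
    print(format_core_ranges(["10", "9", "2", "3", "4", "11"]))  # 2-4,9-11
```

Converting to `int` before sorting is what fixes the alphabetical ordering; the single pass over the sorted list then only has to check whether each core is one more than the previous one.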