improved postprocessing scaling

mgiulini commented 5 months ago

You are about to submit a new Pull Request. Before continuing make sure you read the contributing guidelines and that you comply with the following criteria:

[ ] You have stuck to Python. Please talk to us before adding other programming languages to HADDOCK3
[ ] Your PR is about CNS
[ ] Your code is well documented: proper docstrings and explanatory comments for those tricky parts
[ ] You structured the code into small functions as much as possible. You can use classes if there is a (state) purpose
[ ] Your code follows our coding style
[ ] You wrote tests for the new code
[ ] tox tests pass. Run tox command inside the repository folder
[ ] -test.cfg examples execute without errors. Inside examples/ run python run_tests.py -b
[ ] PR does not add any dependencies, unless permission granted by the HADDOCK team
[ ] PR does not break licensing
[ ] Your PR is about writing documentation for already existing code :fire:
[ ] Your PR is about writing tests for already existing code :godmode:

Closes #857 by improving the scaling of the postprocessing analysis.

When caprieval folders are not present in the workflow, the postprocessing analysis now uses the workflow's current mode and number of cores for the CAPRI calculations.
The pre-caprieval model unpacking (and the post-caprieval model compression) receive the same parameters.
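
A minimal sketch of the idea described above, not the actual HADDOCK3 implementation: the function name, the step names, and the example values `"local"` and `64` are all hypothetical. It only illustrates that the unpacking, CAPRI calculation, and compression steps share the same mode and core count taken from the workflow.

```python
from typing import Dict, List, Tuple

Params = Dict[str, object]


def postprocessing_steps(mode: str, ncores: int) -> List[Tuple[str, Params]]:
    """Plan the steps run when no caprieval folder exists (hypothetical names)."""
    # One shared parameter set, taken from the workflow configuration.
    shared: Params = {"mode": mode, "ncores": ncores}
    # Each step gets its own copy so later changes cannot leak between steps.
    return [
        ("unpack_models", dict(shared)),
        ("capri_calculations", dict(shared)),
        ("compress_models", dict(shared)),
    ]


# Example values only; in practice they would come from the workflow config.
steps = postprocessing_steps(mode="local", ncores=64)
for name, params in steps:
    print(name, params)
```

Building all three parameter sets from one source keeps the steps from drifting apart, for example unpacking on many cores while the CAPRI calculations fall back to a small default.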

PS: the comparison between the results of analysis/topoaa-clustfcc-test.cfg as it is impossible to get full reproducibility in that case

amjjbonvin commented 3 months ago

Isn’t that defined in a default.yaml file?

On the lumi supercomputer I changed that value to the max number of cores per node

mgiulini commented 3 months ago

> Isn’t that defined in a default.yaml file? On the lumi supercomputer I changed that value to the max number of cores per node

Yes, but at the end of the workflow we call the analysis, which was implemented as a CLI, so we need to pass those parameters to it explicitly. I can remove the check for a maximum number of cores to allow any number that comes from the workflow.
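
As a rough illustration of what dropping the upper bound could look like (the function name `validate_ncores` is hypothetical and this is not the code in the PR), a check that only rejects non-positive or non-integer values would accept whatever core count the workflow provides:

```python
def validate_ncores(ncores: int) -> int:
    """Accept any positive integer core count, with no upper cap (sketch)."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(ncores, bool) or not isinstance(ncores, int):
        raise TypeError(f"ncores must be an integer, got {type(ncores).__name__}")
    if ncores < 1:
        raise ValueError(f"ncores must be at least 1, got {ncores}")
    # No maximum is enforced: the workflow value is trusted as given.
    return ncores


print(validate_ncores(128))
```

With this shape, a value such as the full core count of a large compute node passes through unchanged, while obviously invalid values still fail early.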

amjjbonvin commented 3 months ago

Changing it in the yaml file did speed up the analysis when tested in lumi - so it seems there is a max defined/used

mgiulini commented 3 months ago

> Changing it in the yaml file did speed up the analysis when tested in lumi - so it seems there is a max defined/used

It used to depend on the preceding run: when caprieval was run within the workflow, the scaling was noticeable; otherwise, if no caprieval was run, the postprocessing launched the CAPRI calculations using only a few cores. With this PR there will no longer be any difference.

haddocking / haddock3

improved postprocessing scaling #874