plumed / plumed2

Development version of plumed 2
https://www.plumed.org
GNU Lesser General Public License v3.0
323 stars 269 forks source link

Tidying up of modules. #1058

Closed gtribello closed 1 month ago

gtribello commented 2 months ago
Description

In this PR I have moved around some of the C++ files so that each module has a clearer purpose. In the current master there are two modules for which the purpose is not entirely clear:

  1. adjmat - which contains functionality for calculating and manipulating matrices
  2. analysis - which contains a hodgepodge of stuff.

For adjmat I moved the base code for calculating matrices ActionWithMatrix to core. I could then move FunctionOfMatrix to a function and the code for diagonalising, inverting and transposing matrices and so on to a new module called matrixtools. The adjmat module then just contains code for calculating various adjacency matrices.

For analysis, I moved the HISTOGRAM to gridtools, the reweighting stuff to bias and the code for doing averages or collecting data to generic. Functionality for computing landmarks for dimensionality reduction was moved to a new module called landmarks, while stuff for wham was moved to a module called wham. The analysis module was then deleted.

In making these changes I wrote some python scripts to:

  1. Determine the requirements between modules in a way that takes the actions needed by shortcuts into account.
  2. Check that plumed_modules lists all the modules that a regtest needs in order to run.

It may be useful to use these scripts to check that no one is introducing wrong this on Github. Perhaps they could be run on the action that does cppcheck. What do you think @GiovanniBussi?

Target release

I would like my code to appear in release 2.10

Type of contribution
Copyright
Tests
GiovanniBussi commented 2 months ago

Hi @gtribello the reorganization is quite substantial but I guess it only affects people modifying those parts of the code. Do you know if this would add nest problems?

Regarding your scripts:

This would be useful in the codecheck. The script is currently in awk. Is the test straightforward? If you can explain it in words or share the python implementation I can try to add it there if easy. This would avoid to manage how to install further dependencies

This should not go with the main code but rather with the regtests. There's already something testing for actions (in regtests/script/what). Could the test be added there? I feel the current test is not very robust because it uses grep actually. I don't know which is the best way.

A general comment about this PR is that I see now that there is much more inter-relationships between modules now. I mean: before, it was possible to switch off all modules except for core, tools, and other basic stuff (blas, lapack, lepton etc). Is it still possible? If you use ./configure --enable-modules=none what happens? In addition, it was possible to selectively turn on some of these modules (e.g. bias). I now see that bias depends on functions, etc. I guess because of shortcuts. This is only because of shortcuts right?

In general, I think the code would grow better if we can have less interdependencies. What I would like to avoid is that people will start to do dynamic_cast<function::Combine*> in the bias module (for example). This is already forbidden with current master (because the plumedcheck will complain with an error of type include_dots if you include a module that you do not depend on with #include "../function/Combine.h").

Would it be possible to split hard and soft dependencies in some way? Ideally, one could have:

What do you think?

gtribello commented 2 months ago

Hi @GiovanniBussi

I will email the two python scripts to you once I have finished this response. The one for checking the modules works by:

  1. building the list of actions in each module
  2. Reading the actions that are used by each shortcut action from the json syntax file

The dependencies increase because you are using actions from module A in the shortcuts in module B. The symfunc module for instance doesn't need any of the headers in adjmat. It just needs actions from that module to load actions that are in there. I think the separation of these two types of dependency is a good idea. I'm just not sure how to do it. Perhaps if you take a look through my python script we can talk about how this might be done on Wednesday?

For the regtests script I:

  1. build the list of actions in each module that is marked default-off.
  2. Read the files in every regtest and look for the keywords of the actions. I then check that the corresponding module is listed in the plumed_module part of the config file.

On the general comments, if you were to build with --enable-modules=none the following modules would be active:

NAME valtools depends ['core', 'generic']
NAME blas depends []
NAME tools depends ['core', 'lapack', 'lepton', 'small_vector']
NAME core depends ['config', 'tools', 'lepton', 'small_vector', 'blas']
NAME lapack depends ['blas']
NAME asmjit depends []
NAME lepton depends ['asmjit']
NAME small_vector depends []

The square brackets show the modules that the module is dependent upon. The only one that has been added to that list is valtools and you could move that to the default-on modules if you preferred.

I would like to have modules organised in a way that you can explain the purpose of each module in a sentence. I also think that recording the interdependencies that the shortcuts introduce somewhere is important. The current master is a bit of a mess in that regard, which is my fault entirely :-(

gtribello commented 2 months ago

HI @GiovanniBussi

I did a bit of a rework of this branch to take account of your concerns about not creating lots of dependencies between modules into account. In other words, I have changed the Makefiles so that only the hard dependencies are listed within them. The dependencies for all the modules look like this after this change:

Module that are always on
NAME valtools depends ['core']
NAME blas depends []
NAME tools depends ['core', 'lapack', 'lepton', 'small_vector']
NAME core depends ['config', 'tools', 'lepton', 'small_vector', 'blas']
NAME lapack depends ['blas']
NAME asmjit depends []
NAME lepton depends ['asmjit']
NAME small_vector depends []
Modules that are default on
NAME secondarystructure depends ['core', 'tools']
NAME isdb depends ['bias', 'colvar', 'core', 'tools', 'function']
NAME molfile depends []
NAME matrixtools depends ['core', 'tools']
NAME setup depends ['core', 'tools']
NAME cltools depends ['core', 'config', 'tools', 'molfile', 'xdrfile']
NAME xdrfile depends []
NAME function depends ['core', 'tools']
NAME gridtools depends ['core', 'tools', 'function', 'lepton']
NAME refdist depends ['core', 'tools', 'function']
NAME colvar depends ['core', 'tools']
NAME vatom depends ['core', 'tools']
NAME multicolvar depends ['core', 'tools']
NAME generic depends ['core', 'tools', 'xdrfile']
NAME bias depends ['core', 'tools']
Modules that are default off
NAME mapping depends ['core', 'tools', 'colvar']
NAME pamm depends ['core', 'tools', 'multicolvar', 'adjmat']
NAME adjmat depends ['core', 'tools']
NAME ves depends ['bias', 'cltools', 'core', 'tools', 'lepton']
NAME eds depends ['core', 'tools', 'bias']
NAME wham depends ['core', 'tools']
NAME annfunc depends ['core', 'function']
NAME dimred depends ['core', 'tools', 'gridtools', 'mapping']
NAME fourier depends ['core', 'gridtools']
NAME membranefusion depends ['core', 'colvar']
NAME clusters depends ['core', 'tools', 'matrixtools', 'multicolvar']
NAME contour depends ['core', 'tools', 'gridtools']
NAME symfunc depends ['core', 'tools', 'multicolvar', 'function']
NAME crystdistrib depends ['core', 'tools', 'colvar']
NAME drr depends ['core', 'tools', 'bias', 'cltools', 'config']
NAME maze depends ['core', 'tools', 'colvar', 'bias']
NAME fisst depends ['core', 'tools', 'bias']
NAME landmarks depends ['core', 'tools', 'matrixtools']
NAME envsim depends ['core', 'tools', 'multicolvar']
NAME piv depends ['core', 'tools', 'colvar']
NAME logmfd depends ['core', 'tools', 'bias']
NAME volumes depends ['core', 'tools']
NAME sasa depends ['core', 'config']
NAME pytorch depends ['core', 'function']
NAME sprint depends ['core']
NAME s2cm depends ['core', 'tools', 'colvar']
NAME funnel depends ['core', 'tools', 'colvar', 'bias']
NAME opes depends ['bias', 'core', 'tools']

So you can see that there is not the same interdependence that we had in the previous version of the PR.

The remaining problem is that the code is no longer aware of the soft dependencies. In other words, you can still get a crash if you try to run a shortcut action that needs an action from a module that isn't loaded. I am not sure how to add the soft dependencies into the way the modules are turned on or off. If you want a list of the soft dependencies, however, you can get one using the following python code:

import json

f = open("json/syntax.json")
syntax = json.load(f)
f.close()

requires = {}
for key, value in syntax.items() :
    if "module" not in value : continue
    thismodule = value["module"]
    if thismodule not in requires.keys() : requires[thismodule] = set()
    if "needs" in value :
       for req in value["needs"] :
           if syntax[req]["module"]!=thismodule : requires[thismodule].add( syntax[req]["module"] )

for key, value in requires.items() :
    print("SOFT REQUIREMENTS FOR MODULE", key, "ARE", requires[key] )
gtribello commented 2 months ago

incidentally, I would like to back port the changes that I made to the code here to add the module names for each of the actions into the syntax file to v2.9. I think this feature is useful for the nest/tutorials as we can then easily add code to search the lists of eggs/lessons by module name. Would this be OK with you?

GiovanniBussi commented 2 months ago

@gtribello the map action -> module is merged now. It's defined in core/ModuleMap.h

gtribello commented 1 month ago

Hello @GiovanniBussi

Thanks for letting me know about the module map feature. I have now finished replaced the scripting in this branch that found the modules with a call to the moduleMap function. I would thus like to continue with a merge of this branch and the tidy up of modules that it represents, if that is OK.

The branch still does not address the issue with the soft dependencies. There is some progress towards this aim as, with the name of each module in the syntax.json file, you can find the soft dependencies from that file using the following python script:

import json

f = open("../json/syntax.json")
syntax = json.load(f)
f.close()

requires = {}
for key, value in syntax.items() :
    if "module" not in value : continue
    thismodule = value["module"]
    if thismodule not in requires.keys() : requires[thismodule] = set()
    if "needs" in value :
       for req in value["needs"] :
           if syntax[req]["module"]!=thismodule : requires[thismodule].add( syntax[req]["module"] )

for key, value in requires.items() :
    print("SOFT REQUIREMENTS FOR MODULE", key, "ARE", requires[key] )

My idea was to have a file that listed the soft dependencies in the src/lib/maketools directory. This file would be consulted when you enable modules so that if you do:

./configure --enable-modules symfunc

adjmat, which is a soft dependency for symfunc, is automatically enabled.

The file in the src/lib/maketools directory would be generated using a (suitably modified) version of the python script above. This script would also be run during one of the CI jobs where all modules are enabled. This would give us an alert if the soft dependencies go out of date.

That was the rough idea. I am not sure how to implement the details, however, as:

  1. I don't know which CI job to run the comparison of the soft dependency file on.
  2. I don't know how to sort out the automated enabling and disabling of modules that this would require during configure.