plumed / plumed2

Development version of plumed 2
https://www.plumed.org
GNU Lesser General Public License v3.0
323 stars 269 forks source link

Improved load action #1056

Closed GiovanniBussi closed 2 months ago

GiovanniBussi commented 2 months ago

This is to improve the way the LOAD action behaves in contexts where multiple plumed objects exist. This might be because the user is creating them in Python, or because an MD code uses multithreading to manage multiple replicas, etc. Finally, this addition makes it possible to override internally defined CVs with LOADed ones, and to combine multiple versions of the same CV in the same input file.

This should also address the issue we have in plumed-nest with eggs containing a temporary version (LOADed) that is then merged in master branch (@maxbonomi @gtribello).

Background

LOAD is using dlopen to load a shared library. The static-object constructors for this library then register some new functionality (usually, one or more actions). Since the registers are global, the behavior is intrinsically not thread safe:

  1. Any LOAD action in one thread will affect all threads.
  2. If LOAD is used on a Plumed object, that variable will be visible also in other Plumed objects.

Another limitation of the LOAD command is that, to avoid ambiguities, the registers were designed to disable actions that were loaded twice. This makes it impossible to override a default action with a LOADed one.

This pull request

I here modified the way the ActionRegister works in the following manner:

For instance, if I LOAD a cpp file containing an alternative implementation of DISTANCE, the log file will show something like this:

... etc ...
PLUMED: Loading shared library Distance10.dylib at 0x7ff91acd3700
PLUMED: Here is the new list of available actions
PLUMED:   0x7ff91acd3700:DISTANCE
PLUMED:   ABMD
... etc ...
PLUMED:   DISTANCE
... etc ...

Notice that this allows using the LOAD command multiple times in the plumed.dat file, and not necessarily at the beginning of the file:

d1: DISTANCE ATOMS=1,2 # default implementation
LOAD FILE=alt_distance.cpp
d2: DISTANCA ATOMS=1,2 # alternate implementation

What is not working yet

Target release

I would like my code to appear in release v2.10

Iximiel commented 2 months ago

plumed mklib is not thread-safe. So, LOAD will be thread safe only directly loading a .so/.dylib file. I think this should be fixed in some way, ideally allowing plumed mklib to use a temporary directory. @Iximiel I ask you since you made the last changes, do you think this is feasible?

Using a tmp directory may be an even better solution than the one that we are using right now. The only problem that I see is that we are compiling the same cpp file nthreads times and copying the various result in the same cwd, assuming we maintain the current behavior and that I understood correctly what may happen. If we are threading with omp there should be a way to make only thread 0 launch mklib and make the other threads wait for it to finish the compilation to load the new shared object?

Reading the code, I would call the struct "Item" with a less generic name

GiovanniBussi commented 2 months ago

The only problem that I see is that we are compiling the same cpp file nthreads times and copying the various result in the same cwd, assuming we maintain the current behavior and that I understood correctly what may happen.

Indeed I was thinking at something like use a temporary directory then:

Alternatively:

In the first case the temporary directory is chosen in the mklib script, and it might be difficult to make sure it gets removed. In the second case we can choose the temporary directory from the C++ code and remove it when the file is dlclosed

GiovanniBussi commented 2 months ago

@Iximiel maybe even simpler, we could use temporary files different for each thread, including the final .so file, and then move the .so file in place with 'mv -n'. In this way all threads will load the same .so file, which is also more efficient in case there are many threads

Iximiel commented 2 months ago

@Iximiel maybe even simpler, we could use temporary files different for each thread, including the final .so file, and then move the .so file in place with 'mv -n'. In this way all threads will load the same .so file, which is also more efficient in case there are many threads

That may be simpler, but

In the first case the temporary directory is chosen in the mklib script, and it might be difficult to make sure it gets removed. In the second case we can choose the temporary directory from the C++ code and remove it when the file is dlclosed

The flags should be easy to set up (and bash as a simple built in for one-letter flags), and flags may solve the choise of mv -u/-n. But I do not like the idea of deleting the compiled so.

Maybe could be easier to recommend using a precompiled .so? Or maybe make can help with avoiding the overrides?

Iximiel commented 2 months ago

The flags should be easy to set up (and bash as a simple built in for one-letter flags), and flags may solve the choise of mv -u/-n. I forgot that there are alredy multi-letter options, so adding also getops may be not a good Idea

GiovanniBussi commented 2 months ago

you still do the work nthreads-1 times more than necessary (and if you are compiling with some libraries is quite a task on the CPU)

Correct. The only way to avoid this is to add some synchonisation. I would not bother doing it now, but it can be added later.

and if you are prototyping mv -n won't override the old file with the new one, and in production mv -u would risk to override the .so while it is read by thread that has already finished it (maybe it is solvable with a semaphore/barrier in C++)

Technically I think dlopen would work anyway, because even if you overwrite (in the sense of "move another file to the same path") while dlopen is loading, the previous version of the file will be kept in memory. You can safely dlopen a library and delete the file immediately, indeed. But I am not sure it is a good idea to rely on this.

Regarding the modification time, I think this is already kind of buggy. The way it currently works is:

This means that if you change the .cpp file it will not be compiled again. I have to think about it

GiovanniBussi commented 2 months ago

The way it currently works is:

check if .so is present. If so, load it otherwise, compile and then load.

Sorry I was wrong... Now if you write LOAD FILE=file.cpp it directly compiles it.

GiovanniBussi commented 2 months ago

This is almost done. I have some doubt on the verbosity of the output. Current I made it a bit cleaner as this:

...
PLUMED: Action DISTANCE
PLUMED:   with label d1
PLUMED:   between atoms 1 2
PLUMED:   using periodic boundary conditions
PLUMED: Action LOAD
PLUMED:   with label @2
PLUMED: Executing: env PLUMED_ROOT="/Users/bussi/plumed2/" env PLUMED_VERSION="2.10.0-dev" env PLUMED_HTMLDIR="/Users/bussi/plumed2/" env PLUMED_INCLUDEDIR="/Users/bussi/plumed2//src/include" env PLUMED_PROGRAM_NAME="plumed" env PLUMED_IS_INSTALLED="no" "/Users/bussi/plumed2/"/scripts/mklib.sh ./Distance10.cpp (only on master node)
PLUMED: Loading shared library ././Distance10.dylib at 0x7ff90b9c5550
PLUMED: Here is the list of new actions
PLUMED:
PLUMED: 0x7ff90b9c5550:DISTANCE
PLUMED:
PLUMED: Action DISTANCE
PLUMED:   with label d10
PLUMED:   between atoms 1 2
PLUMED:   using periodic boundary conditions
...
  1. The full compilation command is probably too verbose. Is it useful?
  2. I am using the pointer at which the library is loaded to identify it. Is it a good idea? Would the path be better?
  3. The list of actions now only includes those that are in the loaded file. I think this is better than before
  4. The list also contains the pointer (0x7ff90b9c5550:DISTANCE). Should this be removed? Or replaced with the path (Distance10.dylib) for clarity? (internally I need to use the pointer because it's guaranteed to be unique, even if one overwrites a file when the simulation is running, but the user output might be more intellibile).
maxbonomi commented 2 months ago

On 16 Apr 2024, at 14:46, Giovanni @.***> wrote:

This is almost done. I have some doubt on the verbosity of the output. Current I made it a bit cleaner as this: ... PLUMED: Action DISTANCE PLUMED: with label d1 PLUMED: between atoms 1 2 PLUMED: using periodic boundary conditions PLUMED: Action LOAD PLUMED: with label @2 PLUMED: Executing: env PLUMED_ROOT="/Users/bussi/plumed2/" env PLUMED_VERSION="2.10.0-dev" env PLUMED_HTMLDIR="/Users/bussi/plumed2/" env PLUMED_INCLUDEDIR="/Users/bussi/plumed2//src/include" env PLUMED_PROGRAM_NAME="plumed" env PLUMED_IS_INSTALLED="no" "/Users/bussi/plumed2/"/scripts/mklib.sh ./Distance10.cpp (only on master node) PLUMED: Loading shared library ././Distance10.dylib at 0x7ff90b9c5550 PLUMED: Here is the list of new actions PLUMED: PLUMED: 0x7ff90b9c5550:DISTANCE PLUMED: PLUMED: Action DISTANCE PLUMED: with label d10 PLUMED: between atoms 1 2 PLUMED: using periodic boundary conditions ...

• The full compilation command is probably too verbose. Is it useful?

Yes, I think this is a good idea

• I am using the pointer at which the library is loaded to identify it. Is it a good idea? Would the path be better?

I think the path is a bit more intuitive

• The list of actions now only includes those that are in the loaded file. I think this is better than before

Yes!

• The list also contains the pointer (0x7ff90b9c5550:DISTANCE). Should this be removed? Or replaced with the path (Distance10.dylib) for clarity? (internally I need to use the pointer because it's guaranteed to be unique, even if one overwrites a file when the simulation is running, but the user output might be more intellibile).

I would replace with the path, or leave both, but not only the pointer (it is a bit less clear)

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

Iximiel commented 2 months ago
  1. The full compilation command is probably too verbose. Is it useful?

Verbosity can be left low by default and toned up by an environment variable, it could be useful for debugging or development purposes

  1. I am using the pointer at which the library is loaded to identify it. Is it a good idea? Would the path be better?

I agree with @maxbonomi: the path is far more intuitive for the user, but the pointer acts as a more unique "hash"

  1. The list of actions now only includes those that are in the loaded file. I think this is better than before

Yes!

  1. The list also contains the pointer (0x7ff90b9c5550:DISTANCE). Should this be removed? Or replaced with the path (Distance10.dylib) for clarity? (internally I need to use the pointer because it's guaranteed to be unique, even if one overwrites a file when the simulation is running, but the user output might be more intellibile).

If you can separate the internal view with the "hash"-pointer from the external view for the user, showing the file/path or the file/path+pointer will be clearer

GiovanniBussi commented 2 months ago

I fixed the log. For this input:

d1: DISTANCE ATOMS=1,2
LOAD FILE=./Distance10.so
d10: DISTANCE ATOMS=1,2

It now looks like this:

...
PLUMED: Action DISTANCE
PLUMED:   with label d1
PLUMED:   between atoms 1 2
PLUMED:   using periodic boundary conditions
PLUMED: Action LOAD
PLUMED:   with label @2
PLUMED: Loading shared library ./Distance10.dylib at 0x7ff909602540
PLUMED: Here is the list of new actions
PLUMED:
PLUMED: DISTANCE
PLUMED:
PLUMED: Action DISTANCE
PLUMED:   From library: ./Distance10.dylib
PLUMED:   with label d10
PLUMED:   between atoms 1 2
PLUMED:   using periodic boundary conditions
...

Notice that for variable d10 it is explicitly noted that it comes from a different library. I think this is useful.

maxbonomi commented 2 months ago

Very clear indeed!MaxSent from my iPhoneOn 17 Apr 2024, at 19:51, Giovanni @.***> wrote: I fixed the log. For this input: d1: DISTANCE ATOMS=1,2 LOAD FILE=./Distance10.so d10: DISTANCE ATOMS=1,2

It now looks like this: ... PLUMED: Action DISTANCE PLUMED: with label d1 PLUMED: between atoms 1 2 PLUMED: using periodic boundary conditions PLUMED: Action LOAD PLUMED: with label @2 PLUMED: Loading shared library ./Distance10.dylib at 0x7ff909602540 PLUMED: Here is the list of new actions PLUMED: PLUMED: DISTANCE PLUMED: PLUMED: Action DISTANCE PLUMED: From library: ./Distance10.dylib PLUMED: with label d10 PLUMED: between atoms 1 2 PLUMED: using periodic boundary conditions ...

Notice that for variable d10 it is explicitly noted that it comes from a different library. I think this is useful.

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: @.***>