Closed GiovanniBussi closed 2 months ago
plumed mklib
is not thread-safe. So,LOAD
will be thread safe only directly loading a .so/.dylib file. I think this should be fixed in some way, ideally allowingplumed mklib
to use a temporary directory. @Iximiel I ask you since you made the last changes, do you think this is feasible?
Using a tmp directory may be an even better solution than the one that we are using right now.
The only problem that I see is that we are compiling the same cpp file nthreads times and copying the various result in the same cwd, assuming we maintain the current behavior and that I understood correctly what may happen.
If we are threading with omp there should be a way to make only thread 0 launch mklib
and make the other threads wait for it to finish the compilation to load the new shared object?
Reading the code, I would call the struct "Item" with a less generic name
The only problem that I see is that we are compiling the same cpp file nthreads times and copying the various result in the same cwd, assuming we maintain the current behavior and that I understood correctly what may happen.
Indeed I was thinking at something like use a temporary directory then:
plumed mklib --only-tmp
)outfile: filename
. so that we can parse the name within PLUMED and dlopen itAlternatively:
plumed mklib --destdir /path/
)In the first case the temporary directory is chosen in the mklib script, and it might be difficult to make sure it gets removed. In the second case we can choose the temporary directory from the C++ code and remove it when the file is dlclosed
@Iximiel maybe even simpler, we could use temporary files different for each thread, including the final .so file, and then move the .so file in place with 'mv -n'. In this way all threads will load the same .so file, which is also more efficient in case there are many threads
@Iximiel maybe even simpler, we could use temporary files different for each thread, including the final .so file, and then move the .so file in place with 'mv -n'. In this way all threads will load the same .so file, which is also more efficient in case there are many threads
That may be simpler, but
nthreads-1
times more than necessary (and if you are compiling with some libraries is quite a task on the CPU)mv -n
won't override the old file with the new one, and in production mv -u
would risk to override the .so while it is read by thread that has already finished it (maybe it is solvable with a semaphore/barrier in C++)In the first case the temporary directory is chosen in the mklib script, and it might be difficult to make sure it gets removed. In the second case we can choose the temporary directory from the C++ code and remove it when the file is dlclosed
The flags should be easy to set up (and bash as a simple built in for one-letter flags), and flags may solve the choise of mv -u/-n
. But I do not like the idea of deleting the compiled so.
Maybe could be easier to recommend using a precompiled .so?
Or maybe make
can help with avoiding the overrides?
The flags should be easy to set up (and bash as a simple built in for one-letter flags), and flags may solve the choise of
mv -u/-n
. I forgot that there are alredy multi-letter options, so adding alsogetops
may be not a good Idea
you still do the work nthreads-1 times more than necessary (and if you are compiling with some libraries is quite a task on the CPU)
Correct. The only way to avoid this is to add some synchonisation. I would not bother doing it now, but it can be added later.
and if you are prototyping mv -n won't override the old file with the new one, and in production mv -u would risk to override the .so while it is read by thread that has already finished it (maybe it is solvable with a semaphore/barrier in C++)
Technically I think dlopen would work anyway, because even if you overwrite (in the sense of "move another file to the same path") while dlopen is loading, the previous version of the file will be kept in memory. You can safely dlopen a library and delete the file immediately, indeed. But I am not sure it is a good idea to rely on this.
Regarding the modification time, I think this is already kind of buggy. The way it currently works is:
This means that if you change the .cpp file it will not be compiled again. I have to think about it
The way it currently works is:
check if .so is present. If so, load it otherwise, compile and then load.
Sorry I was wrong... Now if you write LOAD FILE=file.cpp
it directly compiles it.
This is almost done. I have some doubt on the verbosity of the output. Current I made it a bit cleaner as this:
...
PLUMED: Action DISTANCE
PLUMED: with label d1
PLUMED: between atoms 1 2
PLUMED: using periodic boundary conditions
PLUMED: Action LOAD
PLUMED: with label @2
PLUMED: Executing: env PLUMED_ROOT="/Users/bussi/plumed2/" env PLUMED_VERSION="2.10.0-dev" env PLUMED_HTMLDIR="/Users/bussi/plumed2/" env PLUMED_INCLUDEDIR="/Users/bussi/plumed2//src/include" env PLUMED_PROGRAM_NAME="plumed" env PLUMED_IS_INSTALLED="no" "/Users/bussi/plumed2/"/scripts/mklib.sh ./Distance10.cpp (only on master node)
PLUMED: Loading shared library ././Distance10.dylib at 0x7ff90b9c5550
PLUMED: Here is the list of new actions
PLUMED:
PLUMED: 0x7ff90b9c5550:DISTANCE
PLUMED:
PLUMED: Action DISTANCE
PLUMED: with label d10
PLUMED: between atoms 1 2
PLUMED: using periodic boundary conditions
...
0x7ff90b9c5550:DISTANCE
). Should this be removed? Or replaced with the path (Distance10.dylib)
for clarity? (internally I need to use the pointer because it's guaranteed to be unique, even if one overwrites a file when the simulation is running, but the user output might be more intellibile).On 16 Apr 2024, at 14:46, Giovanni @.***> wrote:
This is almost done. I have some doubt on the verbosity of the output. Current I made it a bit cleaner as this: ... PLUMED: Action DISTANCE PLUMED: with label d1 PLUMED: between atoms 1 2 PLUMED: using periodic boundary conditions PLUMED: Action LOAD PLUMED: with label @2 PLUMED: Executing: env PLUMED_ROOT="/Users/bussi/plumed2/" env PLUMED_VERSION="2.10.0-dev" env PLUMED_HTMLDIR="/Users/bussi/plumed2/" env PLUMED_INCLUDEDIR="/Users/bussi/plumed2//src/include" env PLUMED_PROGRAM_NAME="plumed" env PLUMED_IS_INSTALLED="no" "/Users/bussi/plumed2/"/scripts/mklib.sh ./Distance10.cpp (only on master node) PLUMED: Loading shared library ././Distance10.dylib at 0x7ff90b9c5550 PLUMED: Here is the list of new actions PLUMED: PLUMED: 0x7ff90b9c5550:DISTANCE PLUMED: PLUMED: Action DISTANCE PLUMED: with label d10 PLUMED: between atoms 1 2 PLUMED: using periodic boundary conditions ...
• The full compilation command is probably too verbose. Is it useful?
Yes, I think this is a good idea
• I am using the pointer at which the library is loaded to identify it. Is it a good idea? Would the path be better?
I think the path is a bit more intuitive
• The list of actions now only includes those that are in the loaded file. I think this is better than before
Yes!
• The list also contains the pointer (0x7ff90b9c5550:DISTANCE). Should this be removed? Or replaced with the path (Distance10.dylib) for clarity? (internally I need to use the pointer because it's guaranteed to be unique, even if one overwrites a file when the simulation is running, but the user output might be more intellibile).
I would replace with the path, or leave both, but not only the pointer (it is a bit less clear)
— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>
- The full compilation command is probably too verbose. Is it useful?
Verbosity can be left low by default and toned up by an environment variable, it could be useful for debugging or development purposes
- I am using the pointer at which the library is loaded to identify it. Is it a good idea? Would the path be better?
I agree with @maxbonomi: the path is far more intuitive for the user, but the pointer acts as a more unique "hash"
- The list of actions now only includes those that are in the loaded file. I think this is better than before
Yes!
- The list also contains the pointer (0x7ff90b9c5550:DISTANCE). Should this be removed? Or replaced with the path (Distance10.dylib) for clarity? (internally I need to use the pointer because it's guaranteed to be unique, even if one overwrites a file when the simulation is running, but the user output might be more intellibile).
If you can separate the internal view with the "hash"-pointer from the external view for the user, showing the file/path or the file/path+pointer will be clearer
I fixed the log. For this input:
d1: DISTANCE ATOMS=1,2
LOAD FILE=./Distance10.so
d10: DISTANCE ATOMS=1,2
It now looks like this:
...
PLUMED: Action DISTANCE
PLUMED: with label d1
PLUMED: between atoms 1 2
PLUMED: using periodic boundary conditions
PLUMED: Action LOAD
PLUMED: with label @2
PLUMED: Loading shared library ./Distance10.dylib at 0x7ff909602540
PLUMED: Here is the list of new actions
PLUMED:
PLUMED: DISTANCE
PLUMED:
PLUMED: Action DISTANCE
PLUMED: From library: ./Distance10.dylib
PLUMED: with label d10
PLUMED: between atoms 1 2
PLUMED: using periodic boundary conditions
...
Notice that for variable d10
it is explicitly noted that it comes from a different library. I think this is useful.
Very clear indeed!MaxSent from my iPhoneOn 17 Apr 2024, at 19:51, Giovanni @.***> wrote: I fixed the log. For this input: d1: DISTANCE ATOMS=1,2 LOAD FILE=./Distance10.so d10: DISTANCE ATOMS=1,2
It now looks like this: ... PLUMED: Action DISTANCE PLUMED: with label d1 PLUMED: between atoms 1 2 PLUMED: using periodic boundary conditions PLUMED: Action LOAD PLUMED: with label @2 PLUMED: Loading shared library ./Distance10.dylib at 0x7ff909602540 PLUMED: Here is the list of new actions PLUMED: PLUMED: DISTANCE PLUMED: PLUMED: Action DISTANCE PLUMED: From library: ./Distance10.dylib PLUMED: with label d10 PLUMED: between atoms 1 2 PLUMED: using periodic boundary conditions ...
Notice that for variable d10 it is explicitly noted that it comes from a different library. I think this is useful.
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: @.***>
This is to improve the way the LOAD action behaves in contexts where multiple plumed objects exist. This might be because the user is creating them in Python, or because an MD code uses multithreading to manage multiple replicas, etc. Finally, this addition makes it possible to override internally defined CVs with LOADed ones, and to combine multiple versions of the same CV in the same input file.
This should also address the issue we have in plumed-nest with eggs containing a temporary version (LOADed) that is then merged in master branch (@maxbonomi @gtribello).
Background
LOAD
is using dlopen to load a shared library. The static-object constructors for this library then register some new functionality (usually, one or more actions). Since the registers are global, the behavior is intrinsically not thread safe:LOAD
action in one thread will affect all threads.LOAD
is used on a Plumed object, that variable will be visible also in other Plumed objects.Another limitation of the
LOAD
command is that, to avoid ambiguities, the registers were designed to disable actions that were loaded twice. This makes it impossible to override a default action with a LOADed one.This pull request
I here modified the way the ActionRegister works in the following manner:
LOAD
a library, the ActionRegister is put in a special state.LOAD
procedure is completed (and dlopen returns a handle), all actions that are staged will be added to the register with a prefix. The prefix is a text version of the handle returned by dlopen.For instance, if I
LOAD
a cpp file containing an alternative implementation ofDISTANCE
, the log file will show something like this:LOAD
instruction will maintain a list of handles returned by dlopen. Notice that if the same shared library is loaded twice, it will return the same handle.Notice that this allows using the
LOAD
command multiple times in the plumed.dat file, and not necessarily at the beginning of the file:What is not working yet
plumed mklib
is not thread-safe. So,LOAD
will be thread safe only directly loading a .so/.dylib file. I think this should be fixed in some way, ideally allowingplumed mklib
to use a temporary directory. @iximiel I ask you since you made the last changes, do you think this is feasible?~ FIXED[ ] If a loaded library indirectly loads other libraries with self-registering actions, all the actions will be interpreted as being implemented in the first library. In other words, if:
lib1.so
is linked withlib2.so
LOAD FILE=lib1.so
, then actions in both lib1 and lib2 will appear.LOAD FILE=lib2.so
, then no new action will appera.This might become a problem if we try to use
LOAD
to load modules with dependencies, but I guess this is not planned now. For this use case, I would rather switch to loading modules with env vars or by loading all the modules installed in a dedicated directory.Target release
I would like my code to appear in release v2.10