plumed / plumed2

Development version of plumed 2
https://www.plumed.org
GNU Lesser General Public License v3.0
357 stars 283 forks source link

C++ wrapper array shapes #1050

Closed GiovanniBussi closed 4 months ago

GiovanniBussi commented 6 months ago

This is to improve checking of array shapes in the C++ cmd interface. I open a pull request in case there is some comment and also as a point to document this new feature.

Background

Currently, the cmd interface in most languages (modern C, C++, FORTRAN, and Python) sends to PLUMED information about the type of pointer that has been passed, so as to trigger an error if you do

Plumed p;
std::size_t a=1000; // should have been int or unsigned
p.cmd("setNatoms",&a);

In addition, FORTRAN and Python interfaces send information about array size and shapes. This was natural to do because, in these languages, multidimensional arrays are either native (FORTRAN) or ubiquitously adopted (numpy arrays for Python). In the C++ interface, I originally made it possible to explicitly pass shape information like this

std::vector<std::array<double,3>> positions(natoms);
p.cmd("setPositions",&positions[0][0],{natoms,3});

This is very inconvenient, since it is very unlikely that users will add the shape information manually to the function call.

The aim of this pull request is to let the compiler guess the proper array shapes when possible, without the need for the user to specify it.

Usage

A full example with the recommended syntax is in this regtest.

In short:

std::vector<std::array<double,3>> positions(natoms);

// Before this PR:
p.cmd("setPositions",&positions[0][0]);
// Still possible but discouraged because there's no shape check
// Now, it can be disabled with a strict check mode (see notes below), which will
// tell PLUMED that, if not provided, shape is assumed to be {1}

// After this PR:
p.cmd("setPositions",positions);
// plumed will be informed that this is an array of double with shape (natoms,3)
// and do the relevant checks

Notice that in this example we can detect the shape because the user is using a std::vector or std::array's. However, many codes have their own dedicated types for 3D vectors (as we use PLMD::Vector). I found a way to enable custom types for fixed-array sizes with the following syntax:

// This is a class defined somewhere in the MD code, which should contain exactly three doubles
class vec3 { /* etc */};

#include "plumed/wrapper/Plumed.h"

// The user should write this template specialization, which makes sure the
// plumed wrappers consider this the same as a double[3]
template<>
struct PLMD::wrapper::is_custom_array<vec3> : std::true_type {
  // only specify the type of the content. the compiler will guess the size
  using value_type = double;
};

std::vector<vec3> positions(natoms);

// This will work correctly now:
p.cmd("setPositions",positions);
// plumed will be informed that this is an array of double with shape (natoms,3)

If positions are stored as completely flat arrays of doubles, it is necessary to pass shape information by hand:

std::vector<double> positions(natoms*3);

// this is what was done before, and will still work, unless one enforces strict checking
// p.cmd("setPositions",&positions[0]);

// this will result in an error, because shape is detected as (natoms*3), and not (natoms,3):
// p.cmd("setPositions",positions);

// this will work:
p.cmd("setPositions",positions.data(),{natoms,3});

Shape (or size) will also be detected for string literals and for strings:

// this will pass an array with 8 elements (includes null terminator!)
p.cmd("setMDEnergyUnits","GROMACS");

// this will also pass an array with 8 elements (includes null terminator!)
std::string name="GROMACS";
p.cmd("setMDEnergyUnits",name);

// this instead will pass a pointer without size information:
const char* name="GROMACS";
p.cmd("setMDEnergyUnits",name);

I would like to add a test within PLUMED when dealing with null terminated (C) strings, so that:

This will provide a check that the users are properly passing null terminated strings.

Implementation

I managed to implement this using C++11 (no need for C++17 extension), which means that the calling code does not necessarily need to be compiled with -std=c++17. The logic is the following:

Strings are dealt as a special case, since the passed size is equal to str.size()+1, so as to include the terminating null character.

The tricky part was to allow passing C arrays properly detecting their shapes. For instance: double box[3][3] or vec3 box[3]; should be (3,3) and should be valid arguments in p.cmd("setBox",box). The difficulty stems from the fact that when passing an argument as a universal reference (cmd(T&&val)) information about the argument being a C array is lost. I managed to have it working using overloads with both universal and normal references (cmd(T&val)).

I should probably clean up the code which is still quite messy. There is a regtest testing a lot of combination of possible arguments and it seems all cases are dealed with correctly.

Strict mode

It is also possible to use -D__PLUMED_WRAPPER_CXX_DETECT_SHAPES_STRICT=1 to tell PLUMED that any pointer without shape information should be treated as shape (1), which means one should only use it in cases such as cmd("getBias",&bias); In this way, passing a non-checked array will result in a runtime error.

Other stuff

In this PR I also disabled the possibility to switch off type checks at compilation time. I think we never needed in practice. It is still possible to switch them off at runtime with export PLUMED_TYPESAFE_IGNORE=yes. This was useful in debugging problems.

@Iximiel I am quite new to this way of using C++, so maybe you can double check if the code makes sense. I tried to keep it C++11, so that people including the wrapper/Plumed.h do not need to necessarily switch on C++17, but it fails with some old compilers.

Target release

I would like my code to appear in release v2.10

codecov-commenter commented 6 months ago

Codecov Report

Attention: Patch coverage is 90.42553% with 9 lines in your changes are missing coverage. Please review.

Project coverage is 83.28%. Comparing base (f74c3f9) to head (66fa9dd). Report is 6 commits behind head on master.

:exclamation: Current head 66fa9dd differs from pull request most recent head c7ebe02. Consider uploading reports for the commit c7ebe02 to get more accurate results

Files Patch % Lines
src/cltools/Benchmark.cpp 0.00% 6 Missing :warning:
src/colvar/GyrationShortcut.cpp 50.00% 1 Missing :warning:
src/core/ExchangePatterns.cpp 0.00% 1 Missing :warning:
src/core/PlumedMain.cpp 85.71% 1 Missing :warning:

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #1050 +/- ## ========================================== - Coverage 83.32% 83.28% -0.05% ========================================== Files 628 628 Lines 59990 60064 +74 ========================================== + Hits 49987 50023 +36 - Misses 10003 10041 +38 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Iximiel commented 6 months ago

I'm adding a few comment, with the review feature, the code in the link where you mentioned me seems fine

The tricky part was to allow passing C arrays properly detecting their shapes. For instance: double box[3][3] or vec3 box[3]; should be (3,3) and should be valid arguments in p.cmd("setBox",box). The difficulty stems from the fact that when passing an argument as a universal reference (cmd(T&&val)) information about the argument being a C array is lost. I managed to have it working using overloads with both universal and normal references (cmd(T&val)).

I have seen in the CCCL code that they use something like:

template <typename T, unsigned N>
return_type functionName(T input[N],...){
loop_unroller<N>(input,...);
}

to determine the size of how much things to do to that small array (double x[] is a sugar for double *x), you call this with

double X[5]
auto t = functionName (X,...);

and it works,and I see that You used this in is_array, so if the c-style array dimensions are defined at compile time I think it may be feasible to do something like that.

I have to check if something like this will work:

  template<typename T, int N, int M>
  void cmd(const char*key,T val[N][M]) {
    cmd_helper_with_shape(key,val,{N,M,0});
  }

or a variant in wich T val[N] is passed and T is a custom array type

GiovanniBussi commented 6 months ago

Thanks!!

I have seen in the CCCL code that they use something like:

template <typename T, unsigned N>
return_type functionName(T input[N],...){
loop_unroller<N>(input,...);
}

I didn't use this approach because in some early attempts I was detecting ambiguous overloads. I thought that if I use everywhere the pattern:

I am sure there would be no ambiguity. In addition, I first wrote it in C++17 (with if constexpr) and then converted it to C++11, so maybe that's the reason for this choice.

to determine the size of how much things to do to that small array (double x[] is a sugar for double *x), you call this with

double X[5]
auto t = functionName (X,...);

and it works,and I see that You used this in is_array, so if the c-style array dimensions are defined at compile time I think it may be feasible to do something like that.

I tried to squeeze the implementation by treating similar cases as much as possible with the same code, and in the end I managed to use the custom_array trick also on normal C arrays. For the custom_array, I think that asking the user to only specify the type and not the size is less error prone, so in the end I am computing everywhere the number of elements of fixed size arrays (= C arrays, std::array, and custom arrays) with sizeof(T)/sizeof(T::value_type). Notice that the user could also use a struct {double x; double y; double z;} (I've seen it in some code I think).

I have to check if something like this will work:

  template<typename T, int N, int M>
  void cmd(const char*key,T val[N][M]) {
    cmd_helper_with_shape(key,val,{N,M,0});
  }

or a variant in wich T val[N] is passed and T is a custom array type

I think this is not necessary, because the call is recursive now. Indeed, you can pass a double a[3][3] and, even if there is no special case for 2D arrays, the code works as expected.

GiovanniBussi commented 6 months ago

I think it may be interesting to see if

vector<size_t> myshape ={3,3};
plumed->cmd("setBox",box,myshape.data());

that should call the cmd of line 3155, give some probems

I tried now, it throws an exception Incorrect number of axis (passed greater than requested). However, I think this is not a well defined result. shape should be a zero-terminated vector, but what's in myshape[2] is random rubbish.

I was adding this possibility just because it is the only simple way to pass a shape in C or in C++ < 11, but I actually assume the proper syntax is with the initializer_list

plumed->cmd("setBox",box,{3,3})
GiovanniBussi commented 4 months ago

I merge this now to avoid conflicts with other parts of the code.

I would like to also implement the shape detection in the PLMD::WithCmd class, so that it can be used in all the internal tools. It should be easier (we can use C++17 within plumed) but I don't have time to do it now so I will open an issue not to forget.