kokkos / kokkos

Kokkos C++ Performance Portability Programming Ecosystem: The Programming Model - Parallel Execution and Memory Abstraction
https://kokkos.org
Other
1.86k stars 420 forks source link

Support reduction functor name tag in the required nested type definition such as value_type #3897

Open Char-Aznable opened 3 years ago

Char-Aznable commented 3 years ago

Per https://github.com/kokkos/kokkos/wiki/ParallelDispatch#733-reductions-with-an-array-of-results and https://github.com/kokkos/kokkos/wiki/ParallelDispatch#75-function-name-tags, it's currently not possible to tag value_type, size_type, has_final and value_count for a reduction functor. For example, one can't have a functor class to work with different value_type definitions even though the reduction operator() share most of the functor's member variables. This leads to a ton of boilerplate to make additional functors. Related to https://github.com/kokkos/kokkos/issues/3651

masterleinad commented 3 years ago

Can you give an example of what you want to achieve?

Char-Aznable commented 3 years ago

For the example on the wiki https://github.com/kokkos/kokkos/wiki/ParallelDispatch#733-reductions-with-an-array-of-results:

    struct ColumnSums {
      // In this case, the reduction result is an array of float.
      // Note the C++ notation for an array typedef.
      typedef float value_type[];

     typedef View<float**>::size_type size_type;

      // Tell Kokkos the result array's number of entries.
      // This must be a public value in the functor.
      size_type value_count;

      View<float**> X_;

      // As with the above examples, you may supply an
      // execution_space typedef. If not supplied, Kokkos
      // will use the default execution space for this functor.

      // Be sure to set value_count in the constructor.
      ColumnSums (const View<float**>& X) :
        value_count (X.extent(1)), // # columns in X
        X_ (X)
      {}

      // value_type here is already a "reference" type,
      // so we don't pass it in by reference here.
      KOKKOS_INLINE_FUNCTION void
      operator() (const size_type i, value_type sum) const {
        // You may find it helpful to put pragmas above this loop
        // to convince the compiler to vectorize it. This is 
        // probably only helpful if the View type has LayoutRight.
        for (size_type j = 0; j < value_count; ++j) {
          sum[j] += X_(i, j);
        }
      }

      // value_type here is already a "reference" type,
      // so we don't pass it in by reference here.
      KOKKOS_INLINE_FUNCTION void
      join (volatile value_type dst,
            const volatile value_type src) const {
        for (size_type j = 0; j < value_count; ++j) {
          dst[j] += src[j];
        }
      }

      KOKKOS_INLINE_FUNCTION void init (value_type sum) const {
        for (size_type j = 0; j < value_count; ++j) {
          sum[j] = 0.0;
        }
      }
    };

Assume ColumnSums is a functor that has many member variables and they are involved in some or all of these cases: 1) value_type is int; 2) value_type is float; 3) value_count is 4; 4) value_count is 5 or the combination of 1), 2) and 3), 4). Does this make sense? @masterleinad

Char-Aznable commented 3 years ago

Actually, a better example would be a functor class that manage a bunch of variables and call different parallel_reduce inside different member functions:

struct MultiFunctor {
  template < Tag > using value_type = typename Tag::value_type;
  template < Tag > using size_type = typename Tag::size_type;
  template < Tag > using has_final  =  typename Tag::has_final;
  template < Tag > static size_type<Tag> value_count = Tag::value_count;

  struct Tag1 { 

    using value_type = int;
    using size_type = int;
    using has_final = std::true_type;
    static constexpr size_type value_count = 3;  };

  void operator()(const Tag1&, const int i, value_type<Tag1>& update) const { ... }
  //similarly definition for join, init and final for Tag1

  void reduceFunction1() {
     parallel_reduce(RangePolicy<Tag1>(0, n), *this);
  }

    struct Tag2 { 

    using value_type = double;
    using size_type = int;
    using has_final = std::false_type;
    static constexpr size_type value_count = 5;  };

  //similarly definition for operator(), join, init for Tag2 (no final here)
  void reduceFunction2() {
     parallel_reduce(RangePolicy<Tag2>(0, n), *this);
  }

  //a bunch of member variables shared by Tag1 and Tag2 functors
};

Currently this is not possible so one has the define the Tag1 and Tag2 functor separately even though they share most of the member varibles, which leads to a lot of boilerplate @masterleinad

crtrott commented 3 years ago

We will probably not support tags for arrays (i.e. value_count etc. )but we should look at issues with the has_final thingy

masterleinad commented 3 years ago

@crtrott wanted me to check that the correct final overload is called in case there are multiple and that final is not called if the tag doesn't match if there is only one. I checked that both cases work correctly. Also has_final has no effect AFAICT.

masterleinad commented 3 years ago

Also, your example should work as

struct MultiFunctorCommon {...};

template <TagType>
struct MultiFunctor;

template <Tag1>
struct MultiFunctor: public MultiFunctorCommon {
  void operator()(const Tag1&, const int i, typename Tag1::value_type& update) const { ... }
};

template <Tag2>
struct MultiFunctor: public MultiFunctorCommon {
  void operator()(const Tag2&, const int i, typename Tag2::value_type& update) const { ... }
};
Char-Aznable commented 3 years ago

@crtrott wanted me to check that the correct final overload is called in case there are multiple and that final is not called if the tag doesn't match if there is only one. I checked that both cases work correctly. Also has_final has no effect AFAICT.

@masterleinad Just want to make sure I understand this, are you saying that has_final is ignored when final() is defined?

Multiple inheritance is indeed a nice workaround but in practice I have to write a most derived class that virtually inherits from all the MultiFunctor parents, which has its own boilerplate to deal with (imagine that my original MultiFunctor class is already part of some inheritance hierarchy). It's also kinda going against the motivation of using functor tags in the first place