stan-dev / math

The Stan Math Library is a C++ template library for automatic differentiation of any order using forward, reverse, and mixed modes. It includes a range of built-in functions for probabilistic modeling, linear algebra, and equation solving.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License
749 stars 187 forks source link

Propogating Eigen expression templates with local temporaries #1775

Closed andrjohns closed 4 years ago

andrjohns commented 4 years ago

Description

As first discussed in this comment, there is an issue with matrix functions returning expressions when those expressions use local temporary structures.

A simple example is the following:

#include <Eigen/Core>
#include <iostream>

template<typename T>
decltype(auto) test_fun(const T& x) {
    const Eigen::Ref<const Eigen::VectorXd>& x_ref = x;
    return (x_ref.array() - 5.0).matrix();
}

int main()
{
    Eigen::MatrixXd test_mat(2,2);
    test_mat << 5, 5, 5, 5;

    Eigen::VectorXd test_vec(2);
    test_vec << 5, 5;

    Eigen::VectorXd out = test_fun(test_mat.diagonal()).eval();
    Eigen::VectorXd out2 = test_fun(test_vec).eval();

    std::cout << "Expression:" << std::endl <<
                 out << std::endl << "Vector" <<
                 std::endl << out2 << std::endl;
}

Which returns:

Expression:
-5
 0
Vector
 0
 0

Because the temporary x_ref.array() falls out of scope.

A simple fix that seems to work is to explicitly set the inner stride:

template<typename T>
decltype(auto) test_fun(const T& x) {
    const Eigen::Ref<const Eigen::VectorXd,0,Eigen::InnerStride<> >& x_ref = x;
    return (x_ref.array() - 5.0).matrix();
}

But Eigen's documentation notes that will disable vectorisation.

Current Version:

v3.1.0

bob-carpenter commented 4 years ago

The expression for the negation will also go out of scope.

The proposed fix with strides still has expressions that go out of scope.

andrjohns commented 4 years ago

That's what I thought, but it works: https://godbolt.org/z/oqraZ3

t4c1 commented 4 years ago

Running the "simple fix" on my computer crashes. I also don't see why it should work, so I guess this example just happens to work with particular compiler you used on godbolt.

t4c1 commented 4 years ago

Some more testing suggests we only get wrong result if the expression needs to be evaluated into Ref (maybe this also just randomly works): https://godbolt.org/z/c2nzPn Evaluating into local variable instead of using Ref does not help: https://godbolt.org/z/4q0d0N Evaluating into parameter of the function does help (these objects live until the line in which the function is called finishes executing): https://godbolt.org/z/6wR3RP

Some digging trough Eigen's source confirms what these tests showed: When constructing Eigen expression object, lightweight objects, such as other expressions, are stored by value and heavy objects, such as Matrix or Ref, are stored by reference. That explains why expressions going out of scope do not cause any problems.

t4c1 commented 4 years ago

I found a way to solve this problem. Those "local temporaries" must have their lifetime bound to the returned object. So we need a custom eigen expression that holds some variables and another eigen expression and behaves like that expression.

If we moved these variables when returning the returned expression would be left with dangling references. Instead these temporaries can be allocated on heap so moving is not necessary.

Custom expression code:

template <class ArgType, typename... Ptrs>
class Holder;

namespace Eigen {
namespace internal {
template <class ArgType, typename... Ptrs>
struct traits<Holder<ArgType, Ptrs...>> {
  typedef typename ArgType::StorageKind StorageKind;
  typedef typename traits<ArgType>::XprKind XprKind;
  typedef typename ArgType::StorageIndex StorageIndex;
  typedef typename ArgType::Scalar Scalar;
  enum {
    Flags = ArgType::Flags & RowMajorBit,
    RowsAtCompileTime = ArgType::RowsAtCompileTime,
    ColsAtCompileTime = ArgType::ColsAtCompileTime,
    MaxRowsAtCompileTime = ArgType::MaxRowsAtCompileTime,
    MaxColsAtCompileTime = ArgType::MaxColsAtCompileTime
  };
};
}  // namespace internal
}  // namespace Eigen

template <typename ArgType, typename... Ptrs>
class Holder
    : public Eigen::internal::dense_xpr_base<Holder<ArgType, Ptrs...>>::type {
 public:
  Holder(const ArgType& arg, Ptrs*... pointers)
      : m_arg(arg), m_unique_ptrs(std::unique_ptr<Ptrs>(pointers)...) {}
  typedef typename Eigen::internal::ref_selector<Holder<ArgType, Ptrs...>>::type
      Nested;
  typedef Eigen::Index Index;
  Index rows() const { return m_arg.rows(); }
  Index cols() const { return m_arg.cols(); }
  typedef typename Eigen::internal::ref_selector<ArgType>::type ArgTypeNested;
  ArgTypeNested m_arg;
  std::tuple<std::unique_ptr<Ptrs>...> m_unique_ptrs;
};

namespace Eigen {
namespace internal {
template <typename ArgType, typename... Ptrs>
struct evaluator<Holder<ArgType, Ptrs...>> : evaluator_base<Holder<ArgType>> {
  typedef Holder<ArgType, Ptrs...> XprType;
  typedef typename nested_eval<ArgType, 1>::type ArgTypeNested;
  typedef typename remove_all<ArgTypeNested>::type ArgTypeNestedCleaned;
  typedef typename XprType::CoeffReturnType CoeffReturnType;
  enum {
    CoeffReadCost = evaluator<ArgTypeNestedCleaned>::CoeffReadCost,
    Flags = evaluator<ArgTypeNestedCleaned>::Flags
            & (HereditaryBits | LinearAccessBit | PacketAccessBit),
    Alignment = Unaligned & evaluator<ArgTypeNestedCleaned>::Alignment,
  };

  evaluator<ArgTypeNestedCleaned> m_argImpl;

  evaluator(const XprType& xpr) : m_argImpl(xpr.m_arg) {}

  EIGEN_STRONG_INLINE CoeffReturnType coeff(Index row, Index col) const {
    CoeffReturnType val = m_argImpl.coeff(row, col);
    return val;
  }

  EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const {
    CoeffReturnType val = m_argImpl.coeff(index);
    return val;
  }

  template <int LoadMode, typename PacketType>
  EIGEN_STRONG_INLINE PacketType packet(Index row, Index col) const {
    return m_argImpl.template packet<LoadMode, PacketType>(row, col);
  }

  template <int LoadMode, typename PacketType>
  EIGEN_STRONG_INLINE PacketType packet(Index index) const {
    return m_argImpl.template packet<LoadMode, PacketType>(index);
  }
};
}  // namespace internal
}  // namespace Eigen

template <typename T, typename... Ptrs>
Holder<T, Ptrs...> makeHolder(const T& arg, Ptrs*... pointers) {
  return Holder<T, Ptrs...>(arg, pointers...);
}

New function implementation:

template <typename T>
decltype(auto) test_fun(const T& x) {
  const auto* x_ref = new Eigen::Ref<const Eigen::VectorXd>(x);
  return makeHolder((x_ref->array() - 5.0).matrix(), x_ref);
}

Live example: https://godbolt.org/z/zamRvL

I also asked this on stackoverflow. While I did not get any good answers (one answer is mine), I got some ideas in comments. Namely that we could implement custom functors that represent an operation in Eigen to completely avoid temporaries in functions that work element-wise.

andrjohns commented 4 years ago

This is really nice! I don't completely understand it, but I like it. How does Eigen know when this custom expression code needs to be used? Will this affect all Eigen expressions moving forward?

t4c1 commented 4 years ago

Because we explicitely return it and I really should have put that into the code I posted ... Now it is fixed.

andrjohns commented 4 years ago

Oddly enough, it looks like the new Eigen::Ref<> appears to do the trick without needing the added Holder class: https://godbolt.org/z/Mif36M

t4c1 commented 4 years ago

Yes, obviously, but that leaks memory. Holder only takes care of releasing it when it goes out of scope. All other code in Holder is there just to make it an Eigen expression.

andrjohns commented 4 years ago

Oh right, make sense