Closed Char-Aznable closed 3 years ago

I have a few projects that use Kokkos' CUDA backend, and I'm considering using an optimization library to solve nonlinear equations and to do optimization for some machine learning tasks. It seems that ROL or NOX has the functionality I want, as does TriKota/Dakota. Can someone give an overview of what distinguishes these packages from one another in terms of ease of use, computational performance, and parallelization capability? Also, are they Kokkos-compatible?

BTW, are there any videos or talk slides that summarize or highlight the different packages? I'm totally new to this amazing package collection and I hope to find more good stuff in it.

@trilinos/rol @trilinos/nox @trilinos/loca @trilinos/trikota
@trilinos/nox @trilinos/rol @rppawlo @dridzal
Dakota does black-box optimization. You can even use it without changing your code, as long as you can script input and have output in the right format. Dakota uses MPI and is looking into threads but doesn't really thread yet.
@trilinos/rol ROL is best suited for "embedded," i.e. data-structure aware optimization. It lets you work directly with the application memory space. ROL has a generic linear algebra (vector) interface, and so in theory it will run on any platform with any type of parallelism (inter-node + intra-node, for example). Some of ROL's algorithms have already been used with Kokkos; I believe that @etphipp has tried this most recently. There are a few technical issues that prevent some of ROL's algorithms from working with Kokkos. We are working on fixing this, and hope to have all algorithms Kokkos-compatible in the next three to six months. As a side note, ROL has also been used to solve nonlinear equations, i.e., not just optimization problems.
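To give a flavor of the "embedded" approach, here is a minimal, untested sketch of a ROL::Vector adapter over a plain std::vector. The method names (plus, scale, dot, norm, clone) follow ROL's vector interface, but exact signatures and the smart-pointer type vary between ROL versions, so treat this as illustration only.

#include <cmath>
#include <memory>
#include <utility>
#include <vector>
#include "ROL_Vector.hpp"

template<class Real>
class MyVector : public ROL::Vector<Real> {
public:
  explicit MyVector (std::vector<Real> data) : data_ (std::move (data)) {}

  void plus (const ROL::Vector<Real>& x) override {
    const auto& xd = static_cast<const MyVector&> (x).data_;
    for (std::size_t i = 0; i < data_.size (); ++i) data_[i] += xd[i];
  }
  void scale (const Real alpha) override {
    for (auto& v : data_) v *= alpha;
  }
  Real dot (const ROL::Vector<Real>& x) const override {
    const auto& xd = static_cast<const MyVector&> (x).data_;
    Real sum (0);
    for (std::size_t i = 0; i < data_.size (); ++i) sum += data_[i] * xd[i];
    return sum;
  }
  Real norm () const override { return std::sqrt (this->dot (*this)); }
  // Newer ROL uses ROL::Ptr; older versions use Teuchos::RCP.
  ROL::Ptr<ROL::Vector<Real>> clone () const override {
    return ROL::makePtr<MyVector> (std::vector<Real> (data_.size (), Real (0)));
  }
private:
  std::vector<Real> data_;
};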
For more info, check out the slides given here:
https://trilinos.org/packages/rol/
While I'm perfectly fine corresponding with a fictional Gundam character, it may be useful to know your identity, or at least your project's specific needs, to provide better guidance. For instance, you mention machine learning as one of the applications -- depending on the size of the problem, you may want to entertain combining on-node and MPI parallelism through Tpetra @trilinos/tpetra, which is interoperable with Kokkos.
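To make the Tpetra suggestion concrete, a minimal, untested example that distributes a vector across MPI ranks while Kokkos provides the on-node parallelism might look roughly like this:

#include <iostream>
#include <Tpetra_Core.hpp>
#include <Tpetra_Map.hpp>
#include <Tpetra_Vector.hpp>

int main (int argc, char* argv[]) {
  // Initializes Kokkos and (if enabled) MPI; finalizes on scope exit.
  Tpetra::ScopeGuard tpetraScope (&argc, &argv);
  {
    auto comm = Tpetra::getDefaultComm ();
    const Tpetra::global_size_t numGlobalEntries = 1000;
    // Distribute the entries evenly across MPI ranks.
    auto map = Teuchos::rcp (new Tpetra::Map<> (numGlobalEntries, 0, comm));
    Tpetra::Vector<double> x (map);
    x.putScalar (1.0); // fill (on device, if a device backend is enabled)
    if (comm->getRank () == 0) {
      std::cout << "norm2(x) = " << x.norm2 () << std::endl;
    }
  }
  return 0;
}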
NOX is used for Newton-based nonlinear solves. If you use the Thyra/Tpetra backend, it will run using Kokkos and gives access to most Trilinos solvers and preconditioners via Stratimikos.
@dridzal The OP is a serious Kokkos user and has been working a lot with Christian Trott. It looks like they want to branch out into the Trilinos world :-) .
@briadam could say more about the status of TriKota, which is a Trilinos interface to Dakota.
@mhoemmen @dridzal @rppawlo Thank you all for your help. I've been using Kokkos::View to represent array or matrix objects, but it sounds like I should use Tpetra down the road. @dridzal I'm mostly interested in doing Bayesian inference via Monte Carlo methods. The ROL slides mention some simulation-based optimization. Can you tell me if it's easy to represent probability distribution objects and do some simulation with them in ROL? Do you have documentation on which algorithms in ROL are or are not compatible with Kokkos, so that I can watch out for them? Thanks again for your help!
The question about probability distribution objects is best suited for @dpkouri . Many algorithms in ROL are already compatible with Kokkos and Tpetra/Kokkos. Methods requiring elementwise evaluations of certain nonlinear functions, such as logarithms (used in interior-point methods, for example) won't work until we make some changes to the vector model. What kind of optimization problems are you interested in: unconstrained, bound-constrained, equality-constrained, general inequality-constrained? I believe that the first three types of problems can be handled in a Kokkos-compatible way (maybe not with the full menu of methods, but there are definitely methods that will work).
Mostly I'll be using equality-constrained and general inequality-constrained. And I'm probably going to use a lot of element-wise evaluation of exp and log functions -- is it possible to work around this by doing the element-wise evaluation elsewhere and passing the data back into ROL by pointer/reference?
These element-wise operations are member methods of the ROL::Vector class:
virtual void applyUnary( const Elementwise::UnaryFunction<Real> &f )
virtual void applyBinary( const Elementwise::BinaryFunction<Real> &f, const Vector &x )
virtual Real reduce( const Elementwise::ReductionOp<Real> &r )
where the function object is always passed in. I've been told that this causes problems for Kokkos and on-node execution in general. Some of ROL's algorithms use these operations, others don't. Right now, this can't be bypassed, as far as I can tell. There are several ways to solve this issue. A simple way would be to offer a whole slew of specific unary and binary functions directly in the Vector interface (log, exp, min, max). This is ugly. Another option may be to use templated non-member functions and to provide specializations. Either approach would require some changes to ROL::Vector, and, correspondingly, to the algorithm implementations.
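To make the two options concrete, declaration-level sketches (all names hypothetical) might look like:

// Option 1 (the "ugly" one): bake specific elementwise operations
// into the Vector interface.
virtual void applyLog ();
virtual void applyExp ();
virtual void applyMin ( const Vector &x );
virtual void applyMax ( const Vector &x );

// Option 2: a non-member function template, overloaded or specialized
// per concrete vector type, so that the function type is known at
// compile time and can be a device-callable functor or lambda.
template<class VectorType, class UnaryFunction>
void applyUnaryFunction ( VectorType& v, UnaryFunction&& f );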
Including @mhoemmen @crtrott @etphipp on this for potential input. @gregvw and I will start working on this pretty soon.
@dridzal @gregvw If I remember right, UnaryFunction and BinaryFunction have a virtual operator(). I think there is a way to keep that architecture, (mostly) maintain backwards compatibility with existing code that creates subclasses of UnaryFunction etc., and yet get this working on device. It depends on enabling RDC and on doing virtual method dispatch on device, but that works pretty well with Aria. This will look a bit complicated, but the end result will be that users mostly don't need to fuss about the details. Note that functions should not have mutable state. (That's a general thread-parallel correctness issue, not specific to CUDA. This may or may not affect how you've implemented ReductionOp.)

First, mark UnaryFunction::operator() with the KOKKOS_FUNCTION macro. (Do not use KOKKOS_INLINE_FUNCTION.) This is the only change that could affect backwards compatibility for users who write their own subclasses of UnaryFunction, but it will only do so for CUDA builds. Ditto for BinaryFunction; I'll omit that for what follows.

Second, add a wrapper class that has an std::shared_ptr<const UnaryFunction<Real>> inside. I'll call it UnaryFunctionWrapper<Real> here. It needs a method const UnaryFunction<Real>* get().

Third, add a factory function, templated on UnaryFunctionType and the desired memory space.
template<template<class> class UnaryFunctionType, class Real, class MemorySpace>
std::shared_ptr<const UnaryFunctionWrapper<Real>>
wrapUnaryFunction ()
{
  using parent_type = UnaryFunction<Real>;
  using child_type = UnaryFunctionType<Real>;
  using range_type = Kokkos::RangePolicy<typename MemorySpace::execution_space>;

  void* rawPtr = Kokkos::kokkos_malloc<MemorySpace> (sizeof (child_type));
  // Invoke the child class constructor on device, via placement new.
  // This ensures that the vtable lives in device memory.
  Kokkos::parallel_for ("Allocate object on device", range_type (0, 1),
    KOKKOS_LAMBDA (const int /* unused; always 0 */) {
      new (rawPtr) child_type; // placement new
    });
  Kokkos::fence (); // ensure that construction completed

  // Create `shared_ptr` with an explicit deallocator.
  // Don't keep this thing around past `Kokkos::finalize()`.
  // If that's an issue, you could maintain a list of `weak_ptr`
  // and manage their deallocation with an `atexit` hook.
  std::shared_ptr<child_type> childPtr (static_cast<child_type*> (rawPtr),
    [] (child_type* p) {
      Kokkos::parallel_for ("Destroy object on device", range_type (0, 1),
        KOKKOS_LAMBDA (const int) {
          p->~child_type (); // invoke the destructor explicitly on device
        });
      Kokkos::fence (); // ensure that the destructor completed
      Kokkos::kokkos_free<MemorySpace> (p); // deallocate the memory
    });
  return std::make_shared<const UnaryFunctionWrapper<Real>> (
    std::static_pointer_cast<const parent_type> (childPtr));
}
This presumes that UnaryFunctionType's constructor takes no arguments. You can generalize this by adding a parameter pack, though it has to make sense for those argument(s) to live on device. I've also omitted some details that you should add for safety, like a try-catch block to avoid a memory leak in case the parallel kernel launches fail.
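For concreteness, a hypothetical call site, assuming a CUDA build and a unary function type Reciprocal whose operator() is device-callable:

auto f = wrapUnaryFunction<Reciprocal, double, Kokkos::CudaSpace> ();
// f now wraps a Reciprocal<double> whose vtable lives in device memory.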
Finally, add an overload of applyUnary to ROL::Vector that takes const UnaryFunctionWrapper<Real>&. Actually, you shouldn't overload virtual methods. It would have been better to make the virtual methods protected -- e.g., applyUnaryImpl -- and to have non-virtual public methods that you can overload. If you can redesign ROL::Vector like that, then please do. Then you can overload applyUnary as mentioned above, and have it dispatch to applyUnaryImpl, like this:
void applyUnary (const UnaryFunctionWrapper<Real>& ufw) {
this->applyUnaryImpl (ufw.get ());
}
The point of this redesign would be to let algorithms use ROL::Vector without changes, but accept these wrapped device things as well as the usual UnaryFunction<Real> that they currently know how to accept.
How does this sound?
btw, the applyUnaryImpl function would need to invoke the pointer on device, like this:
virtual void applyUnaryImpl (const UnaryFunction<Real>* f, const bool runOnDevice) {
  // Assuming X is the Tpetra::MultiVector.
  const LO lclNumRows = LO (X.getLocalLength ());
  const LO numCols = LO (X.getNumVectors ());
  if (runOnDevice) {
    using range_type = Kokkos::RangePolicy<execution_space, Kokkos::IndexType<LO>>;
    auto X_lcl = X.getLocalView<memory_space> ();
    Kokkos::parallel_for ("applyUnaryImpl", range_type (0, lclNumRows),
      KOKKOS_LAMBDA (const LO lclRow) {
        for (LO col = 0; col < numCols; ++col) {
          X_lcl(lclRow, col) = (*f) (X_lcl(lclRow, col));
        }
      });
  }
  else {
    using range_type = Kokkos::RangePolicy<Kokkos::DefaultHostExecutionSpace, Kokkos::IndexType<LO>>;
    auto X_lcl = X.getLocalView<Kokkos::HostSpace> ();
    Kokkos::parallel_for ("applyUnaryImpl", range_type (0, lclNumRows),
      [=] (const LO lclRow) {
        for (LO col = 0; col < numCols; ++col) {
          X_lcl(lclRow, col) = (*f) (X_lcl(lclRow, col));
        }
      });
  }
}
In the case where runOnDevice is true, the input pointer is only valid on device, since it was allocated using kokkos_malloc. The case where runOnDevice is false is the backwards-compatibility case; it assumes that the input pointer lives in host memory.
@mhoemmen , thanks! This sounds very appealing (I’ll have to dig a little deeper to make sure I understand the details). What would you recommend if we were to bite the bullet and redesign the elementwise functionality (and change our algorithms accordingly)?
What is the right way of building Trilinos via cmake? I'm looking at the sampleScripts directory and it seems the scripts are very customized to the authors' original settings. Is the material at https://trilinos.org/docs/files/TrilinosBuildReference.html still up to date? I'd greatly appreciate it if you could post the cmake options for building Tpetra and ROL with CUDA and debugging enabled @mhoemmen
What is the right way of building Trilinos via cmake?
Check out this file in the Trilinos repository: doc/build_ref/TrilinosBuildReferenceTemplate.rst. You can also look in the Wiki here on GitHub, or in Kokkos' documentation, for more information.
Best practice is to start with as few CMake options as possible, then slowly add them as needed, using feedback from the build output.
@dridzal can tell you more about what options ROL wants.
There are no special options for ROL. Currently the only dependence is Teuchos (it will be enabled by default), so whatever options Teuchos requires. Having said that, I would make sure that optional dependencies are disabled (otherwise you'll get most of Trilinos):
-D Trilinos_ENABLE_ROL:BOOL=ON \
-D Trilinos_ENABLE_ALL_OPTIONAL_PACKAGES:BOOL=OFF \
The ROL/Tpetra adapters should be activated if Tpetra is enabled.
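Option spellings vary between Trilinos versions, so the following is only a rough, untested starting point for a CUDA debug build of Tpetra and ROL (the source path is a placeholder; nvcc_wrapper ships with Kokkos):

cmake \
  -D CMAKE_BUILD_TYPE:STRING=DEBUG \
  -D Trilinos_ENABLE_DEBUG:BOOL=ON \
  -D CMAKE_CXX_COMPILER:FILEPATH=${TRILINOS_SRC}/packages/kokkos/bin/nvcc_wrapper \
  -D TPL_ENABLE_CUDA:BOOL=ON \
  -D Kokkos_ENABLE_Cuda:BOOL=ON \
  -D Kokkos_ENABLE_Cuda_Lambda:BOOL=ON \
  -D Trilinos_ENABLE_Tpetra:BOOL=ON \
  -D Trilinos_ENABLE_ROL:BOOL=ON \
  -D Trilinos_ENABLE_ALL_OPTIONAL_PACKAGES:BOOL=OFF \
  ${TRILINOS_SRC}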
I got #3702 when building with clang + CUDA.
Do you have a Slack channel for Trilinos that I can join?
@mhoemmen, out of curiosity, what would be the argument against:
void applyUnary( const std::function<Real(Real)>& );
void applyBinary( const std::function<Real(Real,Real)>&, const Vector<Real>& );
Real reduce( const std::function<Real(Real,Real)>& ) const;
Edit: As far as I am aware, although the elementwise functions are defined with virtual inheritance, elementwise objects are only ever used within a function or class scope. The type of elementwise function being used in any given algorithm is always known at compile time.
@gregvw I like std::function and wish we could use it generally, but CUDA doesn't let you make an std::function (or whatever CUDA calls the wrapped thing, I think nvstd::function) on host and use it on device. However, if you made applyUnaryImpl the virtual method and made applyUnary nonvirtual, you could actually overload applyUnary to take an std::function. It would then run on host.
@gregvw why not just take one step further and template these functions so that they can take a lambda for the unary or binary functions? For example:
template < class UnaryFunction >
void applyUnary( UnaryFunction&& );
Speaking as someone who is not familiar with the package, I don't see why one would need virtual methods at all. Why not use CRTP instead?
@Char-Aznable Unfortunately, the virtual inheritance design of ROL::Vector precludes templated member functions. I have written CRTP-based implementations of a number of ROL's core components and would definitely prefer this approach in the long term, but it would be a fairly significant refactoring in the near term.
That being said, if anyone would like to help me implement static polymorphism in ROL, I'm certainly interested.
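For reference, a minimal sketch of the CRTP idea (all names hypothetical, not ROL's actual classes): the concrete vector type is known at compile time, so applyUnary can be a member template that accepts lambdas or functors with no virtual dispatch.

#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CRTP base: Derived supplies applyUnaryImpl.
template<class Derived, class Real>
class VectorBase {
public:
  template<class UnaryFunction>
  void applyUnary (UnaryFunction&& f) {
    static_cast<Derived*> (this)->applyUnaryImpl (std::forward<UnaryFunction> (f));
  }
};

template<class Real>
class StdVector : public VectorBase<StdVector<Real>, Real> {
public:
  explicit StdVector (std::size_t n) : data_ (n, Real (1)) {}
  template<class UnaryFunction>
  void applyUnaryImpl (UnaryFunction&& f) {
    for (auto& x : data_) x = f (x);
  }
private:
  std::vector<Real> data_;
};

// Usage:
//   StdVector<double> v (10);
//   v.applyUnary ([] (double x) { return 1.0 / x; });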
@Char-Aznable @gregvw @mhoemmen Let's assume we are willing to change ROL's algorithms a bit. Can't we keep the core linear algebra (dot, plus, scale) as-is, i.e., using inheritance, and then provide the elementwise functionality through non-member templated functions, like @Char-Aznable suggested? If this is possible, it would make sense for ROL for three reasons: 1) Many ROL algorithms don't require elementwise operations. 2) Elementwise has been an "optional" part of the linear algebra interface from the get-go. 3) Mathematically speaking, these operations don't really belong in a vector space, and, in my opinion, should be considered an add-on.
@mhoemmen That's a bit frustrating regarding std::function. It looks like we are back where we started when we first wanted to implement these functions. Is the void* cast something that is frequently encountered in GPU computing? It's not something I'd normally choose to do. I think I speak for the rest of the ROL team when I say that we would strongly favor solutions that are as agnostic as possible at the level of ROL's core functionality, e.g., having Kokkos macros show up in the interface for the elementwise call operators is not super appealing. Package- or hardware-dependent code would ideally be relegated to ROL's adapters.
and then provide the elementwise functionality through non-member templated functions, like @Char-Aznable suggested?
To clarify, we would have to take some Vector arguments in these functions, of course. But the key idea is that these are free functions in the ROL namespace.
Something like this?
template<typename Real, typename UnaryFunction>
void applyUnary( MyVectorType<Real>& v, const UnaryFunction& f );
Yes, if it'll do the job on the device with Kokkos/CUDA, but I'm honestly not qualified to make this determination due to my lack of GPU experience. @mhoemmen could probably let us know if this is a good idea ... assuming we're willing to make the change in ROL.
If we can satisfy the requirements with free functions until we have an opportunity to consider more careful refactoring down the road, that would be my preference.
Agreed. Free functions would take us roughly one week to implement and test. There are only one or two user implementations of ROL::Vector that would be affected, and a handful of vector adapters (Tpetra, Thyra, Eigen, StdVector). @mhoemmen would something like the templated free functions illustrated above be an effective solution?
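For concreteness, such a free function might be overloaded for a Kokkos-backed vector roughly as follows; KokkosVector and its view() accessor are hypothetical, and f must be device-callable:

template<typename Real, typename UnaryFunction>
void applyUnary (KokkosVector<Real>& v, const UnaryFunction& f) {
  auto d = v.view (); // hypothetical accessor returning Kokkos::View<Real*>
  Kokkos::parallel_for ("ROL::applyUnary", d.extent (0),
    KOKKOS_LAMBDA (const int i) { d(i) = f (d(i)); });
}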
I'm all for free functions. It doesn't really solve whatever backwards compatibility issues you might have, but then again, Trilinos hasn't had a real release in three years (if you don't count the 12.14 tag), so I think you have to forge your own path there ;-) . Free functions are helpful because you can overload them, but they don't solve the "how do I do run-time polymorphism on device" problem.
@gregvw wrote:

That's a bit frustrating regarding std::function. It looks like we are back where we started when we first wanted to implement these functions.

Sort of. There are other design approaches you could take. The point I'm trying to make with applyUnaryImpl being virtual and applyUnary being nonvirtual is that you can safely overload nonvirtual functions. This would give you a backwards-compatibility path, if that's something you want.
Is the void* cast something that is frequently encountered in GPU computing?

This is a common enough pattern for using virtual methods on device that I think Kokkos has added some kind of allocate+emplace function to make it easier. Aria uses this pattern, for example.
I didn't actually cast to void* -- kokkos_malloc returns void*. That's not the point, though. The point is that if you want an object with a vtable to live on device, you have to invoke the object's constructor on device. Otherwise, the vtable's pointers are pointers to host functions, not pointers to device functions. If the object lives and dies in a device kernel, then you can just create it (invoke its constructor) in the usual way. The problem is that you want the object to persist across multiple kernels. That means you need to allocate it in device memory resulting from cudaMalloc or cudaMallocManaged. Regular new allocates (using the default allocator) and constructs all at once, so you need placement new in order to control allocation yourself.
It's not something I'd normally choose to do. I think I speak for the rest of the ROL team that we would strongly favor solutions which are as agnostic as possible at the level of ROL's core functionality....
This "kokkos_malloc
+ placement new
in a parallel_for
" pattern is what you would need to do in order to have an object with virtual methods that lives in GPU memory and persists through multiple kernel launches, but that doesn't mean I advocate exposing this to users. That's why I was proposing overloading applyUnary
to take a "UnaryFunctionWrapper" that hides this detail from users. Users should not have to write that kind of code; that was code for ROL developers. Anyway, there are other ways to get the desired effect, and free functions are also a good idea.
First, mark UnaryFunction::operator() with the KOKKOS_FUNCTION macro.
@mhoemmen many thanks for the detailed explanation and guidance above. Could you clarify this: Would the ROL::Vector class have to depend on Kokkos for this? Or is this placed in an adapter, so that the ROL::Vector class is "clean" of dependencies (and any mention thereof)? We have a milestone this FY to build without any dependencies. I think this is the major point of @gregvw 's comment
we would strongly favor solutions which are as agnostic as possible at the level of ROL's core functionality
It looks like free functions would provide that path, at the nontrivial expense of breaking backward compatibility (which means that we would have to fix this for our users).
@dridzal wrote:

Would the ROL::Vector class have to depend on Kokkos for this?

You could replace that macro with another macro that evaluates to __host__ __device__. That would be a good idea if you didn't want to depend on Kokkos.
Any design that lets you overload for different "function" types would let you support all kinds of functions, including user-provided functions that may or may not run on device. If you can break backwards compatibility to do the right thing, go for it :-) .
I spent most of yesterday trying to devise some way of implementing the free function without having to change existing code that calls the elementwise functions. Unfortunately, I do not see a way, given the constraints of runtime polymorphism, to preserve backward compatibility.
I can make an elementwise refactor branch and get started. What do we know about dependent codes using elementwise functions?
@mhoemmen are __host__ and __device__ nvcc-specific?

@gregvw __host__ and __device__ are CUDA qualifiers. clang can compile them too.
@gregvw That's why you use a macro ;-) If CUDA is not enabled, you would define the macro to the empty string.
@mhoemmen Regarding your example above, I'm not clear on where wrapUnaryFunction is defined and where it should be called, because it depends on Kokkos.

The function applyUnaryImpl takes the runOnDevice argument, but

void Vector<Real>::applyUnary( const UnaryFunctionWrapper<Real>& ufw ) {
  this->applyUnaryImpl( ufw.get() ); // Does not pass a bool
}

Where would it be deduced whether to pass true or false?
Here is a snippet of how applyUnary is currently used:
namespace ROL {
namespace Elementwise {

template<class Real>
class Reciprocal : public UnaryFunction<Real> {
public:
  Real apply( const Real& x ) const {
    return static_cast<Real>(1)/x;
  }
};

} // namespace Elementwise

template<class Real>
class LogBarrierObjective : public Objective<Real> {
public:
  // Sets g = -1/x elementwise
  void gradient( Vector<Real>& g, const Vector<Real>& x, Real& tol ) {
    g.set(x);
    Elementwise::Reciprocal<Real> reciprocal;
    g.applyUnary(reciprocal);
    g.scale(-1.0);
  }
};

} // namespace ROL
At what level would the UnaryFunction be wrapped and the resulting wrapped function be passed?
If I understand correctly, you are suggesting marking UnaryFunction like this:
#if defined( __CUDACC__ )
#define COMMON_FUNCTION __host__ __device__
#else
#define COMMON_FUNCTION /**/
#endif // defined( __CUDACC__ )
template<class Real>
class UnaryFunction {
public:
COMMON_FUNCTION
virtual Real apply( const Real &x ) const = 0;
};
Is this correct?
In response to your earlier question, none of the elementwise functions have mutable state variables.
Regarding your example above, I'm not clear on where wrapUnaryFunction is defined and where it should be called, because it depends on Kokkos.
I thought about this a bit more and realized that the idea is to simulate constrained multiple dispatch. You have multiple kinds of Vector, and you have multiple kinds of unary (and binary -- I'll omit those again for brevity) functions. Examples of the latter might include:

- UnaryFunction<Real>
- std::function<Real (Real)>
- a class with __device__ Real operator() (Real) const (or __host__ __device__ Real operator() (Real) const)

This is a constrained multiple dispatch problem, because some kinds of Vector (e.g., host-only, or sequential-only) and some kinds of UnaryFunction (e.g., device-only) don't work together. The classic way to handle that in languages without native multiple dispatch is to make both Vector and UnaryFunction abstract base classes, and to use the usual Smalltalk-like dynamic_cast-branch pattern to enumerate UnaryFunction subclasses (and error out on unsupported combinations).
This pattern also works in this case. The key is to hide the class with the actual __device__ function inside a subclass of UnaryFunction<Real>. For example:
// Make Kokkos an _optional_ dependency. Alternately, you
// could make ROL depend on CUDA directly, and define
// ROL_HOST_DEVICE_FUNCTION as __host__ __device__.
#ifdef HAVE_ROL_KOKKOS
#define ROL_HOST_DEVICE_FUNCTION KOKKOS_FUNCTION
#else
#define ROL_HOST_DEVICE_FUNCTION
#endif
// Note how this is NOT a subclass of UnaryFunction<Real>.
template<class Real>
class HostDeviceFunction {
public:
// Destructors of abstract base classes must be marked virtual,
// for memory safety of derived classes.
ROL_HOST_DEVICE_FUNCTION virtual ~HostDeviceFunction() {}
// Subclasses implement this.
virtual ROL_HOST_DEVICE_FUNCTION Real operator() (const Real&) const = 0;
};
Now you can make a subclass of UnaryFunction that holds a pointer to HostDeviceFunction inside. For example:
template<class Real>
class UnaryFunctionWithHostDeviceFunctionInside : public UnaryFunction<Real>
{
public:
  // NOT a device function. This could also take std::unique_ptr&&,
  // or any other kind of pointer. That part of the design is not so
  // important. The key is that the pointer points to device memory,
  // allocated using cudaMalloc or cudaMallocManaged.
  UnaryFunctionWithHostDeviceFunctionInside (std::shared_ptr<const HostDeviceFunction<Real>> f)
    : f_ (std::move (f)) {}

  const HostDeviceFunction<Real>* get () const { return f_.get (); }

private:
  std::shared_ptr<const HostDeviceFunction<Real>> f_;
};
Finally, your dynamic_cast-branch pattern inside Vector's "applyUnaryFunction" virtual method would need to ask the given UnaryFunction<Real> whether it is a UnaryFunctionWithHostDeviceFunctionInside<Real>:
virtual void applyUnaryFunction (const UnaryFunction<Real>& f) const {
const UnaryFunctionWithHostDeviceFunctionInside<Real>* f1 =
dynamic_cast<const UnaryFunctionWithHostDeviceFunctionInside<Real>*> (&f);
if (f1 != nullptr) {
// ... if the Vector subclass can handle a HostDeviceFunction,
// then apply f1->get() to the Vector's entries ...
}
else {
// ... do whatever it is you do now with UnaryFunction ...
}
}
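Putting the pieces together, a hypothetical call site might look like this; makeDeviceFunction and DeviceReciprocal are placeholders for the kokkos_malloc + placement-new construction sketched earlier:

// Construct a device-resident HostDeviceFunction subclass, then wrap it.
std::shared_ptr<const HostDeviceFunction<double>> devF =
  makeDeviceFunction<DeviceReciprocal<double>> (); // hypothetical factory
UnaryFunctionWithHostDeviceFunctionInside<double> wrapped (devF);
// The Vector subclass dynamic_casts its argument and, on success,
// applies devF on device.
x.applyUnaryFunction (wrapped);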
btw, the above exercise has two goals: (1) keep Kokkos (and CUDA) an optional dependency of ROL, and (2) still support run-time polymorphism for functions that execute on device.

For (1), Epetra and Tpetra do something comparable by wrapping MPI_Comm with Epetra_Comm resp. Teuchos::Comm. This ensures that they can work even if building without MPI.
I didn't talk about the "virtual 'impl' function vs. nonvirtual overloads" pattern in the discussion above. You could use that if you wanted the outward-facing user interface to take different kinds of "things that look like functions," but have a single / few implementation(s) for all of them.
@mhoemmen Would it be possible to use a Visitor pattern for some proxy for UnaryFunction that has its call method overloaded for vector types? This proxy would be called inside the DerivedVector::applyUnary method, so that no dynamic_cast would be needed to call the correct proxy method overloaded for DerivedVector. I've been playing with this idea, but am not sure about which methods can be virtual and keep nvcc happy, or if I am missing something important.
@gregvw wrote:

but am not sure about which methods can be virtual and keep nvcc happy

If the object has any virtual methods and you plan on calling them, then the vtable needs to live on device. That means constructing the object on device. That's why I showed you the "allocate some memory on device and placement-new the object in it" pattern above.

Would it be possible to use a Visitor pattern for some proxy for UnaryFunction that has its call method overloaded for vector types?

Do you mean, in the manner of std::visit, with the "overloaded operators for different types" pattern? That's kind of what I meant by "overloading applyUnaryFunction."
@dridzal @gregvw @etphipp Does ROL work with autodiff using Sacado and Kokkos now? Do you have examples of this? I struggle to find documentation regarding tools in Trilinos in general.
This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity.
If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE label.

If this issue should be kept open even with no activity beyond the time limits, you can add the label DO_NOT_AUTOCLOSE.
If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.
This issue was closed due to inactivity for 395 days.