rapidsai / raft

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.
https://docs.rapids.ai/api/raft/stable/
Apache License 2.0
743 stars 189 forks source link

[FEA] RNG device interface #406

Closed MatthiasKohl closed 2 years ago

MatthiasKohl commented 2 years ago

Is your feature request related to a problem? Please describe. I'd like to add a device interface for RNG. Is this something RAFT can support?

Describe the solution you'd like It would be nice to either have a templated structure (templated on the kind of RNG generator from random/detail/rng_impl.cuh) which has methods to generate random numbers.

Alternatively, we could directly expose device functions templated on the kind of RNG generator from random/detail/rng_impl.cuh).

In addition to this, I'd like to add methods of generating random values that are not as easy to get right as one might think at first, in particular bounded integers. See the context in this issue in particular: https://github.com/rapidsai/cugraph/issues/1979

MatthiasKohl commented 2 years ago

I'd like to work on this if you agree that this functionality belongs in RAFT. EDIT: plus, what way of exposing this functionality would you suggest?

MatthiasKohl commented 2 years ago

Tagging @vinaydes since he was interested in fast generation of bounded integers and we have an implementation available.

cjnolet commented 2 years ago

I think this functionality sounds reasonable to place in RAFT and it should be fine to wrap the device function through the public API and hide the implementatation details through the detail namespace. If this function isn't going to be usable at all on the host, we might also want to consider adding a namespace for it like raft::random::device::.

BTW- a random walk primitive itself might also be useful to have in RAFT eventually. We're planning to consolidate and share more structures between RAFT and cugraph (e.g. csr/dcsr/coo, etc...), which might make this easier to do.

MatthiasKohl commented 2 years ago

Thanks for considering it! While working on it, I noticed that what I want is basically almost exactly the raft::random::detail::Generator struct, but as a public API.

So I think it makes sense to simply have a sub-class of raft::random::detail::Generator in raft::random::device namespace (similar to the sub-class of RngImpl). Then we can add a few additional distributions and document the number of draws made on the underlying random generator since the caller will have to update their offset accordingly, but that is on the host side.

As for the random walk primitive, I think this makes sense as well, but I guess it will take a bit longer to make a primitive out of it.

vinaydes commented 2 years ago

I am currently almost done with reimplementing the RNG class for addressing the exact issues mentioned here. New implementation provides device interface, fixes uniform float and double generation issues and also fixes the bounded integer generation issues. I am also adding a new PCG based generator. The only issue is bunch of spills which are causing performance regression. I would create PR as soon as I fix those issues. Will update this issue once I do that.

MatthiasKohl commented 2 years ago

Thanks for the update @vinaydes ! Is there something I could do to help you with the PR?

This is highest priority for me at the moment since it will be required for cugraph - DGL integration for which we want to have a proof of concept ASAP.

vinaydes commented 2 years ago

PR #434 has the RNG changes.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

cjnolet commented 2 years ago

Updating this issue based on more recent conversation. We've converged on a final public API that should be able to allow the rng seed to be exposed through public APIs (and compiled w/ non-cuda-enabled compilers like cython).

From @MatthiasKohl, we will introduce a class called RandomState (which matches Numpy's naming convention):

class RngState {
  protected:
    uint64_t seed;
    GeneratorType gen_type;
};

And then we will flatten the methods in the existing RNG class into separate functions which align w/ the rest of RAFT:

template <GeneratorType gen_type>
class DeviceRandomState {
  using type = gen_type;
  uint64_t seed; ...
};

#define CALL_DEVICE_RNG(state, func, args) {
  switch(state.gen_type) {
    case X:
    DeviceRandomState<X> device_state{state};
    func<X>(device_state, ##args)
    break;
    ...
  }
}

void uniform(raft::handle_t const &handle, raft::random::RandomState const &rng, ...) {
    CALL_DEVICE_RNG(state, template_func, args);
}

As @MatthiasKohl points out, the macro could be a templated function w/ variadic template params for the arguments.

cc @teju85 @akifcorduk @vinaydes

cjnolet commented 2 years ago

I also want to make a note here to request that instead of modifying / removing the old API, we instead add the new API and deprecate the old so that we can minimize the breaking changes.

cc @vinaydes @teju85 @MatthiasKohl

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

MatthiasKohl commented 2 years ago

@cjnolet should I add the "flattened" function to PR #609 , or open a new one?

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

MatthiasKohl commented 2 years ago

This has been solved with #609