Open mlahir1 opened 1 year ago
This is a current restriction of cupy, which we use to generate random samples: it cannot sample without replacement when the weighting probabilities are non-uniform. If you need this, I suspect the best route is to ask for it in cupy. It is possible to work around the restriction by creating a supplementary array that gives each entry in your series the correct multiplicity and then sampling uniformly without replacement, but that might be a bit fiddly depending on the use case.
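A minimal sketch of the multiplicity workaround described above, written with numpy so it runs without a GPU. The helper name `sample_by_multiplicity` and its parameters are hypothetical, and it assumes the weights can be expressed as non-negative integer counts. The same idea should carry over to cupy arrays, though that is untested here.

```python
import numpy as np


def sample_by_multiplicity(values, counts, n, seed=None):
    """Draw n items uniformly, without replacement, from an expanded index array.

    Hypothetical helper: counts[i] is the integer multiplicity (weight) of values[i].
    """
    values = np.asarray(values)
    counts = np.asarray(counts, dtype=np.int64)
    if values.ndim != 1 or counts.shape != values.shape:
        raise ValueError("values and counts must be 1-D and the same length")
    if (counts < 0).any():
        raise ValueError("counts must be non-negative")

    # Supplementary array: position i appears counts[i] times.
    expanded = np.repeat(np.arange(len(values)), counts)
    if n > len(expanded):
        raise ValueError("cannot draw more items than the total multiplicity")

    # Uniform sampling without replacement over the expanded positions.
    rng = np.random.default_rng(seed)
    picked = rng.choice(expanded, size=n, replace=False)
    return values[picked]


sample = sample_by_multiplicity([10, 20, 30], [1, 0, 3], n=2, seed=0)
```

Note that this samples without replacement from the expanded array, so an entry with a count greater than one can still appear more than once in the result. If each original entry must appear at most once, this sketch is not a drop-in substitute.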
@wence- Are there any cases where the `weights` argument is implemented? Should we update the docs?
> Are there any cases where the `weights` argument is implemented?
Sampling with replacement is fine.
When @isVoid wrote this, he and I didn't think too hard about documenting the exact supported cases, since there are multiple different possibilities (CPU vs GPU sampling depending on whether we're operating row-wise or column-wise, weights, etc.) and in most cases I think we just transparently support whatever the underlying libraries (numpy or cupy) support. In my weakly held opinion, it would probably be easier to focus on making these error messages more instructive (e.g. "Random sampling with cupy does not support the input [insert input here]. Either [switch to using numpy for sampling] or [try this other workaround]") rather than trying to keep documentation up to date for each case.
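To make the error-message suggestion concrete, here is a rough sketch of an up-front argument check. The function `check_sampling_args` and its `backend` parameter are hypothetical and not part of cudf. It covers only the one unsupported combination discussed in this thread (weights without replacement on cupy).

```python
def check_sampling_args(backend, replace, weights):
    """Raise an instructive error for the unsupported cupy case.

    Hypothetical helper for illustration; not an existing cudf function.
    """
    if backend == "cupy" and weights is not None and not replace:
        raise NotImplementedError(
            "Random sampling with cupy does not support weights together "
            "with replace=False. Either sample with replace=True, switch "
            "to numpy for sampling, or expand the data by integer "
            "multiplicity and sample uniformly without replacement."
        )


# Supported combinations pass silently.
check_sampling_args("cupy", replace=True, weights=[0.2, 0.8])
check_sampling_args("numpy", replace=False, weights=[0.2, 0.8])
```

Checking before dispatching to the underlying library means the message can name the offending arguments and the alternatives, instead of surfacing whatever the library raises internally.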
Weighted sampling with a cudf Series is documented as supported but throws the following error.
Repro:
Exception: