Open Lasering opened 2 years ago
This seems like the kind of thing that should go into a random number library, not a collections library. Seems awfully specialized. What if you want to choose k
of n
things? What if you want k
things with replacement?
This seems to me just one of a variety of random sampling tasks that are useful. (Note that "shuffle" is also in this class, and that is provided in the random number library.)
Given my comments about a version receiving the random/seed I understand your comments. I shouldn't have suggested that. However the simple version is very handy to have, at least these languages think so:
sample
.choose
Adding the method to Random
would already be an improvement, although the ergonomics wouldn't be as good as adding it to Iterable.
Adding Python to languages that have this feature in their standard library.
@Ichoran while I agree this should belong to Random
rather than the collections framework, especially considering the need for a seed.
While shuffle
is general enough to represent this behavior, it feels unnecessarily expensive if you either just want a single element or want to keep getting multiple random elements when repetition is allowed.
Having said that, I guess the signature must either be Option[A]
or throw if the collection is empty.
@BalmungSan - I didn't mention shuffle
because I thought you should implement selection with shuffle! I just mentioned it because it indicates that Random
is a natural home for methods like these.
It's fine to add Random.sample
. I was just suggesting it not be quite so specific. For instance, r sample xs
might pick one element of xs
, but r.sample(xs, 5)
might pick five, and r.sample(xs, 5, replace = true)
might pick five that could repeat.
@Ichoran ah fair and apologies for the misunderstanding.
Yeah, that API feels great, the only missing thing would be the semantics of what happens if the n
(the number of requested elements) is bigger than m
(the size of the collection). Also, if the 1
case should be an overload with a different return type or just the default for m
and get a collection of size 1
.
BTW, my point was also that AFAIK this repo was not limited to only the collections framework, but the stdlib in general. And with plans for unfreezing the stdlib for future versions of Scala 3 I guess is great to re-activate the discussions here.
Yeah, rather than using a default argument, I'd recommend sample(coll): A
but sample(coll, n): Coll[A]
would make sense. In the spirit of safe operations by default, if you ask for m
> n
items, I'd say you would write m
= h
+ k
* n
where k
>= 1 and h
< n
, and return k
shuffles of the original collection plus h
randomly selected items. That is, you use everything at random; if you run out, everything is available again and you keep going.
I'll add something like this to my kse3 library, and hopefully will report back if I run into any problems.
@BalmungSan - I implemented something for arrays which generalizes fine to IndexedSeq
. Then for things that aren't arrays, I implemented Algorithm L which doesn't even need to know how big the target is, but has a somewhat high constant cost (due to exponents/logs being used every step).
It is live in kse3 maths 0.2.13, and I decorated the collections themselves with sample (in addition to putting it in the random number generator I wrote). No major hitches save for the usual ones when you want extension methods on collections (extension methods don't always understand the inheritance hierarchy).
Here's a runnable example (using the variants that take a given
random number source): https://scastie.scala-lang.org/dS0La51UQxm9Gu3nRzWjAA
Anyway, no significant problems!
I ended up using rng.sample(n)(coll)
for argument order for the method on the random number generator, but coll.sample(n)(rng)
for the order on collection extension method.
Hey, I propose the method
randomElement
to be added toIterable
. This method returns a random element from the collection.On
Iterable
it can be (probably naively) implemented with:On
Seq
:A version receiving the Random/seed could also be added:
Or maybe the Random/seed could be an Option (this way there would no overloads):
I'm certainly not the right person to know which approach is better.