scala / scala-library-next

backwards-binary-compatible Scala standard library additions
Apache License 2.0
69 stars 17 forks source link

Add randomElement #124

Open Lasering opened 2 years ago

Lasering commented 2 years ago

Hey, I propose the method randomElement to be added to Iterable. This method returns a random element from the collection.

On Iterable it can be (probably naively) implemented with:

def randomElement: A = {
  val n = util.Random.nextInt(size)
  iterator.drop(n).next
}

On Seq:

def randomElement: A = {
  val n = util.Random.nextInt(size)
  apply(n)
}

A version receiving the Random/seed could also be added:

def randomElement(random: Random): A = ???
def randomElement(seed: Long): A = ???

Or maybe the Random/seed could be an Option (this way there would no overloads):

def randomElement(random: Option[Random] = None): A = ???

I'm certainly not the right person to know which approach is better.

Ichoran commented 2 years ago

This seems like the kind of thing that should go into a random number library, not a collections library. Seems awfully specialized. What if you want to choose k of n things? What if you want k things with replacement?

This seems to me just one of a variety of random sampling tasks that are useful. (Note that "shuffle" is also in this class, and that is provided in the random number library.)

Lasering commented 2 years ago

Given my comments about a version receiving the random/seed I understand your comments. I shouldn't have suggested that. However the simple version is very handy to have, at least these languages think so:

Lasering commented 2 years ago

Adding the method to Random would already be an improvement, although the ergonomics wouldn't be as good as adding it to Iterable.

julian-a-avar-c commented 6 months ago

Adding Python to languages that have this feature in their standard library.

BalmungSan commented 6 months ago

@Ichoran while I agree this should belong to Random rather than the collections framework, especially considering the need for a seed. While shuffle is general enough to represent this behavior, it feels unnecessarily expensive if you either just want a single element or want to keep getting multiple random elements when repetition is allowed. Having said that, I guess the signature must either be Option[A] or throw if the collection is empty.

Ichoran commented 6 months ago

@BalmungSan - I didn't mention shuffle because I thought you should implement selection with shuffle! I just mentioned it because it indicates that Random is a natural home for methods like these.

It's fine to add Random.sample. I was just suggesting it not be quite so specific. For instance, r sample xs might pick one element of xs, but r.sample(xs, 5) might pick five, and r.sample(xs, 5, replace = true) might pick five that could repeat.

BalmungSan commented 6 months ago

@Ichoran ah fair and apologies for the misunderstanding. Yeah, that API feels great, the only missing thing would be the semantics of what happens if the n (the number of requested elements) is bigger than m (the size of the collection). Also, if the 1 case should be an overload with a different return type or just the default for m and get a collection of size 1.

BTW, my point was also that AFAIK this repo was not limited to only the collections framework, but the stdlib in general. And with plans for unfreezing the stdlib for future versions of Scala 3 I guess is great to re-activate the discussions here.

Ichoran commented 6 months ago

Yeah, rather than using a default argument, I'd recommend sample(coll): A but sample(coll, n): Coll[A] would make sense. In the spirit of safe operations by default, if you ask for m > n items, I'd say you would write m = h + k * n where k >= 1 and h < n, and return k shuffles of the original collection plus h randomly selected items. That is, you use everything at random; if you run out, everything is available again and you keep going.

I'll add something like this to my kse3 library, and hopefully will report back if I run into any problems.

Ichoran commented 5 months ago

@BalmungSan - I implemented something for arrays which generalizes fine to IndexedSeq. Then for things that aren't arrays, I implemented Algorithm L which doesn't even need to know how big the target is, but has a somewhat high constant cost (due to exponents/logs being used every step).

It is live in kse3 maths 0.2.13, and I decorated the collections themselves with sample (in addition to putting it in the random number generator I wrote). No major hitches save for the usual ones when you want extension methods on collections (extension methods don't always understand the inheritance hierarchy).

Here's a runnable example (using the variants that take a given random number source): https://scastie.scala-lang.org/dS0La51UQxm9Gu3nRzWjAA

Anyway, no significant problems!

I ended up using rng.sample(n)(coll) for argument order for the method on the random number generator, but coll.sample(n)(rng) for the order on collection extension method.