scala / collection-strawman

Implementation of the new Scala 2.13 Collections
Apache License 2.0
200 stars 72 forks source link

Remove empty parameter list from `iterator()` #520

Closed NthPortal closed 6 years ago

NthPortal commented 6 years ago

There was a discussion on https://gitter.im/scala/collection-strawman about whether or not IterableOnce.iterator()/Iterable.iterator() should have an empty parameter list, or none at all. I will do my best to record the main points of the discussion below.


(mostly quotes from Gitter)

I ask "Why does IterableOnce.iterator() have an empty parameter list, rather than no parameter list?"

@szeiger and @julienrf suggest it is because iterator() is a side-effecting method, in that two returned Iterators have distinct identities, so the method is not referentially transparent.

@martijnhoekstra notes that "If everything that's not referentially transparent is a side effect, obtaining an iterator is a side-effect".

I ask "What about toArray? The Arrays are mutable, so while initially they would be equivalent, you could modify them so they are not".

@szeiger: "Indeed, if you go by referential transparency a lot more methods would need an empty argument list. What about size? When used on a mutable collection it's not referentially transparent. Of course, nothing is. The distinction doesn't make any sense there. So what about List.size? It is referentially transparent but that's not good enough because substituting such calls has huge performance implications. In a pure functional language the compiler can help you with that but Scala can't."

@SethTisue: "My intuition likes the 'initially they would be equivalent' side of your way of phrasing it, @NthPortal. But I'm having trouble backing up my intuition with a convincing argument [...] I think my intuition is something like, require the () if it introduces mutation or side effects into a program that wouldn't otherwise have it. And toArray doesn't do that, because you are free to write code that uses Array but never mutates an array".

@martijnhoekstra: "Neither does iterator() though - provided you never call next() [...] There is a lot less you can do with an iterator without modifying it than you can do with an Array without modifying it though".

@SethTisue: "In Scala through 2.12, .iterator has no parameter lists and the compiler says 'Iterator[...] does not take parameters' if you write .iterator(). So somebody made a different decision on this years ago. Just leaving it like it was doesn't feel terrible to me. Even if you buy the argument that it ought to have the (), I'm not sure it's worth annoying people with the change, especially since (as Stefan has pointed out) the change will become more annoying when future Scala requires the ()".

@Ichoran says that "My intuition is that () as a way to denote referential transparency is not a good idea anyway. Instead, () is useful as a marker for things that observably mutate the object on which the method is called. That is, for some method bar, the behavior of x.bar will be different after x.foo()." @SethTisue likes this intuition.

SethTisue commented 6 years ago

and then @julienrf said "I’m OK to go back to iterator for the sake of backward compatibility."

marcelocenerine commented 6 years ago

This reminds me of this discussion long time ago: https://groups.google.com/forum/#!topic/scala-language/RlV9O1RDmis

odersky commented 6 years ago

The uniform access principle says that () is redundant if the operation could have been implemented as a (mutable) field. A more precise way of saying that is that it's OK to drop the () for member m on value x if

x.m() == x.m()

assuming some single-threadedness guarantees*. So that means length can be written without () for both mutable and immutable collections. currentTimeMillis() must be written with () because its value may change from one call to the next.

Furthermore:

toList, toSeq: No() because of the way equality is defined on collections. But it's iterator(), toArray(), or next() since different calls yield different values.

I think it would be nice if we could enforce that principle with the collections redesign, provided porting efforts are acceptable. For Scala 2.13, is there a cost for changing to iterator()? For Scala 3, we will enforce the right parentheses on the call site. On the other hand, that should be easy to rewrite automatically. The rewrite tool could do that already for Scala 2.13, in order to keep things tidy.

Ichoran commented 6 years ago

But xs.iterator == xs.iterator for every extant collection type. If xs.iterator is on an iterator, it is a reference to itself, and iterators are equal to themselves. If it is on an Iterable, the two copies are independent.

Also, if xs is an iterator, it is not true that xs.toList == xs.toList.

odersky commented 6 years ago

But xs.iterator == xs.iterator for every extant collection type.

Not if we mean == literally:

scala> List(1, 2, 3)
res0: List[Int] = List(1, 2, 3)

scala> res0.iterator == res0.iterator
res1: Boolean = false

Also, if xs is an iterator, it is not true that xs.toList == xs.toList.

That's indeed a problem, but I guess it is part of the understanding that iterators are "use once", so you could argue that it.toList == it.toList does not even make sense. By the same token,

xs.map(f) == xs.map(f)

for every collection xs and every pure function f, but the same is not true for iterators.

NthPortal commented 6 years ago

It is a more general problem with IterableOnce (or IterableOnceOps), that (virtually) all of its operations consume it, while that is not the case for Iterable, which extends it.

The only real way to solve it without breaking the inheritance structure of the collections is to allow overriding the number of empty parameter lists for a method, which seems weird.

Ichoran commented 6 years ago

Oops, I'd forgotten that Iterator uses only reference equality.

I agree with the uniform access principle, but I don't think that equality as a proof of uniform access works properly. Take, for instance,

case class C(i: Int) {
  def bigger: C = new C(i + 1)
}

class D(val i: Int) {
  def bigger(): D = new D(i + 1)
}

These two behave identically with regard to the bigger method. The only difference is that C has value equality and D has reference equality, so that c.bigger == c.bigger but not d.bigger() == d.bigger().

So the equality test seems not to be necessary since you can break it by essentially irrelevant modifications to other parts of the class. For instance, if you add or remove an equals method, are you supposed to refactor all your ()'s?

Furthermore, the return value isn't even the important thing. Take, for instance,

class E[A] {
  val calls = new ArrayBuffer[() => A]
  def add(call: () => A): Unit = { calls += call; () }
  def drop: Unit = {
    if (calls.nonEmpty) calls.remove(calls.length -1)
    ()
  }
}

Now, despite the fact that e.drop == e.drop, from the signature of drop we know that it is pointless unless it is side-effecting, and if the signature changed to return something (e.g. a constant ID value), that wouldn't make it any less side-effecting.

So certainly the equality test isn't sufficient because we need to know whether the class itself has changed or not.

Since equality is neither necessary nor sufficient for determining whether something obeys the replace-me-with-a-field rule, I think we shouldn't really use it. Sometimes it will give the same answer, but it can have errors either way.

Now, I think phrasing the uniform access principle in terms of whether a call is indistinguishable from field access is fine, but that leaves us with a rather unpleasant situation in inheritance hierarchies where some subsets of the hierarchy are immutable. You're basically forced to always use () if you belong to a hierarchy where the method could have returned a copy of something mutable.

NthPortal commented 6 years ago

Can toArray actually have empty parentheses, given that it has an implicit ClassTag parameter?

szeiger commented 6 years ago

It can. Everything looks normal as long as the implicit parameter list stays implicit but you're up for a surprise if you're used to calling with without parentheses and then want to pass in an explicit ClassTag:

scala> implicit def f[T : scala.reflect.ClassTag]() = ()
f: [T]()(implicit evidence$1: scala.reflect.ClassTag[T])Unit

scala> f

scala> f()

scala> f()(implicitly[scala.reflect.ClassTag[Int]])

scala> f(implicitly[scala.reflect.ClassTag[Int]])
                   ^
       error: no arguments allowed for nullary method f: ()(implicit evidence$1: scala.reflect.ClassTag[T])Unit
lrytz commented 6 years ago

It seems there's an agreement to follow @odersky's strategy https://github.com/scala/collection-strawman/issues/520#issuecomment-375586799. So I'm closgin this, created https://github.com/scala/collection-strawman/issues/571 for toArray

NthPortal commented 6 years ago

@lrytz I'm not as convinced of this agreement - I haven't seen anyone address @Ichoran's points

lrytz commented 6 years ago

@NthPortal thanks for speaking up, I was also testing the waters :-) I'll reopen for now.

@Ichoran says

I agree with the uniform access principle, but I don't think that equality as a proof of uniform access works properly

I agree with that. Can we just drop the second part in @odersky's strategy?

The uniform access principle says that () is redundant if the operation could have been implemented as a (mutable) field. A more precise way of saying that is that it's OK to drop the () for member m on value x if

x.m() == x.m()

This would leave us with "if the operation could have been implemented as a (mutable) field".

So this would mean .iterator(), .toArray(), also .toBuffer() because the result is mutable, but .toList.

@Ichoran remarks

that leaves us with a rather unpleasant situation in inheritance hierarchies where some subsets of the hierarchy are immutable. You're basically forced to always use () if you belong to a hierarchy where the method could have returned a copy of something mutable.

I'd like to see an example here.

Ichoran commented 6 years ago

@lrytz - As an example, consider tail(), flatten(), and sum(). None of these can be fields because any of them might return (different instances of) mutable objects. (Admittedly, sum is usually only defined on immutable objects, but there's no reason you can't define Numeric on a mutable one.)

NthPortal commented 6 years ago

@lrytz An example from a very different angle is Factory[Coll[_]]#empty[A], where whether or not empty can be a field depends on what inheritance hierarchy Coll is in, even though the Factorys are the same trait (and are not in different places in an inheritance hierarchy). That is, the existence of inheritance hierarchies in general affects whether or not empty can be a field.

NthPortal commented 6 years ago

To generalize what @Ichoran said, any method on a scala.collection.CollOps[A, CC[_], C] which returns a C or a CC[_] is going to have this problem.

lrytz commented 6 years ago

Thanks for the good examples!

Do we all agree that we don't want to add () to tail/flatten/empty etc?

Can we just say "add () if the method can perform side effects"? That's what @Ichoran seems to suggest (PR description, last paragraph).

Iterator.next(), println(), but .iterator, .tail, .empty, .toArray, .toList. I personally like it to have () as a sign that a side effect is happening. Multi-threading would not matter, no () for methods that only observe (but don't perform) effects. So in principle also .currentTimeMillis (well that's defined in Java anyway - but do we really need an exception for "exception of System interrupts" as Martin mentions above?)

lrytz commented 6 years ago

It leads to the question what is a side effect. Referential transparency is not a useful answer here, as we're working with mutable state. Effects are IO, observable state changs (changes to local / newly allocated state is not an effect, niether is updating a cache). Exceptions are excluded (.head).

SethTisue commented 6 years ago

Can we just say "add () if the method can perform side effects"?

Yes, please. This how I've understood and taught () for more than a decade now, and it's how I see others understanding and teaching it, too.

I dislike iterator() and toArray() — both direct dislike, and dislike of the migration annoyance we will cause people if we do this.

lrytz commented 6 years ago

https://github.com/scala/scala/pull/6620

NthPortal commented 6 years ago

I noticed last night that Factory[Coll]#newBuilder[A] got () as well in the collections redesign as well - is that something that should be reverted as well?

lrytz commented 6 years ago

Searching a bit in the library, there's also

And outside collections

Ichoran commented 6 years ago

Methods on Random change the instance of the class, so they should have () under any scheme we've discussed! Likewise with Builder.result()--the behavior after calling it is undefined (or defined if it's a ReusableBuilder, but you still shouldn't call it again).