Closed NthPortal closed 6 years ago
and then @julienrf said "I’m OK to go back to iterator for the sake of backward compatibility."
This reminds me of this discussion long time ago: https://groups.google.com/forum/#!topic/scala-language/RlV9O1RDmis
The uniform access principle says that ()
is redundant if the operation could have been implemented as a (mutable) field. A more precise way of saying that is that it's OK to drop the ()
for member m
on value x
if
x.m() == x.m()
assuming some single-threadedness guarantees*. So that means length
can be written without ()
for both mutable and immutable collections. currentTimeMillis()
must be written with ()
because its value may change from one call to the next.
Furthermore:
toList
, toSeq
: No()
because of the way equality is defined on collections.
But it's iterator()
, toArray()
, or next()
since different calls yield different values.
I think it would be nice if we could enforce that principle with the collections redesign, provided porting efforts are acceptable. For Scala 2.13, is there a cost for changing to iterator()
? For Scala 3, we will enforce the right parentheses on the call site. On the other hand, that should be easy to rewrite automatically. The rewrite tool could do that already for Scala 2.13, in order to keep things tidy.
()
. For other types, we assume singlethreaded use, with the exception of System interrupts (e.g. currentTimeMillis()
.But xs.iterator == xs.iterator
for every extant collection type. If xs.iterator
is on an iterator, it is a reference to itself, and iterators are equal to themselves. If it is on an Iterable
, the two copies are independent.
Also, if xs
is an iterator, it is not true that xs.toList == xs.toList
.
But xs.iterator == xs.iterator for every extant collection type.
Not if we mean ==
literally:
scala> List(1, 2, 3)
res0: List[Int] = List(1, 2, 3)
scala> res0.iterator == res0.iterator
res1: Boolean = false
Also, if xs is an iterator, it is not true that xs.toList == xs.toList.
That's indeed a problem, but I guess it is part of the understanding that iterators are "use once", so you could argue that it.toList == it.toList
does not even make sense. By the same token,
xs.map(f) == xs.map(f)
for every collection xs
and every pure function f
, but the same is not true for iterators.
It is a more general problem with IterableOnce
(or IterableOnceOps
), that (virtually) all of its operations consume it, while that is not the case for Iterable
, which extends it.
The only real way to solve it without breaking the inheritance structure of the collections is to allow overriding the number of empty parameter lists for a method, which seems weird.
Oops, I'd forgotten that Iterator
uses only reference equality.
I agree with the uniform access principle, but I don't think that equality as a proof of uniform access works properly. Take, for instance,
case class C(i: Int) {
def bigger: C = new C(i + 1)
}
class D(val i: Int) {
def bigger(): D = new D(i + 1)
}
These two behave identically with regard to the bigger
method. The only difference is that C
has value equality and D
has reference equality, so that c.bigger == c.bigger
but not d.bigger() == d.bigger()
.
So the equality test seems not to be necessary since you can break it by essentially irrelevant modifications to other parts of the class. For instance, if you add or remove an equals
method, are you supposed to refactor all your ()
's?
Furthermore, the return value isn't even the important thing. Take, for instance,
class E[A] {
val calls = new ArrayBuffer[() => A]
def add(call: () => A): Unit = { calls += call; () }
def drop: Unit = {
if (calls.nonEmpty) calls.remove(calls.length -1)
()
}
}
Now, despite the fact that e.drop == e.drop
, from the signature of drop
we know that it is pointless unless it is side-effecting, and if the signature changed to return something (e.g. a constant ID value), that wouldn't make it any less side-effecting.
So certainly the equality test isn't sufficient because we need to know whether the class itself has changed or not.
Since equality is neither necessary nor sufficient for determining whether something obeys the replace-me-with-a-field rule, I think we shouldn't really use it. Sometimes it will give the same answer, but it can have errors either way.
Now, I think phrasing the uniform access principle in terms of whether a call is indistinguishable from field access is fine, but that leaves us with a rather unpleasant situation in inheritance hierarchies where some subsets of the hierarchy are immutable. You're basically forced to always use ()
if you belong to a hierarchy where the method could have returned a copy of something mutable.
Can toArray
actually have empty parentheses, given that it has an implicit ClassTag
parameter?
It can. Everything looks normal as long as the implicit parameter list stays implicit but you're up for a surprise if you're used to calling with without parentheses and then want to pass in an explicit ClassTag
:
scala> implicit def f[T : scala.reflect.ClassTag]() = ()
f: [T]()(implicit evidence$1: scala.reflect.ClassTag[T])Unit
scala> f
scala> f()
scala> f()(implicitly[scala.reflect.ClassTag[Int]])
scala> f(implicitly[scala.reflect.ClassTag[Int]])
^
error: no arguments allowed for nullary method f: ()(implicit evidence$1: scala.reflect.ClassTag[T])Unit
It seems there's an agreement to follow @odersky's strategy https://github.com/scala/collection-strawman/issues/520#issuecomment-375586799. So I'm closgin this, created https://github.com/scala/collection-strawman/issues/571 for toArray
@lrytz I'm not as convinced of this agreement - I haven't seen anyone address @Ichoran's points
@NthPortal thanks for speaking up, I was also testing the waters :-) I'll reopen for now.
@Ichoran says
I agree with the uniform access principle, but I don't think that equality as a proof of uniform access works properly
I agree with that. Can we just drop the second part in @odersky's strategy?
The uniform access principle says that
()
is redundant if the operation could have been implemented as a (mutable) field. A more precise way of saying that is that it's OK to drop the()
for member m on value x ifx.m() == x.m()
This would leave us with "if the operation could have been implemented as a (mutable) field".
So this would mean .iterator()
, .toArray()
, also .toBuffer()
because the result is mutable, but .toList
.
@Ichoran remarks
that leaves us with a rather unpleasant situation in inheritance hierarchies where some subsets of the hierarchy are immutable. You're basically forced to always use () if you belong to a hierarchy where the method could have returned a copy of something mutable.
I'd like to see an example here.
@lrytz - As an example, consider tail()
, flatten()
, and sum()
. None of these can be fields because any of them might return (different instances of) mutable objects. (Admittedly, sum
is usually only defined on immutable objects, but there's no reason you can't define Numeric
on a mutable one.)
@lrytz An example from a very different angle is Factory[Coll[_]]#empty[A]
, where whether or not empty
can be a field depends on what inheritance hierarchy Coll
is in, even though the Factory
s are the same trait (and are not in different places in an inheritance hierarchy). That is, the existence of inheritance hierarchies in general affects whether or not empty
can be a field.
To generalize what @Ichoran said, any method on a scala.collection.CollOps[A, CC[_], C]
which returns a C
or a CC[_]
is going to have this problem.
Thanks for the good examples!
Do we all agree that we don't want to add ()
to tail
/flatten
/empty
etc?
Can we just say "add ()
if the method can perform side effects"? That's what @Ichoran seems to suggest (PR description, last paragraph).
Iterator.next()
, println()
, but .iterator
, .tail
, .empty
, .toArray
, .toList
. I personally like it to have ()
as a sign that a side effect is happening. Multi-threading would not matter, no ()
for methods that only observe (but don't perform) effects. So in principle also .currentTimeMillis
(well that's defined in Java anyway - but do we really need an exception for "exception of System interrupts" as Martin mentions above?)
It leads to the question what is a side effect. Referential transparency is not a useful answer here, as we're working with mutable state. Effects are IO, observable state changs (changes to local / newly allocated state is not an effect, niether is updating a cache). Exceptions are excluded (.head
).
Can we just say "add () if the method can perform side effects"?
Yes, please. This how I've understood and taught ()
for more than a decade now, and it's how I see others understanding and teaching it, too.
I dislike iterator()
and toArray()
— both direct dislike, and dislike of the migration annoyance we will cause people if we do this.
I noticed last night that Factory[Coll]#newBuilder[A]
got ()
as well in the collections redesign as well - is that something that should be reverted as well?
Searching a bit in the library, there's also
Iterable.newSpecificBuilder()
Builder.result()
- this should stay because the builder should not be used afterSetOps.subsets()
- this needs to stay because it's overloaded, https://github.com/scala/bug/issues/9116And outside collections
Random.nextInt()
and friends - this should stay, as getting a number modifies the state of the random instanceRichInt.longValue()
and friends -- they are defined with ()
in the base class ScalaNumericAnyConversions
Numeric.Ops.toInt()
and friendsMethods on Random
change the instance of the class, so they should have ()
under any scheme we've discussed! Likewise with Builder.result()
--the behavior after calling it is undefined (or defined if it's a ReusableBuilder
, but you still shouldn't call it again).
There was a discussion on https://gitter.im/scala/collection-strawman about whether or not
IterableOnce.iterator()
/Iterable.iterator()
should have an empty parameter list, or none at all. I will do my best to record the main points of the discussion below.(mostly quotes from Gitter)
I ask "Why does
IterableOnce.iterator()
have an empty parameter list, rather than no parameter list?"@szeiger and @julienrf suggest it is because
iterator()
is a side-effecting method, in that two returnedIterator
s have distinct identities, so the method is not referentially transparent.@martijnhoekstra notes that "If everything that's not referentially transparent is a side effect, obtaining an iterator is a side-effect".
I ask "What about
toArray
? The Arrays are mutable, so while initially they would be equivalent, you could modify them so they are not".@szeiger: "Indeed, if you go by referential transparency a lot more methods would need an empty argument list. What about
size
? When used on a mutable collection it's not referentially transparent. Of course, nothing is. The distinction doesn't make any sense there. So what aboutList.size
? It is referentially transparent but that's not good enough because substituting such calls has huge performance implications. In a pure functional language the compiler can help you with that but Scala can't."@SethTisue: "My intuition likes the 'initially they would be equivalent' side of your way of phrasing it, @NthPortal. But I'm having trouble backing up my intuition with a convincing argument [...] I think my intuition is something like, require the
()
if it introduces mutation or side effects into a program that wouldn't otherwise have it. AndtoArray
doesn't do that, because you are free to write code that usesArray
but never mutates an array".@martijnhoekstra: "Neither does
iterator()
though - provided you never callnext()
[...] There is a lot less you can do with an iterator without modifying it than you can do with anArray
without modifying it though".@SethTisue: "In Scala through 2.12,
.iterator
has no parameter lists and the compiler says 'Iterator[...] does not take parameters' if you write.iterator()
. So somebody made a different decision on this years ago. Just leaving it like it was doesn't feel terrible to me. Even if you buy the argument that it ought to have the()
, I'm not sure it's worth annoying people with the change, especially since (as Stefan has pointed out) the change will become more annoying when future Scala requires the()
".@Ichoran says that "My intuition is that
()
as a way to denote referential transparency is not a good idea anyway. Instead,()
is useful as a marker for things that observably mutate the object on which the method is called. That is, for some methodbar
, the behavior ofx.bar
will be different afterx.foo()
." @SethTisue likes this intuition.