Closed MasseGuillaume closed 6 years ago
Thanks @MasseGuillaume for the investigation! Could you add an “expected” column so that we know what actions should be performed?
Regarding the examples: we need to check the path taken when converting Some(1) to Iterable; it's reasonable to expect a known size.
OTOH, it makes perfect sense that c.Iterator(1) has knownSize = 1; this is an improvement over the previous state where you would get hasDefiniteSize = false.
I'm rescheduling this for RC1. Better known sizes are a performance improvement that can still be done after M5.
There's a correctness issue as well: hasDefiniteSize no longer obeys its old contract of reporting when the collection is unbounded (i.e. potentially infinite). Shall I open another issue for that one?
The old contract also states that Iterator is to report it is potentially unbounded even when it is taken from a finite collection.
some | -1 | true | Some(1)
Option[A] only has knownSize through implicit conversion to Iterable[A] which in implementation, creates a List[A].
How about changing ::.knownSize to if (tail == Nil) 1 else -1?
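To make the suggestion concrete, here is a minimal sketch of the rule as a standalone function. The name proposedKnownSize is hypothetical and this is not how the standard library implements knownSize; it only mirrors the proposed expression, written script-style and assuming Scala 2.13.

```scala
// Hypothetical helper mirroring the proposed rule for ::.knownSize.
// It is O(1): it inspects at most the first cell and its tail.
def proposedKnownSize[A](xs: List[A]): Int = xs match {
  case Nil      => 0   // the empty list already has a known size
  case _ :: Nil => 1   // single cell: tail == Nil, so the size is known
  case _        => -1  // longer lists would need a traversal, so report "unknown"
}

// The List built from Some(1) would then report a known size of 1.
val fromOption: List[Int] = Some(1).toList
assert(proposedKnownSize(fromOption) == 1)
assert(proposedKnownSize(List(1, 2)) == -1)
```

This keeps the check constant-time while covering the single-element case that the Option conversion produces.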
Option[A] only has knownSize through implicit conversion to Iterable[A] which in implementation, creates a List[A].
Then we have lost this when we migrated to scala/scala. The goal was to avoid the creation of an intermediate List when we are only interested in an Iterator (which is the case in flatMap, for instance). We could re-implement it like this, btw:
implicit def optionToIterableOnce[A](maybeA: scala.Option[A]): IterableOnce[A] =
if (maybeA.isEmpty) Iterator.empty else Iterator.single(maybeA.get)
Iterator.single should override knownSize, BTW.
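As a rough illustration of what such an override could look like, here is a sketch of a single-element iterator. The class name SingleElementIterator is made up for this example and is not the standard library's implementation of Iterator.single; it assumes Scala 2.13, where Iterator inherits knownSize from IterableOnce.

```scala
// Hypothetical single-element iterator that reports its size.
final class SingleElementIterator[A](elem: A) extends Iterator[A] {
  private[this] var consumed = false

  def hasNext: Boolean = !consumed

  def next(): A =
    if (consumed) throw new NoSuchElementException("next on empty iterator")
    else { consumed = true; elem }

  // One element remains until it has been handed out, then none.
  override def knownSize: Int = if (consumed) 0 else 1
}

val it = new SingleElementIterator(42)
assert(it.knownSize == 1)
assert(it.next() == 42)
assert(it.knownSize == 0)
```

Note that knownSize here tracks the remaining elements, so it changes once the element has been consumed.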
I'm gonna open a PR for this. I also noticed there are two implementations of single-element Iterators, which I'll combine.
@Ichoran You're right that hasDefiniteSize is not identical to knownSize >= 0. Would it make more sense to add concrete hasDefiniteSize implementations in 2.13 for compatibility instead of using the simple but not quite correct version we have now?
I don't see a reason for keeping the old semantics around in the long run. Quoting the 2.12 scaladocs:
Note: many collection methods will not work on collections of infinite sizes. The typical failure mode is an infinite loop. These methods always attempt a traversal without checking first that hasDefiniteSize returns true. However, checking hasDefiniteSize can provide an assurance that size is well-defined and non-termination is not a concern.
In other words: Even though the standard library provides this method it does not actually make any use of it. How useful is this method for users if it's not even useful for other collection methods?
The old contract also states that Iterator is to report it is potentially unbounded even when it is taken from a finite collection.
From the 2.12 scaladocs again:
Non-empty Iterators usually return false even if they were created from a collection with a known finite size.
Note the "usually": there is no guarantee. My guess is that this was the simplest way to implement it and nobody wanted to put in the extra effort because they knew it wouldn't be useful anyway.
I think we should add concrete implementations of hasDefiniteSize for compatibility. I am agnostic about whether we should retain the method in the long run, or simply deprecate it. On the one hand, there isn't any other way to check for possibly unbounded collections (knownSize < 0 doesn't mean the collection is possibly unbounded); on the other, unbounded collections are not very common, and extending the collections isn't very common.
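A small sketch of why knownSize < 0 cannot stand in for "possibly unbounded", assuming Scala 2.13. The helper name approxHasDefiniteSize is hypothetical and stands for the simple knownSize >= 0 approximation discussed in this thread.

```scala
// Hypothetical helper: the simple approximation based on knownSize.
def approxHasDefiniteSize(xs: IterableOnce[_]): Boolean = xs.knownSize >= 0

val vec = Vector(1, 2, 3)
// A filtered iterator is finite here, but it cannot know its size without traversing.
val filtered = Iterator(1, 2, 3).filter(_ > 1)

assert(approxHasDefiniteSize(vec))        // size is known up front
assert(!approxHasDefiniteSize(filtered))  // reported as "unknown" ...
assert(filtered.size == 2)                // ... even though it is clearly bounded
```

So a negative knownSize only says the size is not cheaply available; it says nothing about whether the collection is infinite.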
I think I'm in favour of deprecating hasDefiniteSize and removing it in the long run, as it has poor semantics (both in terms of what is documented and what is possible), and is also not very actionable.
Poor semantics:
The old contract also states that Iterator is to report it is potentially unbounded even when it is taken from a finite collection.
This is not really a desirable property, as it means that the method is unnecessarily conservative, and may prevent the use of collections which actually have a definite size.
My guess is that this was the simplest way to implement it and nobody wanted to put in the extra effort because they knew it wouldn't be useful anyway.
I understand why it's significantly simpler to implement, and don't think it should be done otherwise; however, it still undermines the value of the method.
Some collections (e.g. LazyList) are unable to report that they have a definite size, even when it is known to the programmer. Creating a LazyList from the lines of a file will always yield one of finite size (unless you manage to create an infinitely large file?), but there's no way for the LazyList instance to know or relay this information. Many of the generators for LazyList (e.g. fill, tabulate) create fixed-size LazyLists, but they too cannot relay this information.
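For illustration, a short sketch assuming Scala 2.13's LazyList: a list built with fill is known by the programmer to have three elements, yet the instance itself reports an unknown size, both before and after it has been fully evaluated.

```scala
// The programmer knows this has exactly 3 elements.
val fixed = LazyList.fill(3)(0)

// The instance cannot relay that: a non-empty LazyList reports -1.
assert(fixed.knownSize == -1)

// Computing the size forces the whole list ...
assert(fixed.size == 3)

// ... and even then knownSize stays -1.
assert(fixed.knownSize == -1)
```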
Stream in 2.12 scans through itself as long as its tails are defined (while checking for cycles), and returns true if all elements are evaluated. This strategy would work for LazyList as well, but unfortunately, it runs in O(n) time in the case where it returns true (for both Stream and LazyList), which seems undesirable.
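The scanning strategy can be sketched on a toy lazy list. Everything here (Cell, Cons, End, tailDefined and this hasDefiniteSize function) is a hypothetical stand-in written for this example, not the actual Stream or LazyList code; it only illustrates walking already-evaluated tails with a slow pointer for cycle detection.

```scala
// Hypothetical minimal lazy list; not the standard library's Stream or LazyList.
sealed trait Cell[+A]
case object End extends Cell[Nothing]
final class Cons[+A](val head: A, tl: => Cell[A]) extends Cell[A] {
  private[this] var evaluated = false
  lazy val tail: Cell[A] = { val t = tl; evaluated = true; t }
  def tailDefined: Boolean = evaluated
}

// Returns true only if every tail is already evaluated and the list ends.
// Runs in O(n) when it returns true, which is the cost noted above.
def hasDefiniteSize[A](start: Cell[A]): Boolean = {
  @annotation.tailrec
  def loop(fast: Cell[A], slow: Cell[A], stepSlow: Boolean): Boolean = fast match {
    case End                           => true   // reached the end: finite
    case c: Cons[A] if !c.tailDefined  => false  // unevaluated tail: cannot tell
    case c: Cons[A] =>
      val nextFast = c.tail
      // The slow pointer advances every other step and trails the fast one.
      val nextSlow = slow match {
        case s: Cons[A] if stepSlow => s.tail
        case other                  => other
      }
      if (nextFast eq nextSlow) false            // pointers met: cyclic, so infinite
      else loop(nextFast, nextSlow, !stepSlow)
  }
  loop(start, start, stepSlow = false)
}

val second = new Cons(2, End)
val first  = new Cons(1, second)
assert(!hasDefiniteSize(first))  // tails not evaluated yet
first.tail; second.tail          // force both tails
assert(hasDefiniteSize(first))
```

The sketch answers false for both "not yet evaluated" and "cyclic", which matches the conservative behaviour described above.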
Not actionable:
What are you supposed to do if you call hasDefiniteSize and it returns false? Throw an exception?
java.lang.IllegalArgumentException: the collection passed in was not definitely finite in size
oh great thanks
I was working on scala-collection-compat to add an extension method for knownSize and I noticed a couple of inconsistencies.
This table summarizes a bunch of collection containers, their knownSize in 2.13.X, and their hasDefiniteSize in 2.12.X if available (? otherwise).
For example:
Some(1) has knownSize = -1, I expect knownSize = 1. Ibid for cBitSet, etc.
c.Iterator(1) has knownSize = 1, I expect knownSize = -1; this would be consistent with hasDefiniteSize = false