scala / bug

Scala 2 bug reports only. Please, no questions — proper bug reports only.
https://scala-lang.org
232 stars 21 forks source link

knownSize vs hasDefiniteSize #11065

Closed MasseGuillaume closed 6 years ago

MasseGuillaume commented 6 years ago

I was working on scala-collection-compat to add an extension method for knownSize and I notice a couple of inconsistencies.

This tables summarize a bunch of collection containers, their knownSize in 2.13.X, their hasDefiniteSize in 2.12.X if available (? otherwise).

For example:

Short name knownSize hasDefiniteSize Source
some -1 true Some(1)
none 0 true None
option -1 true Option(1)
array 1 true Array(1)
string 1 true "a"
cBitSet -1 true c.BitSet(1)
cBufferedIterator -1 false scala.io.Source.fromString("a").buffered
cIndexedSeq 1 true c.IndexedSeq(1)
cIndexedSeqVie 1 true c.IndexedSeq(1).view
cIterable -1 true c.Iterable(1)
cIterableOnce -1 ? (c.Iterable(1): c.IterableOnce[Int])
cIterator 1 false c.Iterator(1)
lazyZip2 -1 ? (List(1) lazyZip List(2))
lazyZip3 -1 ? (List(1) lazyZip List(2) lazyZip List(3))
lazyZip4 -1 ? (List(1) lazyZip List(2) lazyZip List(3) lazyZip List(4))
cLinearSeq -1 true c.LinearSeq(1)
cMap 1 true c.Map(1 -> 1)
cMapView 1 true c.Map(1 -> 1).view
cSeq -1 true c.Seq(1)
cSeqView -1 true c.Seq(1).view
cSet 1 true c.Set(1)
cSortedMap -1 true c.SortedMap(1 -> 1)
cSortedSet -1 true c.SortedSet(1)
cView -1 ? (c.Seq(1).view: c.View[Int])
iArraySeq 1 ? i.ArraySeq(1)
iBitSet -1 true i.BitSet(1)
iChampHashMap 1 ? i.ChampHashMap(1 -> 1)
iChampHashSet 1 ? i.ChampHashSet(1)
iHashMap -1 true i.HashMap(1 -> 1)
iHashSet -1 true i.HashSet(1)
iIndexedSeq 1 true i.IndexedSeq(1)
iIntMap -1 true i.IntMap(1 -> 1)
iIterable -1 true i.Iterable(1)
iLazyList -1 ? i.LazyList(1)
iLinearSeq -1 true i.LinearSeq(1)
iList -1 true i.List(1)
iListMap -1 true i.ListMap(1 -> 1)
iListSet -1 true i.ListSet(1)
iLongMap -1 true i.LongMap(1L -> 1)
iMap 1 true i.Map(1 -> 1)
iNil 0 true i.Nil
iNumericRange 1 true 1L to 1L
iQueue -1 true i.Queue(1)
iRange 1 true 1 to 1
iSeq -1 true i.Seq(1)
iSet 1 true i.Set(1)
iSortedMap -1 true i.SortedMap(1 -> 1)
iSortedSet -1 true i.SortedSet(1)
iStream -1 false i.Stream(1)
iTreeMap -1 true i.TreeMap(1 -> 1)
iTreeSet -1 true i.TreeSet(1)
iVector 1 true i.Vector(1)
iVectorIterator 1 false i.Vector(1).iterator
iWrappedString 1 ? i.WrappedString('a')
mAnyRefMap -1 true m.AnyRefMap(Nil -> 1)
mArrayBuffer 1 true m.ArrayBuffer(1)
mArrayBufferView 1 true m.ArrayBuffer(1).view
mArrayDeque 1 ? m.ArrayDeque(1)
mArraySeq 1 ? m.ArraySeq(1)
mBitSet -1 true m.BitSet(1)
mBuffer 1 true m.Buffer(1)
mHashMap -1 true m.HashMap(1 -> 1)
mHashSet -1 true m.HashSet(1)
mIndexedSeq 1 true m.IndexedSeq(1)
mIterable 1 true m.Iterable(1)
mLinkedHashMap -1 true m.LinkedHashMap(1 -> 1)
mLinkedHashSet -1 true m.LinkedHashSet(1)
mListBuffer 1 true m.ListBuffer(1)
mListMap -1 true m.ListMap(1 -> 1)
mLongMap -1 true m.LongMap(1L -> 1)
mMap -1 true m.Map(1 -> 1)
mMultiMap -1 true new m.HashMap[Int, m.Set[Int]] with m.MultiMap[Int, Int]
mOpenHashMap -1 true m.OpenHashMap(1 -> 1)
mPriorityQueue -1 true m.PriorityQueue(1)
mQueue 1 true m.Queue(1)
mSeq 1 true m.Seq(1)
mSet -1 true m.Set(1)
mSortedMap -1 true m.SortedMap(1 -> 1)
mSortedSet -1 true m.SortedSet(1)
mStack 1 true m.Stack(1)
mStringBuilder 0 true new m.StringBuilder()
mTreeMap -1 true m.TreeMap(1 -> 1)
mTreeSet -1 true m.TreeSet(1)
mUnrolledBuffer -1 true m.UnrolledBuffer(1)
mWeakHashMap -1 true m.WeakHashMap(1 -> 1)
bufferedSource -1 false scala.io.Source.fromFile(new java.io.File("../../../build.sbt"))
source -1 false scala.io.Source.fromString("hello")
tuple2Zipped -1 true (List(1), List(2)).zipped
tuple3Zipped -1 true (List(1), List(2), List(3)).zipped
zippedIterable2 -1 ? (tuple2Zipped: runtime.ZippedIterable2[Int, Int])
zippedIterable3 -1 ? (tuple3Zipped: runtime.ZippedIterable3[Int, Int, Int])
systemProperties -1 ? sys.props
julienrf commented 6 years ago

Thanks @MasseGuillaume for the investigation! Could you add an “expected” column so that we know what actions should be performed?

szeiger commented 6 years ago

Regarding the examples: We need to check the path taken by converting Some(1) to Iterable, it's reasonable to expect a known size.

OTOH, it makes perfect sense that c.Iterator(1) has knownSize = 1, this an improvement over the previous state where you would get hasDefiniteSize = false.

I'm rescheduling this for RC1. Better known sizes are a performance improvement that can still be done after M5.

Ichoran commented 6 years ago

There's a correctness issue as well--hasDefiniteSize no longer obeys its old contract of reporting when the collection is unbounded (i.e. potentially infinite). Shall I open another issue for that one?

The old contract also states that Iterator is to report it is potentially unbounded even when it is taken from a finite collection.

joshlemer commented 6 years ago

some | -1 | true | Some(1)

Option[A] only has knownSize through implicit conversion to Iterable[A] which in implementation, creates a List[A].

How about changing ::.knownSize to if (tail == Nil) 1 else -1?

julienrf commented 6 years ago

Option[A] only has knownSize through implicit conversion to Iterable[A] which in implementation, creates a List[A].

Then we have lost this when we migrated to scala/scala. The goal was to avoid the creation of an intermediate List when we are only interested in an Iterator (which is the case in flatMap, for instance). We could re-implement it like this, btw:

implicit def optionToIterableOnce[A](maybeA: scala.Option[A]): IterableOnce[A] =
  if (maybeA.isEmpty) Iterator.empty else Iterator.single(maybeA.get)

Iterator.single should override knownSize, BTW.

joshlemer commented 6 years ago

I'm gonna open a PR for this, I also noticed there's two implementations of single-element Iterators, which I'll combine.

szeiger commented 6 years ago

@Ichoran You're right that hasDefiniteSize is not identical to knownSize >= 0. Would it make more sense to add concrete hasDefiniteSize implementations in 2.13 for compatibility instead of using the simple but not quite correct version we have now?

I don't see a reason for keeping the old semantics around In the long run. Quoting the 2.12 scaladocs:

Note: many collection methods will not work on collections of infinite sizes. The typical failure mode is an infinite loop. These methods always attempt a traversal without checking first that hasDefiniteSize returns true. However, checking hasDefiniteSize can provide an assurance that size is well-defined and non-termination is not a concern.

In other words: Even though the standard library provides this method it does not actually make any use of it. How useful is this method for users if it's not even useful for other collection methods?

The old contract also states that Iterator is to report it is potentially unbounded even when it is taken from a finite collection.

From the 2.12 scaladocs again:

Non-empty Iterators usually return false even if they were created from a collection with a known finite size.

Note the "usually", there is no guarantee. My guess is that this was the simplest way to implement it and nobody wanted to put in the extra effort because they knew it wouldn't be useful anyway.

Ichoran commented 6 years ago

I think we should add concrete implementations of hasDefiniteSize for compatibility. I am agnostic about whether we should retain the method in the long run, or simply deprecate it. On the one hand, there isn't any other way to check for possibly unbounded collections (knownSize < 0 doesn't mean it's possible); on the other, unbounded collections are not very common, and extending the collections isn't very common.

NthPortal commented 6 years ago

I think I'm in favour of deprecating hasDefiniteSize and removing it in the long run, as it has poor semantics (both in terms of what is documented and what is possible), and is also not very actionable.

Poor semantics:

Not actionable:

What are you supposed to do if you call hasDefiniteSize and it returns false? Throw an exception? java.lang.IllegalArgumentException: the collection passed in was not definitely finite in size oh great thanks