noresttherein opened this issue 1 year ago
Would it be worth building an inventory of where things are in terms of out-of-bounds handling (relevant operations, what is accepted, which exceptions for rejections)? And agree on a doc / spec that we can follow? @julienrf do you know if there is anything existing?
> climb my soap box

nice, TIL

I don't think we have such a spec, except the scala-collection-laws, maybe…

> climb my soap box

Not to be confused with soap opera.
I don't think a bug in arithmetic warrants an existential crisis in the API. It's perfectly natural to minimize bounds checking and only fail "on demand".
Permissive inputs are well-established for xs.drop(100) (as a paradigm).

I'm more sympathetic to the argument about patch, because it seems to me that it should be "trivial" to calculate "what will be replaced." slice doesn't work because patch takes a count, not an end index.
```scala
scala> xs
val res17: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

scala> val p = List(42, 27)
val p: List[Int] = List(42, 27)

scala> xs.patch(-11, p, 5)
val res18: List[Int] = List(42, 27, 6, 7, 8, 9, 10)

scala> xs.slice(-11, -6)
val res19: List[Int] = List()

scala> xs.slice(-11, 1000).take(5)
val res20: List[Int] = List(1, 2, 3, 4, 5)
```
Then the definition of patch is really res20 (n elements of the maximal slice at i) and not res19.
The case is stronger for patchInPlace, which ought to have similar behavior (except in-place). If buf.update throws, then buf.patchInPlace should throw for the same bad index. It's mutating operations that throw instead of silently truncating, such as buf.remove(1000).
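To make the contrast concrete, here is a small, hand-checked sketch of the mutable side (assuming a plain ArrayBuffer; worth re-running before relying on it):

```scala
import scala.collection.mutable.ArrayBuffer

val buf = ArrayBuffer(1, 2, 3)
buf(1000) = 42     // update: throws IndexOutOfBoundsException
buf.remove(1000)   // remove: throws IndexOutOfBoundsException as well
buf.drop(1000)     // non-mutating drop stays permissive and just returns an empty buffer
```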
Perhaps I'm wrong that patch and patchInPlace ought to have the same policy on the from parameter, just because the method names are similar. Or perhaps I'm wrong that patchInPlace should throw like update, if "the elements to replace" is defined in terms of slice.

If patch means "patch slice", then the hypothetical patchRange(from, to) is the form that would throw (as well as patchRangeInPlace).
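For illustration, a minimal sketch of what such a hypothetical patchRange could look like as an extension method (the name and semantics are only the ones suggested above, not anything that exists in the library):

```scala
// Hypothetical: replace the elements in [from, to) and throw on out-of-range
// indices, unlike patch, which clips them.
implicit class PatchRangeOps[A](xs: Seq[A]) {
  def patchRange[B >: A](from: Int, to: Int, other: Seq[B]): Seq[B] = {
    if (from < 0 || to > xs.length || from > to)
      throw new IndexOutOfBoundsException(s"patchRange($from, $to) on a sequence of length ${xs.length}")
    xs.take(from) ++ other ++ xs.drop(to)
  }
}

// List(1, 2, 3, 4, 5).patchRange(1, 3, List(42))   ==  List(1, 42, 4, 5)
// List(1, 2, 3, 4, 5).patchRange(-1, 3, List(42))  // throws IndexOutOfBoundsException
```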
It would be nice if these behaviors were summarized in the collections overview, of course.
@scala/collections
I think Julien is correct that in general we don't have specifications for what happens when various indices are out of bounds. Treating everything as undefined behavior is unwise, because people will test for the behavior, then assume that the experimentally-determined behavior is the intended behavior, and then be surprised when some other collection of the same type behaves differently.
I tried to put some of what I thought were sensible invariants into collections-laws. I don't think I was comprehensive, and anyway, that just helps people working on the standard library avoid unexpected behavior; it still doesn't help users understand what to expect.
The most important principle is, I think, to make all the collections have the same behavior, with an occasional exception if absolutely critical for array performance. I don't think copyToArray is one of these exceptions.
Logically, I think copyToArray(xs, start, len) should be identical to:
```scala
val it = iterator
var l  = len
var s  = start
while (l > 0 && it.hasNext) {
  xs(s) = it.next()   // throws ArrayIndexOutOfBoundsException if s is out of range
  s += 1
  l -= 1
}
len - l               // number of elements actually copied
```
That is, I agree that the asymmetry between start and end when you're specifying the max length is a bad thing. However, the docs say that the copy ends when the array does, and it's behavior that people may count on, and the returned Int helps you figure out whether you didn't transfer the data you thought you would.
So I'd rewrite it like this:
```scala
def copyToArray[B >: A](xs: Array[B], start: Int, len: Int): Int =
  if (len <= 0) 0
  else {
    var i  = start
    val it = iterator
    // equivalent to min(start + len, xs.length), written so the addition cannot overflow
    val end = if (xs.length - len <= start) xs.length else start + len
    while (i < end && it.hasNext) {
      xs(i) = it.next()
      i += 1
    }
    i - start
  }
```
This works because xs.length and len are both positive, so the subtraction will be in bounds; if you add len to both sides, the inequality is xs.length <= start + len, and then the if-statement is just a min. If start is negative, it doesn't matter to the logic; it will work correctly, index the array at a negative position, and throw an exception, as it does now.
As a bonus, in the case where len is zero or negative, and getting the iterator is not a no-op, the code is a touch faster.
(Note: code untested.)
Specifically regarding patch, I think the behavior was chosen to be identical to, but more efficient than, that which you would get from the most obvious collections operations.

In particular, xs.patch(n, ys, m) is equivalent to xs.take(n) ++ ys ++ xs.drop(n).drop(m). The double-drop is weird until you notice that the natural way to split xs is the splitAt method: val (xsL, xsR) = xs.splitAt(n); xsL ++ ys ++ xsR.drop(m).
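For what it's worth, that equivalence also covers the negative-index example from earlier in the thread (the first pair below matches the res18 output above; hand-checked, so worth re-running):

```scala
val xs = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
val p  = List(42, 27)

xs.patch(-11, p, 5)                        // List(42, 27, 6, 7, 8, 9, 10)
xs.take(-11) ++ p ++ xs.drop(-11).drop(5)  // List(42, 27, 6, 7, 8, 9, 10)

xs.patch(2, p, 3)                          // List(1, 2, 42, 27, 6, 7, 8, 9, 10)
xs.take(2) ++ p ++ xs.drop(2).drop(3)      // List(1, 2, 42, 27, 6, 7, 8, 9, 10)
```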
So in terms of coherence with the "natural" way to do things with higher-order methods, I think patch does the right thing. patchInPlace should match its behavior. If it doesn't, I would consider that a bug.
I don't want to sound like a broken record, but to confound the matter further, we have takeInPlace and dropInPlace, which are, like their immutable namesakes, permissive. And updated, just like update, is not.
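A quick illustration of that split (hand-checked against 2.13, but treat it as a sketch):

```scala
import scala.collection.mutable.ArrayBuffer

List(1, 2, 3).take(100)                // List(1, 2, 3): permissive
ArrayBuffer(1, 2, 3).dropInPlace(100)  // ArrayBuffer(): permissive, nothing left but no exception
List(1, 2, 3).updated(100, 42)         // throws IndexOutOfBoundsException
```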
I think this is the exact occasion on which Poles use the phrase English-speakers allegedly find so funny: 'not my circus, not my monkeys'. Ultimately, all collection methods can be implemented by a call to the equivalent method on an iterator, followed by a conversion, and maybe that would be a good frame of reference. I am not terribly convinced about the 'existing index distinction', but I agree that a simple arithmetic overflow issue is certainly not a reason to start a discussion about a huge scala.collection revamp. Actually, I think I have the biggest issue with Buffer.remove, as the stand-out. But I also have a set of extension methods for collections, for example update/updated variants which take a collection of elements rather than a single one, and I had real trouble deciding what would be the most consistent behaviour with the existing methods. On one hand, it's a job that patch can easily do - but patch requires you to know the size of the argument collection if you want to splice it exactly into the recipient of the call, which we won't in general know for an IterableOnce, so I found a use for it beside patch.
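A minimal sketch of the kind of extension I mean (updatedAll is a made-up name; negative indices are simply clamped here, which is only one possible choice):

```scala
// Hypothetical extension: overwrite elements starting at `index` with the elements
// of `elems`, ignoring whatever part of `elems` would fall outside this sequence.
// Unlike patch, the caller does not need to know elems.size up front.
implicit class UpdatedAllOps[A](xs: Seq[A]) {
  def updatedAll[B >: A](index: Int, elems: IterableOnce[B]): Seq[B] = {
    val from     = math.max(index, 0)
    val replaced = elems.iterator.take(math.max(xs.length - from, 0)).toSeq
    xs.patch(from, replaced, replaced.length)   // the length of the result never changes
  }
}

// List(1, 2, 3, 4, 5).updatedAll(3, List(8, 9, 10))  ==  List(1, 2, 3, 8, 9)
```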
Now, I am easily irked by such small inconsistencies, and perfectly aware that I am not normal in that regard, but this thread is for me just the tip of an iceberg: there is no rhyme or reason to how parameters of equivalent methods are named - which would not be quite as important if Scala didn't allow specifying any parameter by name - and there is no single convention for how equivalent mutable/immutable methods are named.
…, or some such, were legal candidates for named arguments rather than positional ones. People don't put a huge lot of thought most of the time into how parameters are named, and it is unfortunate to be bound by early choices for compatibility. I don't think we often use named arguments for arbitrary calls; it seems like their place is with copy-like methods with many parameters, whose signatures are defined specifically with named arguments in mind, so I wouldn't find it limiting myself. Returning to an API with a fresh eye, or with feedback on how something may be ambiguous, no longer permits such an easy improvement. Then there is the case that in an overriding method parameters can be renamed freely, and while I'd consider it bad practice, it can easily happen if you write the overridden method manually rather than asking an IDE to copy the signature exactly for you. Together it seems like a bad combination.
Parameters have been renamed and annotated with deprecatedName.
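For readers unfamiliar with the mechanism, a small made-up illustration (the method below is hypothetical, not taken from the library):

```scala
import scala.collection.mutable.Buffer

// The parameter used to be called `idx` and is now `index`; callers who pass it
// by the old name still compile, but get a deprecation warning.
def removeAt[A](buf: Buffer[A], @deprecatedName("idx") index: Int): A =
  buf.remove(index)

// removeAt(buf, idx = 1)    // deprecated, but still accepted
// removeAt(buf, index = 1)  // preferred
```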
> The most important principle is, I think, to make all the collections have the same behavior

If I write x(i) = 42, I would expect similar semantics whether x is a mutable collection or an array.
> I have the biggest issue with Buffer.remove

At the moment, after a couple of months, I don't have a sense of what the priority issue is, in general and in your view.
Recently, I was looking at "value discard" warnings and noticed that remove(i) returns a value but remove(i, 1) does not return a slice. (It's not an exact replacement, for reasons of efficiency: there's extra work to check the range.) So I agree that "uniformity of expectation" could be improved.
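Concretely (hand-checked against 2.13's ArrayBuffer, but worth verifying):

```scala
import scala.collection.mutable.ArrayBuffer

val buf = ArrayBuffer(1, 2, 3)
val removed: Int = buf.remove(1)   // the single-element overload returns the removed element
buf.remove(0, 1)                   // the (index, count) overload returns Unit, not the removed slice
```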
On parameters, index is sometimes idx. But especially, Shrinkable.subtractOne(elem) becomes Buffer.subtractOne(x). I wonder if anyone has written the Scalafix rule to enforce parameter names in overrides.
Reproduction steps

Scala version: 2.13.10

In IterableOnceOps.copyToArray(xs, start, len):

Problem

There are actually two overflows here, but xs.length - start is arguably not an issue, as a negative start causes a negative end and nothing will be copied (instead of throwing an exception if the overflow had not happened). There is a second one in start + math.min(len, ...), however, and it is reachable on a valid path.

The 'new' collection framework is much better in terms of avoiding overflows than the old one, where almost everything was overflowing, but if there were some competition I'm pretty sure I could find a good dozen more.
Iterator.drop in complex iterator implementations in particular is very vulnerable, and that's a bit of a bummer, because iterators can actually easily be used for collections with more than Int.MaxValue elements.

I'd like to use this opportunity to climb my soap box and say again that I don't like the exact semantics of out-of-range indices in copyToArray. Essentially, they are closely tied to this particular implementation, and troublesome to reproduce to a tee for all possible data sets. A negative start is allowed only if the total number of elements which would be copied, as influenced by len, this.size, and xs.length, would be zero or less. I think it would be cleaner if it were either always rejected with an exception, or always accepted.

While I can see the value of permissive indices in slicing operations, where we are dealing with bounds on collection sizes, using them in the context of specific indices in a sequence (like indexOfSlice (and some other issues I reported), remove, insertAll, etc.) is another matter. On the other hand, if copyToArray always accepted a negative start and simply ignored the first -start elements of the collection, it would actually be quite useful.

The worst offender by far, though, is, in my eyes, patch. The policy of clipping the index of the start of the patch can easily lead to quiet bugs, without any beneficial use cases I can see. If, as I suggested above, the indices were instead not modified, but the part of the elems (patch) collection falling outside of the patched (this) collection was ignored, it would be both more logical and useful. Ehm, sorry.
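For concreteness, a hedged sketch of the "always rejected with an exception" option (the name strictCopyToArray is made up, and this is not the current standard-library behaviour); it also sidesteps the second overflow by computing the end index in Long arithmetic:

```scala
// Hypothetical strict variant: an out-of-range start fails loudly instead of
// silently copying nothing, and start + len is computed without Int overflow.
def strictCopyToArray[A, B >: A](source: IterableOnce[A], xs: Array[B], start: Int, len: Int): Int = {
  if (start < 0 || start > xs.length)
    throw new IndexOutOfBoundsException(s"start = $start is out of bounds for an array of length ${xs.length}")
  val it  = source.iterator
  var i   = start
  val end = math.min(start.toLong + math.max(len, 0), xs.length.toLong).toInt
  while (i < end && it.hasNext) {
    xs(i) = it.next()
    i += 1
  }
  i - start
}
```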