scala / bug

Scala 2 bug reports only. Please, no questions — proper bug reports only.
https://scala-lang.org

Arithmetic overflow possible in IterableOnceOps.copyToArray #12795

Open noresttherein opened 1 year ago

noresttherein commented 1 year ago

Reproduction steps

Scala version: 2.13.10

In IterableOnceOps:

  def copyToArray[B >: A](xs: Array[B], start: Int, len: Int): Int = {
    val it = iterator
    var i = start
    val end = start + math.min(len, xs.length - start)
    while (i < end && it.hasNext) {
      xs(i) = it.next()
      i += 1
    }
    i - start
  }

Problem

There are actually two overflows here, but xs.length - start is arguably not an issue, as a negative start causes a negative end and nothing will be copied (instead of throwing an exception, as would happen had the overflow not occurred). There is a second one in start + math.min(len, ...), however, and it lies on a valid path:

copyToArray(new Array[Hamster](Int.MaxValue - 100), Int.MaxValue - 50, Int.MaxValue)
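For what it's worth, the wrap-around can be observed without allocating anything, by evaluating the library's `end` expression on its own. A minimal sketch (my own, with hypothetical extreme inputs, not from the report):

```scala
object CopyToArrayOverflow {
  // Same arithmetic as in IterableOnceOps.copyToArray
  def end(xsLength: Int, start: Int, len: Int): Int =
    start + math.min(len, xsLength - start)

  def main(args: Array[String]): Unit = {
    // With a hugely negative start, xsLength - start wraps to a negative
    // number, and adding start wraps a second time, producing the bound 10
    // even though start itself is far out of range.
    println(end(10, Int.MinValue, Int.MaxValue)) // prints 10
  }
}
```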

The 'new' collection framework is much better at avoiding overflows than the old one, where almost everything overflowed, but if there were a competition I'm pretty sure I could find a good dozen more. Iterator.drop in complex iterator implementations is particularly vulnerable, and that's a bit of a bummer, because iterators can actually be used easily for collections with more than Int.MaxValue elements.

I'd like to use this opportunity to climb my soap box and say again that I don't like the exact semantics of out-of-range indices in copyToArray. Essentially, they are closely tied to this particular implementation, and troublesome to reproduce to a tee for all possible data sets. A negative start is allowed only if the total number of elements which would be copied, as determined by len, this.size, and xs.length, would be zero or less. I think it would be cleaner if it were either always rejected with an exception, or always accepted.

While I can see the value of permissive indices in slicing operations, where we are dealing with bounds on collection sizes, using them in the context of specific indices in a sequence (as in indexOfSlice and some other methods I have reported issues about),

  1. doesn't seem to me to follow the principle of least surprise,
  2. is inconsistent with mutable collections, in which all mutating operations (as far as I can tell), reject all cases of an index out of range (see remove, insertAll, etc.),
  3. is not particularly useful. For example, if copyToArray always accepted a negative start and simply ignored the first -start elements of the collection, it would actually be quite useful.
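To make point 3 concrete, here is a hypothetical sketch of such permissive semantics (my own toy code, not a proposal for the library; the name is invented, and the start == Int.MinValue corner case is glossed over since -Int.MinValue overflows):

```scala
// A negative start skips the first -start elements of the source instead
// of failing; Long arithmetic keeps the upper bound from wrapping around.
def copyToArrayPermissive[A](src: IterableOnce[A], xs: Array[A],
                             start: Int, len: Int): Int = {
  val base = src.iterator
  val it   = if (start < 0) base.drop(-start) else base // skip ignored prefix
  var i    = math.max(start, 0)
  val from = i
  val end  = math.min(xs.length.toLong, from.toLong + math.max(len, 0)).toInt
  while (i < end && it.hasNext) {
    xs(i) = it.next()
    i += 1
  }
  i - from // number of elements actually copied
}
```

For example, copying List(1, 2, 3, 4, 5) into a 3-element array with start = -2 would skip 1 and 2 and fill the array with 3, 4, 5.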

The worst offender by far, though, is, in my eyes, patch. The policy of clipping the index of the start of the patch can easily lead to quiet bugs, without any beneficial use cases that I can see. If, as I suggested above, the indices were instead left unmodified, and the part of the elems (patch) collection falling outside the patched (this) collection were ignored, it would be both more logical and more useful. Ehm, sorry.

lrytz commented 1 year ago

Would it be worth building an inventory of where things are in terms of out-of-bounds handling (relevant operations, what is accepted, which exceptions for rejections)? And agree on a doc / spec that we can follow? @julienrf do you know if there is anything existing?

climb my soap box

nice, til

julienrf commented 1 year ago

I don’t think we have such a spec, except the scala-collection-laws maybe…

som-snytt commented 1 year ago

climb my soap box

not to be confused with soap opera.

I don't think a bug in arithmetic warrants an existential crisis in the API. It's perfectly natural to minimize bounds checking and only fail "on demand".

Permissive inputs are well-established for xs.drop(100) (as a paradigm).

I'm more sympathetic to the argument about patch, because it seems to me that it should be "trivial" to calculate "what will be replaced." slice doesn't work because patch takes a count, not an end index.

scala> xs
val res17: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

scala> val p = List(42, 27)
val p: List[Int] = List(42, 27)

scala> xs.patch(-11, p, 5)
val res18: List[Int] = List(42, 27, 6, 7, 8, 9, 10)

scala> xs.slice(-11, -6)
val res19: List[Int] = List()

scala> xs.slice(-11, 1000).take(5)
val res20: List[Int] = List(1, 2, 3, 4, 5)

Then the definition of patch is really res20 (n elements of the maximal slice at i) and not res19.

The case is stronger for patchInPlace, which ought to have similar behavior (except in-place). If buf.update throws, then buf.patchInPlace should throw for the same bad index.

It's mutating operations that throw instead of silently truncating, such as buf.remove(1000).
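The contrast is easy to see in a small sketch (my own snippet, not from the thread):

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Try

val buf = ArrayBuffer(1, 2, 3)
// Mutating operations validate the index and throw:
assert(Try(buf.remove(1000)).isFailure) // IndexOutOfBoundsException
// While immutable slicing is permissive and silently truncates:
assert(List(1, 2, 3).drop(100) == Nil)
```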

Perhaps I'm wrong that patch and patchInPlace ought to have the same policy on the from parameter, just because the method names are similar. Or perhaps I'm wrong that patchInPlace should throw like update, if "the elements to replace" is defined in terms of slice.

If patch means "patch slice", then the hypothetical patchRange(from, to) is the form that would throw (as well as patchRangeInPlace).

It would be nice if these behaviors were summarized in the collections overview, of course.

SethTisue commented 10 months ago

@scala/collections

Ichoran commented 10 months ago

I think Julien is correct that in general we don't have specifications for what happens when various indices are out of bounds. Treating everything as undefined behavior is unwise, because people will test for the behavior, then assume that the experimentally-determined behavior is the intended behavior, and then be surprised when some other collection of the same type behaves differently.

I tried to put some of what I thought were sensible invariants into collections-laws. I don't think I was comprehensive, and anyway, that just helps people working on the standard library avoid unexpected behavior; it still doesn't help users understand what to expect.

The most important principle is, I think, to make all the collections have the same behavior, with an occasional exception if absolutely critical for array performance. I don't think copyToArray is one of these exceptions.

Logically, I think copyToArray(xs, start, len) should be identical to

  val it = iterator
  var l = len
  var s = start
  while (l > 0 && it.hasNext) {
    xs(s) = it.next()
    s += 1
    l -= 1
  }
  len - l

That is, I agree that the asymmetry between start and end when you're specifying the max length is a bad thing. However, the docs say that the copy ends when the array does, and it's behavior that people may count on, and the return Int helps you figure out whether you didn't transfer the data you thought you would.

So I'd rewrite it like this:

  def copyToArray[B >: A](xs: Array[B], start: Int, len: Int): Int = 
    if (len <= 0) 0
    else {
      var i = start
      val it = iterator
      val end = if (xs.length - len <= start) xs.length else start + len
      while (i < end && it.hasNext) {
        xs(i) = it.next()
        i += 1
      }
      i - start
    }

This works because len is positive (already checked) and xs.length is non-negative, so the subtraction xs.length - len cannot overflow; adding len to both sides turns the inequality into xs.length <= start + len, so the if-statement is just a min computed without overflow.

If start is negative, it doesn't matter to the logic; the write will use a negative index and throw an exception, as it does now.

As a bonus, in the case where len is zero or negative, and getting the iterator is not a no-op, the code is a touch faster.

(Note: code untested.)
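A quick spot-check of that reasoning against exact Long arithmetic (my own sketch, not part of the proposal; assumes len > 0 as in the guarded branch):

```scala
// The overflow-safe Int bound from the rewrite above
def endSafe(xsLength: Int, start: Int, len: Int): Int =
  if (xsLength - len <= start) xsLength else start + len

// The mathematically exact bound, computed in Long
def endExact(xsLength: Int, start: Int, len: Int): Long =
  math.min(xsLength.toLong, start.toLong + len.toLong)

// Agreement on ordinary and extreme inputs, including the reported one
for ((n, s, l) <- Seq((Int.MaxValue - 100, Int.MaxValue - 50, Int.MaxValue),
                      (10, 3, 5), (10, 3, 100), (10, -5, 3), (0, 0, 1)))
  assert(endSafe(n, s, l).toLong == endExact(n, s, l))
```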

Ichoran commented 10 months ago

Specifically regarding patch, I think the behavior was chosen to be identical to, but more efficient than, that which you would get from the most obvious collections operations.

In particular, xs.patch(n, ys, m) is equivalent to xs.take(n) ++ ys ++ xs.drop(n).drop(m). The double-drop is weird until you notice that the natural way to split xs is the splitAt method: val (xsL, xsR) = xs.splitAt(n); xsL ++ ys ++ xsR.drop(m).
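That claimed equivalence is easy to check directly (my own snippet, with values chosen to match the earlier REPL examples):

```scala
val xs = (1 to 10).toList
val ys = List(42, 27)
// Negative, interior, oversized, and past-the-end indices all agree
for ((n, m) <- Seq((-11, 5), (2, 3), (0, 20), (100, 1)))
  assert(xs.patch(n, ys, m) == xs.take(n) ++ ys ++ xs.drop(n).drop(m))
```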

So in terms of coherence with the "natural" way to do things with higher-order methods, I think patch does the right thing. patchInPlace should match its behavior. If it doesn't, I would consider that a bug.

noresttherein commented 10 months ago

I don't want to sound like a broken record, but to confound the matter further, we have takeInPlace and dropInPlace, which are, like their immutable namesakes, permissive. And updated, just like update, is not. I think this is the exact occasion on which Poles use the phrase English speakers allegedly find so funny: 'not my circus, not my monkeys'. Ultimately, all collection methods can be implemented by a call to the equivalent method on an iterator, followed by a conversion, and maybe that would be a good frame of reference. I am not terribly convinced about the 'existing index distinction', but I agree that a simple arithmetic overflow issue is certainly not a reason to start a discussion about a huge scala.collection revamp. Actually, I think I have the biggest issue with Buffer.remove, as the stand-out.

But I also have a set of extension methods for collections, for example update/updated variants which take a collection of elements rather than a single one, and I had real trouble deciding what the most consistent behaviour with the existing methods would be. On one hand, it's a job that patch can easily do, but patch requires you to know the size of the argument collection if you want to exactly splice the recipient of the call, which we won't in general know for an IterableOnce, so I found a use for it beside patch.

Now, I am easily irked by such small inconsistencies, and perfectly aware that I am not normal in that regard, but this talk is for me just the tip of an iceberg: there is no rhyme or reason to how parameters of equivalent methods are named (which would not be quite as important if Scala didn't allow specifying any parameter by name rather than positionally), and there is no single convention for how equivalent mutable/immutable methods are named.
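For concreteness, a toy version of such an extension might look like this (entirely hypothetical names, not from the standard library; it sidesteps the "must know the size" problem for List by measuring the argument while materializing it):

```scala
// Hypothetical extension: splice all elements of `that` at `index`,
// overwriting exactly as many existing elements as `that` yields.
implicit class UpdatedAllOps[A](private val self: List[A]) {
  def updatedAll[B >: A](index: Int, that: IterableOnce[B]): List[B] = {
    val elems = that.iterator.toList // one pass to learn the size
    self.patch(index, elems, elems.length)
  }
}
```

For example, List(1, 2, 3, 4).updatedAll(1, Seq(9, 9)) would yield List(1, 9, 9, 4), overwriting positions 1 and 2.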

BTW, another soapbox, maybe it will sow a seed of doubt in someone of influence on how Scala evolves: I think it would be much more prudent if only method parameters annotated with @.***, or some such, were legal candidates for named arguments, rather than all of them. People don't put a huge lot of thought most of the time into how parameters are named, and it is unfortunate to be bound by early choices for compatibility. I don't think we often use named arguments for arbitrary calls; it seems like their place is with `copy`-like methods with many parameters, whose signatures are defined specifically with named arguments in mind, so I wouldn't find it limiting myself. Returning to an API with a fresh eye, or with feedback on how something may be ambiguous, limits such an easy improvement. Then there is the case that in an overriding method parameters can be renamed freely, and while I'd consider it bad practice, it can easily happen if you write the overridden method manually rather than asking an IDE to copy the signature exactly for you. Together it seems like a bad combination.

Marcin Mościcki

    It is, as far as he knows, the only way of coming downstairs,
    but sometimes he feels that there really is another way,
    if only he could stop bumping for a moment and think of it.
    And then he feels that perhaps there isn't.


som-snytt commented 10 months ago

Parameters have been renamed and annotated with deprecatedName.

The most important principle is, I think, to make all the collections have the same behavior

If I write x(i) = 42, I would expect similar semantics whether x is a mutable collection or an array.

I have the biggest issue with Buffer.remove

At the moment, after a couple of months, I don't have a sense of what is the priority issue, in general and in your view.

Recently, I was looking at "value discard" warnings and noticed that remove(i) returns a value but remove(i, 1) does not return a slice. (It's not an exact replacement for reasons of efficiency. There's extra work to check the range.) So I agree that "uniformity of expectation" could be improved.
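The asymmetry in question, as a sketch (my own snippet):

```scala
import scala.collection.mutable.ArrayBuffer

val b = ArrayBuffer(1, 2, 3)
val removed: Int = b.remove(0) // remove(i) returns the removed element
assert(removed == 1)
b.remove(0, 1)                 // remove(i, count) returns Unit, not a slice
assert(b == ArrayBuffer(3))
```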

On parameters, index is sometimes idx. But especially Shrinkable.subtractOne(elem) becomes Buffer.subtractOne(x). I wonder if anyone has written the Scalafix rule to enforce parameter names in overrides.