TreeSet/TreeMap#range()#clear() shouldn’t clear the whole set/map

astiob commented 1 year ago

Reproduction steps

Scala version: 2.13.11

scala> val set = collection.mutable.SortedSet(1, 2, 3, 4, 5, 6)
val set: scala.collection.mutable.SortedSet[Int] = TreeSet(1, 2, 3, 4, 5, 6)

scala> set.range(3, 5).clear()

scala> set
val res1: scala.collection.mutable.SortedSet[Int] = TreeSet()

scala> val map = collection.mutable.SortedMap(1 -> "one", 2 -> "two", 3 -> "three", 4 -> "four", 5 -> "five")
val map: scala.collection.mutable.SortedMap[Int,String] = TreeMap(1 -> one, 2 -> two, 3 -> three, 4 -> four, 5 -> five)

scala> map.range(2, 4).clear()

scala> map
val res3: scala.collection.mutable.SortedMap[Int,String] = TreeMap()

Problem

range (and all its similarly-named siblings) returns a projection that reflects/contains only a bounded slice of the set/map. It is natural to assume that clearing this projection should clear all elements within these bounds and keep any other elements unaffected.

A trusting programmer could use this (as I did) in an attempt to efficiently delete a whole (sub)range of values, and only testing would show that it doesn’t work.

Looking at the standard library’s source code, it seems that RedBlackTree doesn’t implement a range delete operation at the moment. I hear it is possible to implement range deletion in asymptotically logarithmic time, but I admit I don’t know how complicated the code would be or how big the constant factor would be.

SethTisue commented 1 year ago

@scala/collections

astiob commented 1 year ago

Of particular interest, Java’s collections of the same kind behave the way I expect (and differently from Scala’s):

scala> val set = new java.util.TreeSet(java.util.Arrays.asList(1, 2, 3, 4, 5, 6))
val set: java.util.TreeSet[Int] = [1, 2, 3, 4, 5, 6]

scala> set.subSet(3, 5).clear()

scala> set
val res1: java.util.TreeSet[Int] = [1, 2, 5, 6]

som-snytt commented 1 year ago

This is a good example of why it's too bad that Scaladoc on private elements is silently ignored. The doc on TreeSetProjection says a bit more than rangeImpl.

Showing that mutations are not confined to the range:

scala> val ss = SortedSet("any","zed","maybe")
val ss: scala.collection.mutable.SortedSet[String] = TreeSet(any, maybe, zed)

scala> val r = ss.range("m","p")
val r: scala.collection.mutable.SortedSet[String] = TreeSet(maybe)

scala> r.addOne("beta")
val res24: r.type = TreeSet(maybe)

scala> ss
val res25: scala.collection.mutable.SortedSet[String] = TreeSet(any, beta, maybe, zed)

So the range does not "clip" the operations. On that basis, it's no longer surprising to me that clear clears the underlying collection. But it was not immediately obvious, and the public Scaladoc doesn't help. I don't understand the doc on rangeImpl, but I'm not interested in looking at more source.

som-snytt commented 1 year ago

I see the other difference is that in Java you can't add to the range outside its start/end.

scala> sub.add(2)
java.lang.IllegalArgumentException: key out of range
  at java.base/java.util.TreeMap$NavigableSubMap.put(TreeMap.java:1795)
  at java.base/java.util.TreeSet.add(TreeSet.java:255)
  ... 30 elided

Ichoran commented 1 year ago

Java loves its APIs that are dangerous by design.

Ichoran commented 1 year ago

Where did my long comment go?? The one wherein I outed mapValuesInPlace, filterInPlace and possibly update and remove plus the symbolic equivalents as also problematic?

Ichoran commented 1 year ago

Bother, I guess the long comment vanished into the ether. Briefly:

(1) clear isn't the only problem. Every mutating operation fails to respect boundaries; in some cases this is obviously completely wrong (e.g. clear); in others it's merely weird.

(2) I don't think we should have an API that violates intuitive invariants like "if you put something in a map and there is no error, the map contains it". I also don't think we should be throwing exceptions left and right. So I think in the long run we need to do the same kind of thing we did with old-style Views: promise less, but make it work. For instance, omit any individual-element mutating operations and have the bulk ones only act on the range.

(3) There is no way I can see to fix this without breaking binary compatibility, but since people's code arguably ought to break if they are relying on this to be sane, maybe that's okay.

som-snytt commented 1 year ago

Sorry to hear about the long lost comment. I hope it's not another github trend.

I was using r("beta") = true, for fun, but I see that the syntax suggests a lookup before an assignment, so "update out of range" is possibly too weird, as you say.

The recent tickets lodged by noresttherein have similar complaints about how input indexes are handled, especially in mutable vs immutable API. In the flavor of that thinking, mutation requires "existing" or legal indexes, but immutable clips.

I also noticed that result returns the collection itself. I guess that's true of mutable.Set. But I naively expected r.result to give me a set that is no longer backed by the underlying.

Ichoran commented 1 year ago

In order to be free of entanglements with result (which might just involve returning oneself), one needs a ReusableBuilder. An ordinary Builder can do anything it feels like as long as it obeys the type signature, because like with Iterator, once you call result you're not supposed to touch the original again.

lrytz commented 12 months ago

Range projections seem to be an outlier in the API, or are there other similar operations? filterKeys and mapValues on maps were deprecated for that reason.

The private TreeSetProjection class has some documentation

Mutations are always reflected in the original set

scala> sortedSet.range(2,4).addOne(77)
val res9: scala.collection.mutable.SortedSet[Int] = TreeSet(2, 3)

scala> sortedSet
val res10: scala.collection.mutable.SortedSet[Int] = TreeSet(1, 2, 3, 4, 5, 6, 77)

So forwarding clear seems fine.

But other collections don't implement range as a projection, for example mutable.BitSet returns a new independent collection, no mutations are reflected in the original collection. That inconsistency is more problematic.

noresttherein commented 11 months ago

Wouldn't returning a collection.SortedMap/collection.SortedSet instead of mutable.SortedMap/mutable.SortedSet address the issue? Not binary compatible, and not even source compatible, but that's how I'd imagine it. Currently the results are typed as self type parameter C of SortedMapOps/SortedSetOps, which is problematic in itself, even outside of the mutability issues, because a non strict collection and a strict collection are both conceptually quite different things and imposes unnecessary restriction on implementations. Essentially, public SortedSet/SortedMap subclasses need to be more marker interfaces than implementations, because they might require considerable differences in implementation. Not a huge deal, the only important thing lost is JIT inlining.

som-snytt commented 9 months ago

The "deck of cards" example in JLS uses list.subList(i, j).clear(), which the Javadoc specifies as idiomatic usage.

The definition of structural modification is so suitably vague, that is, usefully:

perturb it in such a fashion that iterations in progress may yield incorrect results

that we ought to have a $perturb macro with this wording. I am greatly perturbed that I'm unable to follow the beginner examples. (The child just began "Intro to Java", as if that weren't already a ruinous outcome.) Anyway, I remembered this ticket to report the perturbation in the force.

scala / bug

TreeSet/TreeMap#range()#clear() shouldn’t clear the whole set/map #12808

Reproduction steps

Problem