scala / bug

Scala 2 bug reports only. Please, no questions — proper bug reports only.
https://scala-lang.org
232 stars 21 forks source link

reverse method on Range of Chars gives unexpected results #12473

Open luigip opened 3 years ago

luigip commented 3 years ago

reproduction steps

using Scala 3,

scala> (1 to 26).reverse
val res0: Range = Range 26 to 1 by -1

scala> ('a' to 'z').reverse
val res1: scala.collection.immutable.NumericRange[Char] = empty NumericRange z to a by ￿

problem

Seems like all other Ranges give you a collection back of the same size when you reverse them, but a Range of Chars returns an empty Range. I understand you can get around this by turning it into a Seq (with memory usage implications) or using reverseIterator, but the above behaviour was surprising and caused a bug in my code.

Similarly it would be nice if you could make a reversed range with ('z' to 'a' by -1) or something to that effect. (Interestingly, this won't compile with a negative number, saying it needs to be a Char, but does with a positive number... :shrug:)

SethTisue commented 3 years ago

this is the Scala 2 bug tracker, but the same behavior does occur in Scala 2

@scala/collections ?

luigip commented 3 years ago

Just to add: toSeq doesn't actually fix it; you have to use toList etc

scala> ('a' to 'z').toSeq.reverse
val res2: scala.collection.immutable.NumericRange[Char] = empty NumericRange z to a by ￿

scala> ('a' to 'z').toList.reverse
val res3: List[Char] = List(z, y, x, w, v, u, t, s, r, q, p, o, n, m, l, k, j, i, h, g, f, e, d, c, b, a)
SethTisue commented 3 years ago

note that 2.12.15 doesn't have this problem

NthPortal commented 3 years ago

I am torn, because on the one hand, Chars are not numbers (I don't care what they're backed by), and on the other hand, it's quite convenient to be able to have a range of them.

anyway, I believe the problem is that Chars are, to the extent that they're numbers at all, unsigned. thus,

scala> ('a' to 'z').step.toInt
val res4: Int = 1

scala> ('a' to 'z').reverse.step.toInt
val res5: Int = 65535

Interestingly, this won't compile with a negative number, saying it needs to be a Char, but does with a positive number... :shrug:

see above about being unsigned

som-snytt commented 3 years ago

@NthPortal char is a numeric value type https://scala-lang.org/files/archive/spec/2.13/12-the-scala-standard-library.html#numeric-value-types

just replying to whether it's a number. Signedness is a nice catch.

NthPortal commented 3 years ago

this is actually a fundamental problem with NumericRange; it doesn't work for unsigned numeric types. one could theoretically write their own u32 type, and it would not be possible to create a reverse range for that type (which is very much numeric) either. I believe it would require a fundamental change such as having a boolean to track if the range is reversed.


we might be able to fix the problem for Char to Char by adding a custom (and private) subtype of NumericRange specifically for Char, and have .reverse do some magic. I don't know if that would be feasible while maintaining bincompat, but my instinct says it is.


char is a numeric value type

Char is numeric in that it represents a UTF-16 code point (which is a number), but numeric operations don't actually make any sense at all for Char. 'a' + 'z' is nonsense. 'a' * 'z' is nonsense. 'a' to 'z' by '<START OF HEADING>' is nonsense, but is what 'a' to 'z' actually means (but you'd never know if I swapped '<START OF HEADING>' for the wrong character). 'a' to 'z' happens to get you the result you want, but adding code points is a nonsensical operation. fundamentally, I think supporting 'a' to 'z' is wrong; it works for that alphabetical range, but anything outside the English alphabet (or even mixing uppercase and lowercase) is meaningless and arbitrary. For example, what does '0' to 'Q' mean?

if you want a range of codepoints, (Int to Int).reverse.view.map(_.toChar) will be clearer, and is also incidentally a workaround to the problem (e.g. ('a'.toInt to 'z'.toInt).reverse.view.map(_.toChar))

NthPortal commented 3 years ago

note that 2.12.15 doesn't have this problem

to clarify, in 2.12 ('a' to 'z').reverse returns a Vector. I've not yet gone digging to find out why.

Edit: in 2.13, NumericRange#reverse returns a NumericRange, so we cannot have it do what 2.12 does by returning a Vector regardless.

NthPortal commented 2 years ago

I kind of want to override .reverse on NumericRange[Char] to throw an exception with the message "Sadness. You have chosen to treat a character as a number, and now look what has happened."

luigip commented 2 years ago

Just from a user's POV, I'm not aware - and I see no suggestion in the Scaladoc - that a Range of Chars being treated as a NumericRange behind the scenes... conceptually I want it to be treated as a hypothetical AlphabeticRange. So perhaps the exception should be a bit more apologetic - "Dang, we redesigned the collections and thought we could treat Chars like any other number, but sadly, it seems we can't. BOOM. Sorry. Please revert to Scala 2.12."

Range, as I see it, would ideally be a type of lightweight collection that will work with any type that offers an implementation - wouldn't it be nice if we could have a Range of Foos that works as you might expect it to, e.g. Foo(10) to Foo(20) by 2, or even Earth to Neptune assuming these are part of an ordered enum of planets. It would have a type parameter as all other collection types do, and the current concept would just apply to the special case of Range[Int]. But then, this is somewhat at odds with how Range is described, as "The Range class represents integer values in range [start;end) with non-zero step value step." - so perhaps we'd need to invent a new class with a different name.

som-snytt commented 2 years ago

Range of Double also did not succeed. I don't think this conceptual issue due to collections, but just about the step.

I agree that range of chars is natural, in the sense that toLower is 'a' + c - 'A'. But as mentioned, code points are ints.

I remember Odersky saying NumericRange is an over-abstraction, but as noted, anything countable could have a range. Maybe Scala 3 enables syntax for GroundControl to MajorTom etc. Someone recently asked somewhere for .. syntax. Oh, here is the enum range: https://github.com/lampepfl/dotty-feature-requests/issues/145

SethTisue commented 2 years ago

(I'm repeating after @som-snytt here, but I think it bears repeating:)

Char being a numeric type (a slightly odd one, but nonetheless solidly a numeric type) is a C and Java tradition carried on in Scala. It is arguably an unfortunate tradition, and it's definitely surprising when first encountered, but for a Range of Chars to be a NumericRange is the product of decades of tradition.

references:

this leads to working-as-specified behaviors such as:

scala> 'a' + 1
val res1: Int = 98

scala> ('a' + 1).toChar
val res2: Char = b

scala> 'a' + 'b'
val res3: Int = 195

scala> 'a' * 2
val res4: Int = 194

scala> 'a' * 'b'
val res5: Int = 9506

and any argument for proposed changes to the behavior Ranges of Charss needs to be considered in this context