scala / bug

Scala 2 bug reports only. Please, no questions — proper bug reports only.
https://scala-lang.org
232 stars 21 forks source link

RichString.reverse fails on combining chars #2565

Closed scabug closed 13 years ago

scabug commented 14 years ago

As seen in [http://msmvps.com/blogs/jon_skeet/archive/2009/11/02/omg-ponies-aka-humanity-epic-fail.aspx OMG Ponies]:

scala> "Les Mise\u0301rables"
res0: java.lang.String = Les Mis�rables

scala> res0.reverse
res1: scala.runtime.RichString = selbaŕesiM seL
scabug commented 14 years ago

Imported From: https://issues.scala-lang.org/browse/SI-2565?orig=1 Reporter: Florian Hars (florian)

scabug commented 14 years ago

@adriaanm said: Works in 2.8:

Welcome to Scala version 2.8.0.r0-b20091109091326 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_15).
Type in expressions to have them evaluated.
Type :help for more information.

scala>  val out = new java.io.PrintStream(System.out, true, "UTF-8")
out: java.io.PrintStream = java.io.PrintStream@b4ef239

scala>  val miserable = "Les Mis?rables"
miserable: java.lang.String = Les Mis?rables

scala>  out.println(miserable.reverse)
selbar�siM seL

(Inspired by http://www.macosxhints.com/article.php?story=20050208053951714)

I opened #2596 to deal with Console.println/the interpreter not printing utf correctly.

scabug commented 14 years ago

@dragos said: I'll leave this closed, but note that the issue is still there. The original ticket used unicode combining chars, an e followed by an accute sign, not an accented e, like in �. Here's the full test with output:

object Test extends Application {
  val out = new java.io.PrintStream(System.out, true, "UTF-8")
  val xs = "Les Mise\u0301rables"
  val ys = "Les Mis�rables"
  out.println(xs)
  out.println(xs.reverse)
  out.println(ys)
  out.println(ys.reverse)
}

giving:

Les Mis�rables
selbaŕesiM seL
Les Mis�rables
selbar�siM seL
scabug commented 14 years ago

@paulp said: The issue as I interpret it is outside plausible library scope in the near term. We don't have a String reversing function, we have a sequence reversing function, and reverse does that nicely. Reversing a sequence usually reverses the String successfully, and where it doesn't we'll need someone to take a serious interest in encoding issues. It's not trivial.

scabug commented 14 years ago

@adriaanm said: Replying to [comment:2 dragos]:

I'll leave this closed, but note that the issue is still there. The original ticket used unicode combining chars, an e followed by an accute sign, not an accented e, like in �. I stand corrected.

scabug commented 14 years ago

Florian Hars (florian) said: Replying to [comment:3 extempore]:

We don't have a String reversing function, we have a sequence reversing function

But then you should rename RichString to RichCharSequence :-).

If a method is called RichString.reverse, the natural thing is to expect it to produce something that is the reverse of the string. You have to either swap combiners with their corresponding base characters before or after the reverse, or normalize the string before reversing (java.text.Normalizer, but see http://diveintomark.org/archives/2004/07/06/nfc).

scala> import java.text.Normalizer._
import java.text.Normalizer._

scala> val s = "souf\ufb02e\u0301s" 
s: java.lang.String = soufflés

scala> s.reverse                    
res6: scala.runtime.RichString = śeflfuos

scala> normalize(s,Form.NFC).reverse
res7: scala.runtime.RichString = s�flfuos

scala> normalize(s,Form.NFKC).reverse
res8: scala.runtime.RichString = s�lffuos

Reopening, this should either be fixed, or resolved as wontfix with a good reason.

scabug commented 14 years ago

@adriaanm said: sorry, we can't fix this because we committed to java.lang.string