spray / spray-json

A lightweight, clean and simple JSON implementation in Scala
Apache License 2.0

Crash due to Unicode U+FFFD replacement character #175

Open dcheckoway opened 8 years ago

dcheckoway commented 8 years ago

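When a JSON string contains the character encoded as EF BF BF in UTF-8, spray-json throws a ParsingException and can't parse it.

Note that EF BF BF decodes to U+FFFF, a noncharacter in the Specials block, rather than to the U+FFFD replacement character named in the title (which is EF BF BD in UTF-8).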

https://en.wikipedia.org/wiki/Specials_%28Unicode_block%29#Replacement_character

Check it out:

scala> import spray.json._
import spray.json._

scala> val bytes = Array(123, 34, 104, 101, 108, 108, 111, 34, 58, 34, 116, 104, 105, 115, 32, 0xEF, 0xBF, 0xBF, 32, 119, 111, 114, 108, 100, 34, 125).map(_.toByte)
bytes: Array[Byte] = Array(123, 34, 104, 101, 108, 108, 111, 34, 58, 34, 116, 104, 105, 115, 32, -17, -65, -65, 32, 119, 111, 114, 108, 100, 34, 125)

scala> val s = new String(bytes, "UTF-8")
s: String = {"hello":"this ￿ world"}

scala> s.parseJson
spray.json.JsonParser$ParsingException: Unexpected end-of-input at input index 15 (line 1, position 16), expected '"':
{"hello":"this
               ^

  at spray.json.JsonParser.fail(JsonParser.scala:213)
  at spray.json.JsonParser.require(JsonParser.scala:196)
  at spray.json.JsonParser.string(JsonParser.scala:144)
  at spray.json.JsonParser.value(JsonParser.scala:63)
  at spray.json.JsonParser.members$1(JsonParser.scala:81)
  at spray.json.JsonParser.object(JsonParser.scala:86)
  at spray.json.JsonParser.value(JsonParser.scala:60)
  at spray.json.JsonParser.parseJsValue(JsonParser.scala:43)
  at spray.json.JsonParser$.apply(JsonParser.scala:28)
  at spray.json.PimpedString.parseJson(package.scala:45)
  ... 43 elided
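
For anyone reproducing this, here is a small sketch for checking which code point a byte sequence decodes to, using only the JDK's UTF-8 charset. The names `Utf8Check`, `codePointsOf` and `utf8HexOf` are made up for illustration and are not part of spray-json.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object Utf8Check {
  // Decode the given byte values as UTF-8 and list the resulting code points.
  def codePointsOf(bytes: Array[Int]): Seq[String] = {
    val s = new String(bytes.map(_.toByte), UTF_8)
    s.codePoints().toArray.toSeq.map(cp => f"U+$cp%04X")
  }

  // Encode a string as UTF-8 and render the bytes as upper-case hex.
  def utf8HexOf(s: String): String =
    s.getBytes(UTF_8).map(b => f"${b & 0xFF}%02X").mkString(" ")
}

// The bytes from the repro decode to U+FFFF:
println(Utf8Check.codePointsOf(Array(0xEF, 0xBF, 0xBF)))  // List/Seq containing "U+FFFF"
// The replacement character U+FFFD encodes to different bytes:
println(Utf8Check.utf8HexOf("\uFFFD"))                     // EF BF BD
```

The "Unexpected end-of-input" message at exactly the index of that character suggests the parser treats U+FFFF as its end-of-input marker, though nothing in this thread confirms that.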

We're working around this by catching this exception and replacing the replacement character, and then re-parsing. Hopefully you see some humor in replacing the replacement. :-)
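
A minimal sketch of that workaround, in case it helps others. It targets the character that the bytes EF BF BF decode to in a JVM string (U+FFFF). The names `RetryParse`, `sanitize` and `parseWithRetry` are hypothetical, and the parser is passed in as a function so the snippet does not depend on a particular spray-json version. Substituting U+FFFD is an assumption; any character the parser accepts would do.

```scala
import scala.util.control.NonFatal

object RetryParse {
  // The character that the bytes EF BF BF decode to.
  val Offending: Char = '\uFFFF'
  // Assumption: substitute the real replacement character, U+FFFD.
  val Substitute: Char = '\uFFFD'

  // Replace every occurrence of the offending character.
  def sanitize(s: String): String = s.replace(Offending, Substitute)

  // Try the parser on the raw input first. If it throws and the input
  // contains the offending character, sanitize and parse once more.
  // Failures on inputs without that character propagate unchanged.
  def parseWithRetry[A](s: String)(parse: String => A): A =
    try parse(s)
    catch {
      case NonFatal(_) if s.indexOf(Offending) >= 0 => parse(sanitize(s))
    }
}
```

With `import spray.json._` in scope, usage would look like `RetryParse.parseWithRetry(s)(_.parseJson)`. Calling `sanitize` up front instead of retrying would avoid the exception entirely, at the cost of scanning every input.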

I haven't dug into spray-json to see how troublesome it'd be to fix this, but please do. Thanks!

ethanp commented 8 years ago

+1 Thanks for explaining how to work around it.