spray / spray-json

A lightweight, clean and simple JSON implementation in Scala
Apache License 2.0
974 stars 190 forks source link

performance bottleneck in ProductFormats.fromField #132

Closed fommil closed 9 years ago

fommil commented 9 years ago

I was profiling the parsing of 60GB of data and noticed that ProductFormats.fromField was a bottleneck, because if the data is a None then we have to go through creation of an exception to get there, and that is expensive.

I rewrote the function to aggressively look for the None case, which will incur a classtype check (very fast) for all calls and a Map.contains call for all Option formats (including the Some case), which I feel is a good trade-off.

There is probably a sweet spot of ratio between Somes/Nones before the performance flips again, but it would have to be something like 1,000,000 Some to every None, as the current code is very ineffecient for Nones.

BTW, I backported the master JsonParser to scala 2.9 and seen ridiculous performance improvements... well done! It is a bit of a shame for parboiled, though :-(

Rewrite (I put this in a trait that inherits from DefaultJsonProtocol):

  /** ProductFormats.fromField was shown to be a CPU bottleneck for None entries */
  override protected def fromField[T](value: JsValue, fieldName: String)
                                     (implicit reader: JsonReader[T]) = value match {
    case x: JsObject if
      (reader.isInstanceOf[OptionFormat[_]] &
        !x.fields.contains(fieldName)) =>
      None.asInstanceOf[T]
    case x: JsObject =>
      try reader.read(x.fields(fieldName))
      catch {
        case e: NoSuchElementException =>
          deserializationError("Object is missing required member '" + fieldName + "'", e)
      }
    case _ => deserializationError("Object expected in field '" + fieldName + "'")
  }
analytically commented 9 years ago

+1

fommil commented 9 years ago

the try/catch could perhaps be more cleanly implemented as a get with a match on Some/None. But I guess this way potentially allows Map implementations to avoid creating a needless Some (note that scala stdlib creates one anyway).

sirthias commented 9 years ago

Interesting. Thanks for digging into this. We should get this fixed indeed.

It is a bit of a shame for parboiled, though :-(

What do you mean here?

fommil commented 9 years ago

Re: parboiled you rewrote the parser to be dependency free and now it's a lot faster with much lower memory churn. It's a shame that parboiled can't achieve the same performance levels. In our ENSIME S-Express parser, performance isn't as critical as a general REST server (we have one call at a time by definition of user interaction) so parboiled works well for us, but if we ever needed to handle 1000+ requests/second in a low latency server, things would be different...

fommil commented 9 years ago

I'll package this as a PR now that github makes it super easy to do so :smile:

sirthias commented 9 years ago

Yes, spray-json was depending on parboiled 1.x which interprets the parser definition across an input and as such can be a bit slow. If you need higher performance you can use parboiled2 which gives you a true parser generator (no interpretation) and is about 5-10 times as fast as pb1.

fommil commented 9 years ago

awesome, I didn't know the performance gain was so great! when we release ENSIME 1.0, we're going to drop scala 2.9 support and that may be an option. I also hope that you will take a look at our upcoming use of Shapeless' TypeClass as it really does work now for sealed traits/case class families: https://github.com/milessabin/shapeless/blob/master/examples/src/main/scala/shapeless/examples/sexp.scala

sirthias commented 9 years ago

Nice! I'll look into it.

fommil commented 9 years ago

btw, if you want to see how to use shapeless just for products, see this code (which is deprecated in the light of TypeClass) https://github.com/ensime/ensime-server/blob/master/src/main/scala/org/ensime/sexp/formats/ProductFormats.scala