twitter / cassovary

Cassovary is a simple big graph processing library for the JVM
http://twitter.com/cassovary
Apache License 2.0

Fastutils wrapping benchmark #142

Closed szymonm closed 9 years ago

szymonm commented 9 years ago

When I run the code, the wrapped version of the fastutil collection performs roughly 50% worse than the unwrapped one:

time of pure map ops: 1862
time of wrapped map ops: 2678

This is possibly due to boxing.

pankajgupta commented 9 years ago

I verified on my computer as well, and it is indeed 50% slower on mine too...

pankajgupta commented 9 years ago

Interestingly, I get the same results when I do val easyMap = (new Int2IntOpenHashMap()).asInstanceOf[java.util.Map[Int, Int]]

So it is not the Scala wrapping, but just the boxing/unboxing at the Java level.
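The same effect can be reproduced without fastutil. The sketch below is only an illustration: `Lookup` stands in for a generic interface such as `java.util.Map`, and `IntTable`, `BoxingDemo`, `sumDirect` and `sumGeneric` are made-up names, not part of fastutil or Cassovary. Because the interface's type parameters are erased, a call made through `Lookup[Int, Int]` goes via a bridge method that takes and returns objects, so the key is boxed and the result unboxed on every lookup, even though the storage is a primitive array.

```scala
// Generic lookup interface, standing in for something like java.util.Map[K, V].
trait Lookup[K, V] {
  def get(key: K): V
}

// Hypothetical array-backed table for keys in 0 until size; illustration only.
final class IntTable(size: Int) extends Lookup[Int, Int] {
  private val values = new Array[Int](size)

  def put(key: Int, value: Int): Unit = values(key) = value

  // Called on an IntTable this is a primitive int => int method.
  // Called through Lookup[Int, Int] it is reached via an Object => Object bridge.
  def get(key: Int): Int = values(key)
}

object BoxingDemo {
  // Static type is the concrete class: primitive calls.
  def sumDirect(t: IntTable, n: Int): Long = {
    var total = 0L
    var i = 0
    while (i < n) { total += t.get(i); i += 1 }
    total
  }

  // Static type is the generic interface: keys and results pass through boxes.
  def sumGeneric(t: Lookup[Int, Int], n: Int): Long = {
    var total = 0L
    var i = 0
    while (i < n) { total += t.get(i); i += 1 }
    total
  }
}
```

Both methods compute the same sum; only the static type of the receiver differs, which mirrors the cast to `java.util.Map[Int, Int]` above.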

Any suggestions, @szymonm? Especially in light of #139, where we also want to rethink InfoKeeper.

szymonm commented 9 years ago

Yes, looks like using Int2IntOpenHashMap as Map[Int, Int] makes the methods boxed.

Right, we have to go back with InfoKeepers.

And hope for efficient specialization in Scala one day.
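For reference, this is roughly what the existing `@specialized` annotation looks like in Scala 2. It is a sketch under assumptions: `Keeper`, `IntKeeper` and `KeeperDemo` are hypothetical names, not Cassovary's InfoKeeper API, and whether boxing is actually avoided depends on the compiler version and on every type along the call chain being specialized (Scala 3 reportedly accepts the annotation but ignores it).

```scala
// Hypothetical keeper interface. In Scala 2, @specialized(Int) asks the
// compiler to also generate an Int-specific variant of the trait.
trait Keeper[@specialized(Int) V] {
  def get(id: Int): V
}

// Array-backed implementation for Int values; illustration only.
final class IntKeeper(size: Int) extends Keeper[Int] {
  private val values = new Array[Int](size)
  def set(id: Int, value: Int): Unit = values(id) = value
  def get(id: Int): Int = values(id)
}

object KeeperDemo {
  // The parameter is typed as the generic trait, as a caller of a
  // keeper-style API would see it.
  def total(k: Keeper[Int], n: Int): Long = {
    var sum = 0L
    var i = 0
    while (i < n) { sum += k.get(i); i += 1 }
    sum
  }
}
```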

szymonm commented 9 years ago

I'm afraid that this means whenever we use:

val nodes: Seq[Int] = new Array[Int](100)

then we have boxing...
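The reason is that an `Array[Int]` is not itself a `Seq[Int]`: the assignment triggers an implicit conversion that wraps the array (`WrappedArray` in Scala 2.12 and earlier, `ArraySeq` in 2.13). The sketch below uses `scala.collection.Seq` so that it should compile across those versions; `SeqWrappingDemo` and its methods are names invented for this illustration. The wrapper shares the array's storage, but element access goes through the generic `Seq` interface, where the element type is erased, so values may be boxed on the way out.

```scala
object SeqWrappingDemo {
  // Implicit conversion: wraps the array in a Seq view, does not copy it.
  def wrap(arr: Array[Int]): scala.collection.Seq[Int] = arr

  // Reads the primitive array directly.
  def sumArray(arr: Array[Int]): Long = {
    var total = 0L
    var i = 0
    while (i < arr.length) { total += arr(i); i += 1 }
    total
  }

  // Reads through the generic Seq interface.
  def sumSeq(seq: scala.collection.Seq[Int]): Long = {
    var total = 0L
    var i = 0
    while (i < seq.length) { total += seq(i); i += 1 }
    total
  }
}
```

Keeping the static type as `Array[Int]` inside hot loops, and exposing a `Seq[Int]` only at API boundaries, is one way to limit the cost.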

szymonm commented 9 years ago

scala> :paste
// Entering paste mode (ctrl-D to finish)

// Runs the block once and prints the elapsed wall-clock time in milliseconds.
def time(b: => Unit): Unit = {
  val start = java.lang.System.currentTimeMillis()
  b
  println(s"Time: ${java.lang.System.currentTimeMillis() - start}")
}

// Exiting paste mode, now interpreting.

time: (b: => Unit)Unit

scala> import scala.util.Random
import scala.util.Random

scala> val arr = new Array[Int](1000000)
scala> (0 until 1000000).foreach{ i => arr(i) = i }

scala> var b = 0
b: Int = 0

scala> time {(0 until 1000000).foreach{ i => b = arr(i) * 7 }}
Time: 7
scala> val seq: Seq[Int] = arr
scala> time {(0 until 1000000).foreach{ i => b = seq(i) * 7 }}
Time: 19

szymonm commented 9 years ago

@pankajgupta please have a look and check it too

pankajgupta commented 9 years ago

Interesting indeed.

In general, the goal in Cassovary has been to be, first and foremost, efficient on storage (hence native arrays) and then as performant as possible without getting too low-level. This clearly shows that there are opportunities to improve speed.
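A single timed pass in the REPL also includes JIT compilation, so the size of the gap is worth re-measuring with some warm-up. The following is a rough sketch only, with made-up names (`AccessBench`, `bestOf`); a dedicated harness such as JMH would give more trustworthy numbers. It repeats the array and `Seq` loops several times and keeps the fastest run of each, so the early runs act as warm-up.

```scala
object AccessBench {
  // Runs body `runs` times and returns the fastest elapsed time in nanoseconds.
  def bestOf(runs: Int)(body: => Unit): Long = {
    var best = Long.MaxValue
    var r = 0
    while (r < runs) {
      val start = System.nanoTime()
      body
      val elapsed = System.nanoTime() - start
      if (elapsed < best) best = elapsed
      r += 1
    }
    best
  }

  def main(args: Array[String]): Unit = {
    val n = 1000000
    val arr = Array.tabulate(n)(i => i)
    val seq: scala.collection.Seq[Int] = arr // implicit wrapper around arr
    var sink = 0L // accumulated and printed so the loops are not dead code

    def viaArray(): Unit = { var i = 0; while (i < n) { sink += arr(i) * 7; i += 1 } }
    def viaSeq(): Unit = { var i = 0; while (i < n) { sink += seq(i) * 7; i += 1 } }

    val arrayNs = bestOf(20)(viaArray())
    val seqNs = bestOf(20)(viaSeq())
    println(s"array: ${arrayNs / 1000} us, seq: ${seqNs / 1000} us (sink=$sink)")
  }
}
```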

