returneksibir / yakala

a humble web crawler framework

Set experiments #21

Open sardok opened 13 years ago

sardok commented 13 years ago

I used the following ptime function to time each run:

    def ptime[A](f: => A) = {
      val t0 = System.nanoTime
      val ans = f
      printf("Elapsed: %.3f msec\n", (System.nanoTime - t0) * 1e-6)
      ans
    }

Code used to populate the Set:

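    import scala.util.Random

    val rnd = new Random()
    // `a` is the Set under test; its declaration changes per experiment below.
    ptime((1 to 1000000) foreach { num =>
      a += (num + " qweasda sdads ads " + rnd.nextString(20))
      println("a.size " + a.size)
    })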

mutable Set[String]

1000000 Elapsed: 31788.357 msec

a.size 1200001 Elapsed: 10817.556 msec

a.size 1400002 Elapsed: 23297.449 msec

a.size got stuck at 1484389 and the run slowed down dramatically.

immutable Set[String] held in a var

a.size 1000000 Elapsed: 35784.131 msec

a.size 1151086 java.lang.OutOfMemoryError: GC overhead limit exceeded
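For context on this variant: when `a` is a `var` holding an immutable Set, Scala rewrites `a += x` to `a = a + x`, so each insertion produces a new set value instead of updating one in place. Here is a minimal sketch of that difference. The names `SetSemantics`, `frozen`, and `live` are hypothetical and not part of the experiment, and the sketch says nothing about how much memory either variant needs.

```scala
import scala.collection.mutable

object SetSemantics {
  // var + immutable Set: `+=` is sugar for `frozen = frozen + elem`.
  // Returns (size of the old value, size of the new value).
  def immutableVar(): (Int, Int) = {
    var frozen = Set.empty[String]
    val before = frozen          // keep a reference to the old value
    frozen += "x"                // rebinds the var to a new set
    (before.size, frozen.size)
  }

  // val + mutable Set: `+=` is a method on the set and updates it in place.
  // Returns (size seen through an alias, size seen through the original).
  def mutableVal(): (Int, Int) = {
    val live = mutable.Set.empty[String]
    val alias = live             // same object, second name
    live += "x"
    (alias.size, live.size)
  }

  def main(args: Array[String]): Unit = {
    println(immutableVar())      // prints (0,1)
    println(mutableVal())        // prints (1,1)
  }
}
```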

mutable HashSet[String]

a.size 1000000 Elapsed: 31721.773 msec

a.size 1200000 Elapsed: 10963.589 msec

a.size 1400000 Elapsed: 19112.277 msec

At a.size 1552035 the run slowed down dramatically.

2 different mutable Set[String]

Used the add method instead of the '+=' operator.

a.size 1000000 Elapsed: 31669.254 msec

a.size 1200000 Elapsed: 10385.644 msec

a.size 1400000 Elapsed: 25776.383 msec

a.size 1471899 java.lang.OutOfMemoryError: GC overhead limit exceeded
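As a side note on what changes between the two calls: on a mutable Set, `add` returns a Boolean saying whether the element was newly inserted, while `+=` returns the set itself. Both insert the element in place, which fits the timings above being close to the '+=' run. A small sketch with hypothetical names (`AddVsPlusEq`, `addTwice`, `plusEqReturnsSelf`):

```scala
import scala.collection.mutable

object AddVsPlusEq {
  // Results of `add` for a first and a repeated insertion of the same element.
  def addTwice(elem: String): (Boolean, Boolean) = {
    val s = mutable.Set.empty[String]
    val first = s.add(elem)      // true: elem was not yet in the set
    val second = s.add(elem)     // false: elem was already present
    (first, second)
  }

  // `+=` hands back the very same set instance it was called on.
  def plusEqReturnsSelf(): Boolean = {
    val s = mutable.Set.empty[String]
    (s += "a") eq s
  }

  def main(args: Array[String]): Unit = {
    println(addTwice("payload")) // prints (true,false)
    println(plusEqReturnsSelf()) // prints true
  }
}
```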

Test with two different mutable Set[String]:

    ptime((1 to 1000000) foreach { num =>
      val payload = (num + " qweasda sdads ads " + rnd.nextString(20))
      a += payload
      b += payload
      println("a.size " + a.size + ", b.size " + b.size)
    })

a.size 1000000, b.size 1000000 Elapsed: 63678.129 msec

rimbi commented 13 years ago

So? :)

rimbi commented 12 years ago

Hi Sinan, according to my recent analysis, the slowness is not caused by poor Set performance. Set actually performs superbly, even with 7 million items. The problem is insufficient heap. So I think we can close this and postpone the Set-related discussions/issues for a while.
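One way to check the heap explanation is to print the JVM's heap figures alongside the set size. Below is a rough sketch using the standard `java.lang.Runtime` calls. The names `HeapReport` and `heapMiB` are hypothetical, and the thread does not say which heap settings the runs above used.

```scala
object HeapReport {
  private val MiB = 1024L * 1024L

  // Heap figures in MiB as (used, max), read from the JVM's Runtime.
  def heapMiB(): (Long, Long) = {
    val rt = Runtime.getRuntime
    val used = (rt.totalMemory() - rt.freeMemory()) / MiB  // currently occupied heap
    val max = rt.maxMemory() / MiB                         // ceiling the JVM may grow to
    (used, max)
  }

  def main(args: Array[String]): Unit = {
    val (used, max) = heapMiB()
    println(s"heap used: $used MiB, max: $max MiB")
  }
}
```

If used stays close to max while a.size stalls, the ceiling can be raised with the JVM's -Xmx option (for example -Xmx2g) and the experiment repeated.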