scala / bug

Scala 2 bug reports only. Please, no questions — proper bug reports only.
https://scala-lang.org
232 stars 21 forks source link

partest CPU utilization declines thus making ant test.suite take too long #5955

Closed scabug closed 12 years ago

scabug commented 12 years ago

The snapshots below depict system load while running ant test.suite. Although coarse, they suggest that over half of the test.suite run is I/O bound, for the number of tasks running in parallel, unlike the first quarter or so which is CPU bound.

The data:

Perhaps we can get shorter wall-clock time by running two partests in parallel (don't laugh), mixing I/O bound and CPU bound loads. Additionally, spawning 8 parallel tasks underutilizes some cores on some machines.

In any case, any solution which doesn't require meddling with partest internals seems better than the alternatives.

scabug commented 12 years ago

Imported From: https://issues.scala-lang.org/browse/SI-5955?orig=1 Reporter: @magarciaEPFL Attachments:

scabug commented 12 years ago

@axel22 said: On snapshot 1, the bump looks like the time when pos test group stops and neg test group starts. (maybe)

scabug commented 12 years ago

@axel22 said: It would be nice if the graphs showed IO activity too.

However, I don't think that the underutilized CPU time is due to IO. My hypothesis is that when a test group ends, there are a few tests which take longer than the others.

Besides, aren't the tests from one test group in partest equally divided among several actors and then each of these actors runs all of the tests in its group?

If that is the case, then the workload is not evenly divided between processors - there should be a central work item queue from which the actors take tests one-by-one. If the uneven workload is really the cause, this should improve utilization and testing time.

scabug commented 12 years ago

@axel22 said: There you go:

DirectRunner.scala:

    val workers = kindFiles.grouped(groupSize).toList map { toTest =>
      val worker = new Worker(fileManager, TestRunParams(scalaCheckParentClassLoader))
      worker.start()
      worker ! RunTests(kind, toTest)
      worker
    }

Worker.scala:

  def act() {
    react {
      case RunTests(testKind, files) =>
        val master = sender
        kind = testKind
        runTests(files) { results =>
          master ! Results(results.toMap)
          resetAll()
        }
    }
  }
scabug commented 12 years ago

@magarciaEPFL said (edited on Jun 21, 2012 1:07:04 PM UTC): Yes, I/O can't be the main explanation for the low CPU utilization, workload division is more likely. Although classloading (unpickling, reflection) and classwriting (GenJVM, less so for GenASM) take time.

Regarding even division of the workload:

Another observation: run partest --run and jvisualvm. Do you see those Scala logos on the top left? Those are tests running in parallel. Select "Show live threads only" to keep the timeline uncluttered. Except at the beginning, there are about 5 logos at any given time. That's too low, even on an 8-core machine.

!jvisualvm.png!

scabug commented 12 years ago

@axel22 said:

  1. Yes, those are workers threads created by the forkjoin pool which is in turn used by the actors.

  2. Not sure, if --verbose option were to print out when a Worker actor above ends its work, we'd know in the output exactly when each of the actors ends and if there are some actors which take a bit longer. We can try this out easily - NestUI.verbose("actor at: " + Thread.currentThread + " done with test group") at the end of the act method in Worker.scala, and running test/partest with --verbose.

Running partest and visualvm now.

scabug commented 12 years ago

@axel22 said: I usually have 3-7 tests in parallel, on average 4 tests at a time. I have a 4-core machine with hyperthreading, so I think this makes sense on my workstation.

Some tests take much longer than the others.

scabug commented 12 years ago

@axel22 said: There is one difference between run and pos. pos test group never starts a new JVM instance. Perhaps starting processes is what causes such a CPU underutilization.

scabug commented 12 years ago

@magarciaEPFL said: Yes, that explains why pos and neg stay at 100% CPU utilization.

scabug commented 12 years ago

@axel22 said:

  1. I think the granularity argument goes in favour of pos and neg too - I think each file here is big enough to be taken as a task from a single queue and not even notice synchronization impact.
scabug commented 12 years ago

@magarciaEPFL said: I'm trying to increase the number of worker threads, like this:

./partest -Dactors.corePoolSize=32 -Dpartest.actors=32 -Dactors.maxPoolSize=64 -Dactors.enableForkJoin=true --run --debug

Output:

Info: actors.corePoolSize not defined

Testing interpreter and backend
Info: initializing scala.actors.Scheduler$@10d09ad3...
Info: scala.actors.scheduler.ThreadPoolConfig$@250d593e: java.version = 1.6.0_30
Info: scala.actors.scheduler.ThreadPoolConfig$@250d593e: java.vm.vendor = Sun Microsystems Inc.
Info: scala.actors.scheduler.ForkJoinScheduler@4026e9f9: parallelism 12
Info: scala.actors.Scheduler$@10d09ad3: 
      starting new scala.actors.scheduler.ForkJoinScheduler@4026e9f9 [class scala.actors.scheduler.ForkJoinScheduler]

Highlights:

scabug commented 12 years ago

@magarciaEPFL said: It's strange. When running NestRunner all by itself CPU usage stays all the time close to 100% , unlike when running ant test.suite.

A few data points (8-core machine, no warm-up, no averaging over multiple runs, relative times are what counts):

Testing interpreter and backend Info: initializing scala.actors.Scheduler$@351563ff... Info: Thread[Thread-1,5,main]: corePoolSize = 12, maxPoolSize = 256 Info: scala.actors.Scheduler$@351563ff: starting new Thread[Thread-1,5,main] [class scala.actors.scheduler.ResizableThreadPoolScheduler]


Background info, quoting from `scala.actors.Actor`

Summing up:

|| NestRunner --run ||elapsed time|| | ForkJoinScheduler, parallelism == 12 | 13 minutes | | ForkJoinScheduler, parallelism == 32 | 16 minutes | | ResizableThreadPoolScheduler | 13 minutes |

scabug commented 12 years ago

@magarciaEPFL said: The following bash script keeps CPU utilization close to 100% (except at the very end) shaving 5 minutes off the test suite on an 8-core machine:

./partest --pos --neg --jvm --res --buildmanager --scalacheck --script --shootout --presentation & 
./partest --run
wait   # for subshells to finish.

With 16 cores a viable partitioning would be:

scabug commented 12 years ago

@paulp said: I hadn't seen this ticket. You guys might want to know that at some point I ripped the actors out of partest and deleted the pointless layers of indirection. Like miguel says,

    Test suite time on my 8-core machine goes from 28:21 to 22:34.

I don't remember the condition of this branch but I pushed it in case anyone is interested.

https://github.com/paulp/scala/tree/topic/partest-no-actors

scabug commented 12 years ago

@paulp said: Now I've refreshed my memory by building that branch. All the tests pass, and the test suite ran 7 minutes faster than the current test suite ever runs for me. The only issue is a large output spew from the scalacheck tests. I got sidetracked trying to change partest to just use scalacheck like a regular person rather than playing the ridiculous reflection dance, and when that was rejected because somebody in east timor uses partest outside the compiler and doesn't want an actual dependency on scalacheck, I promptly forgot the matter entirely.

So alex, since you're participating in this discussion, and I think the scalacheck tests are mostly yours, maybe you would like to take a peek at it...

scabug commented 12 years ago

@paulp said: Maybe THIS will fill you with excitement and new motivation:

BUILD SUCCESSFUL
Total time: 23 minutes 36 seconds
scabug commented 12 years ago

@axel22 said (edited on Jul 10, 2012 12:59:15 PM UTC): Saw your changes in that branch, they make sense definitely. Will build and run the thing now to figure out what this scalacheck output is all about.

Apparently, Som Snytt is volunteering to take a look at this.

scabug commented 12 years ago

@axel22 said: I see what you mean, I'll check this out.

scabug commented 12 years ago

@magarciaEPFL said: PR discussion at https://github.com/scala/scala/pull/870

Merged in e5c3d78b5d25a4d5356bf58cd5d3f1d41b55ee12