Imported From: https://issues.scala-lang.org/browse/SI-5955?orig=1 Reporter: @magarciaEPFL Attachments:
@axel22 said: On snapshot 1, the bump looks like the point where the pos test group stops and the neg test group starts. (maybe)
@axel22 said: It would be nice if the graphs showed I/O activity too.
However, I don't think the underutilized CPU time is due to I/O. My hypothesis is that when a test group ends, a few of its tests take longer than the others, leaving most workers idle.
Besides, aren't the tests of one test group in partest divided equally among several actors, with each actor then running all of the tests in its share?
If that is the case, then the workload is not evenly divided between processors; there should instead be a central work item queue from which the actors take tests one by one (see the sketch after the Worker.scala snippet below). If uneven workload really is the cause, this should improve utilization and testing time.
@axel22 said: There you go:
DirectRunner.scala:
// the test files of one kind are split up front into fixed-size groups,
// and each group is handed to its own Worker actor
val workers = kindFiles.grouped(groupSize).toList map { toTest =>
  val worker = new Worker(fileManager, TestRunParams(scalaCheckParentClassLoader))
  worker.start()
  worker ! RunTests(kind, toTest)
  worker
}
Worker.scala:
// a Worker runs every test in the group it was handed, then reports all results back to the master
def act() {
  react {
    case RunTests(testKind, files) =>
      val master = sender
      kind = testKind
      runTests(files) { results =>
        master ! Results(results.toMap)
        resetAll()
      }
  }
}
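A rough sketch of the single-queue idea suggested above, for comparison with the grouping code just quoted. This is not partest code; RunOne, PullingWorker, runTest, runSingleTest and numWorkers are hypothetical names. Each worker pulls one test at a time from a shared queue, so a straggler only delays its own worker:

import java.io.File
import java.util.concurrent.ConcurrentLinkedQueue
import scala.actors.Actor

// message carrying the shared queue of test files
case class RunOne(queue: ConcurrentLinkedQueue[File])

// hypothetical worker that pulls tests one at a time instead of receiving a fixed group
class PullingWorker(runTest: File => (String, Int)) extends Actor {
  def act() {
    react {
      case RunOne(queue) =>
        val master = sender
        val results = collection.mutable.Map[String, Int]()
        var next = queue.poll()
        while (next != null) {     // keep taking tests until the shared queue is drained
          results += runTest(next)
          next = queue.poll()
        }
        master ! results.toMap     // report this worker's share of the results
    }
  }
}

// usage sketch: one shared queue, one worker per core
//   val queue = new ConcurrentLinkedQueue[File](); kindFiles foreach queue.add
//   val workers = List.fill(numWorkers)(new PullingWorker(runSingleTest))
//   workers foreach { w => w.start(); w ! RunOne(queue) }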
@magarciaEPFL said (edited on Jun 21, 2012 1:07:04 PM UTC): Yes, I/O can't be the main explanation for the low CPU utilization; uneven workload division is more likely, although classloading (unpickling, reflection) and classwriting (GenJVM, less so for GenASM) do take time.
Regarding even division of the workload: jvisualvm shows threads ForkjoinPool-1-worker-1 to ForkjoinPool-1-worker-12. Are they partest's actors?
Another observation: run partest --run and jvisualvm. Do you see those Scala logos on the top left? Those are tests running in parallel. Select "Show live threads only" to keep the timeline uncluttered.
Except at the beginning, there are about 5 logos at any given time. That's too low, even on an 8-core machine.
!jvisualvm.png!
@axel22 said:
Yes, those are worker threads created by the forkjoin pool, which is in turn used by the actors.
Not sure, but if the --verbose option printed out when a Worker actor ends its work, we'd know from the output exactly when each actor finishes and whether some actors take noticeably longer. We can try this out easily: add NestUI.verbose("actor at: " + Thread.currentThread + " done with test group") at the end of the act method in Worker.scala and run test/partest with --verbose.
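For illustration, the act() method with that call added might look as follows; placing the call inside the results callback (rather than literally after react) is one way to make it fire once the group is done:

def act() {
  react {
    case RunTests(testKind, files) =>
      val master = sender
      kind = testKind
      runTests(files) { results =>
        master ! Results(results.toMap)
        resetAll()
        // with --verbose, record which thread finished its test group, and when
        NestUI.verbose("actor at: " + Thread.currentThread + " done with test group")
      }
  }
}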
Running partest and visualvm now.
@axel22 said: I usually have 3-7 tests in parallel, on average 4 tests at a time. I have a 4-core machine with hyperthreading, so I think this makes sense on my workstation.
Some tests take much longer than the others.
@axel22 said:
There is one difference between run and pos: the pos test group never starts a new JVM instance. Perhaps starting processes is what causes such CPU underutilization.
@magarciaEPFL said:
Yes, that explains why pos and neg stay at 100% CPU utilization.
Perhaps the granularity of the remaining test groups (other than pos and neg) is such that their tests could be taken one at a time from a queue (a single queue, actually). Together with the above we would then get optimal throughput.
@axel22 said:
That could work for pos and neg too - I think each file here is big enough to be taken as a task from a single queue without even noticing the synchronization impact.
@magarciaEPFL said: I'm trying to increase the number of worker threads, like this:
./partest -Dactors.corePoolSize=32 -Dpartest.actors=32 -Dactors.maxPoolSize=64 -Dactors.enableForkJoin=true --run --debug
Output:
Info: actors.corePoolSize not defined
Testing interpreter and backend
Info: initializing scala.actors.Scheduler$@10d09ad3...
Info: scala.actors.scheduler.ThreadPoolConfig$@250d593e: java.version = 1.6.0_30
Info: scala.actors.scheduler.ThreadPoolConfig$@250d593e: java.vm.vendor = Sun Microsystems Inc.
Info: scala.actors.scheduler.ForkJoinScheduler@4026e9f9: parallelism 12
Info: scala.actors.Scheduler$@10d09ad3:
starting new scala.actors.scheduler.ForkJoinScheduler@4026e9f9 [class scala.actors.scheduler.ForkJoinScheduler]
Highlights:
- Where should actors.corePoolSize be set? (despite the -D flag above, --debug reports it as not defined)
- Can the ForkJoinScheduler's parallelism be raised? (in the snippet above it is 12)
@magarciaEPFL said:
It's strange. When running NestRunner all by itself, CPU usage stays close to 100% the whole time, unlike when running ant test.suite.
A few data points (8-core machine, no warm-up, no averaging over multiple runs, relative times are what counts):
partest --run results in
All of 986 tests were successful (elapsed time: 00:13:13)
with 4 GB peak memory use.
scala.tools.partest.nest.NestRunner --run with -Dactors.corePoolSize=32 -Dpartest.actors=32 -Dactors.maxPoolSize=64 -Dactors.enableForkJoin=true results in
All of 986 tests were successful (elapsed time: 00:15:56)
with 5 GB peak memory use.
NestRunner --run with only -Dactors.enableForkJoin=false results in
All of 986 tests were successful (elapsed time: 00:13:09)
with 6 GB peak memory use.
Regarding pool size, --debug informs us that:
Info: actors.corePoolSize not defined
Testing interpreter and backend
Info: initializing scala.actors.Scheduler$@351563ff...
Info: Thread[Thread-1,5,main]: corePoolSize = 12, maxPoolSize = 256
Info: scala.actors.Scheduler$@351563ff: starting new Thread[Thread-1,5,main] [class scala.actors.scheduler.ResizableThreadPoolScheduler]
Background info, quoting from `scala.actors.Actor`: if actors use blocking operations (for example, methods for blocking I/O), there are several options:
- The run-time system can be configured to use a larger thread pool size (for example, by setting the actors.corePoolSize JVM property).
- The scheduler method of the Actor trait can be overridden to return a ResizableThreadPoolScheduler, which resizes its thread pool to avoid starvation when actors invoke blocking operations.
- The actors.enableForkJoin JVM property can be set to false, in which case a ResizableThreadPoolScheduler is used by default to execute actors.
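As a concrete illustration of the second option in that list: an actor subclass can provide its own scheduler. A minimal sketch, assuming the one-argument ResizableThreadPoolScheduler constructor takes a daemon flag; ResizablePool and PoolBackedWorker are made-up names:

import scala.actors.{Actor, IScheduler}
import scala.actors.scheduler.ResizableThreadPoolScheduler

object ResizablePool {
  // one shared resizable pool; it grows when its actors block instead of starving them
  val scheduler: IScheduler = {
    val s = new ResizableThreadPoolScheduler(false) // argument assumed to be the daemon flag
    s.start()
    s
  }
}

// hypothetical actor that opts out of the default fork/join scheduler
class PoolBackedWorker extends Actor {
  override def scheduler: IScheduler = ResizablePool.scheduler
  def act() {
    react {
      case files: List[_] =>
        // ... run the tests in `files`, possibly blocking on I/O or child JVMs ...
    }
  }
}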
Summing up:
|| NestRunner --run || elapsed time ||
| ForkJoinScheduler, parallelism == 12 | 13 minutes |
| ForkJoinScheduler, parallelism == 32 | 16 minutes |
| ResizableThreadPoolScheduler | 13 minutes |
@magarciaEPFL said: The following bash script keeps CPU utilization close to 100% (except at the very end), shaving 5 minutes off the test suite on an 8-core machine:
./partest --pos --neg --jvm --res --buildmanager --scalacheck --script --shootout --presentation &
./partest --run
wait # for subshells to finish.
With 16 cores, a viable partitioning would be the following three groups (sketched as a script below):
--pos --neg
--jvm --res --buildmanager --scalacheck --script --shootout --presentation
--run
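A sketch of that three-way split as a script, following the same pattern as above (how well the groups balance on a given machine is a guess):

./partest --pos --neg &
./partest --jvm --res --buildmanager --scalacheck --script --shootout --presentation &
./partest --run
wait # for the backgrounded partest runs to finish.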
@paulp said: I hadn't seen this ticket. You guys might want to know that at some point I ripped the actors out of partest and deleted the pointless layers of indirection. Like miguel says,
Test suite time on my 8-core machine goes from 28:21 to 22:34.
I don't remember the condition of this branch but I pushed it in case anyone is interested.
@paulp said: Now I've refreshed my memory by building that branch. All the tests pass, and the test suite ran 7 minutes faster than the current test suite ever runs for me. The only issue is a large output spew from the scalacheck tests. I got sidetracked trying to change partest to just use scalacheck like a regular person rather than playing the ridiculous reflection dance, and when that was rejected because somebody in east timor uses partest outside the compiler and doesn't want an actual dependency on scalacheck, I promptly forgot the matter entirely.
So alex, since you're participating in this discussion, and I think the scalacheck tests are mostly yours, maybe you would like to take a peek at it...
@paulp said: Maybe THIS will fill you with excitement and new motivation:
BUILD SUCCESSFUL
Total time: 23 minutes 36 seconds
@axel22 said (edited on Jul 10, 2012 12:59:15 PM UTC): Saw your changes in that branch; they definitely make sense. Will build and run the thing now to figure out what this scalacheck output is all about.
Apparently, Som Snytt is volunteering to take a look at this.
@axel22 said: I see what you mean, I'll check this out.
@magarciaEPFL said: PR discussion at https://github.com/scala/scala/pull/870
Merged in e5c3d78b5d25a4d5356bf58cd5d3f1d41b55ee12
The snapshots below depict system load while running ant test.suite. Although coarse, they suggest that, for the number of tasks running in parallel, over half of the test.suite run is I/O bound, unlike the first quarter or so, which is CPU bound.
The data:
- CPU utilization stays high during the files/pos and files/neg tests.
- While the files/run tests are being run, the CPUs have little to do during the second half of that interval.
- CPU utilization drops once the files/jvm tests start, then stays at values below 20%. With the buildmanager tests it recovers again, followed by another decline (the decline lasts longer than the recovery). Same pattern with the scalacheck tests.
- The files/scalap, files/specialized, and files/presentation tests run at 50% utilization.
Perhaps we can get shorter wall-clock time by running two partests in parallel (don't laugh), mixing I/O-bound and CPU-bound loads. Additionally, spawning 8 parallel tasks underutilizes some cores on some machines.
In any case, any solution which doesn't require meddling with partest internals seems better than the alternatives.