Parallel evaluation of assertions in streaming properties

juanrh / sscheck

ScalaCheck for Spark

Apache License 2.0


Parallel evaluation of assertions in streaming properties #8

Open juanrh opened 9 years ago

juanrh commented 9 years ago

We have a foreachRDD that checks the results of the actions for each parallel test case. Instead of a for loop, we could map each test case to a Future, so that the assertions are executed in parallel on the cores of the driver node. Note that the actions are executed in parallel on the workers in any case; this would only affect how the results of the actions are processed as assertions. The property should fail as soon as any one test case fails, and we don't care about the order. We could use par, or Future.traverse(), to run the assertions as futures and inspect the results until the first failure is found or all the results are computed. See http://lancegatlin.org/tech/scala-collections-of-futures
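
A minimal sketch of the idea, not taken from sscheck: each assertion is wrapped in a Future, and Future.find from the Scala standard library is used to look for a failing outcome. The names ParallelAssertions, Outcome and firstFailure are hypothetical, and modelling a failure as Some(message) is an assumption made only for this illustration.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration
import scala.util.control.NonFatal

object ParallelAssertions {
  // Hypothetical outcome type: None means the assertion passed,
  // Some(message) describes a failure.
  type Outcome = Option[String]

  /** Evaluates `check` on every test case in its own Future and returns
    * a failing outcome if there is one, or None if all cases pass.
    * Futures that are still running are not cancelled.
    */
  def firstFailure[A](cases: List[A])(check: A => Outcome)
                     (implicit ec: ExecutionContext): Outcome = {
    // Turn exceptions into failures, because Future.find skips failed futures.
    val futures: List[Future[Outcome]] = cases.map { c =>
      Future {
        try check(c)
        catch { case NonFatal(e) => Some(e.toString) }
      }
    }
    // Completes with some outcome that satisfies the predicate, if any.
    val failed: Future[Option[Outcome]] = Future.find(futures)(_.isDefined)
    Await.result(failed, Duration.Inf).flatten
  }
}
```

With an implicit ExecutionContext in scope (for example ExecutionContext.global), ParallelAssertions.firstFailure(List(1, 2, 3))(n => if (n == 3) Some("bad 3") else None) reports the failure for the third case. Which failure is reported when several cases fail is left unspecified here, which matches the "we don't care about the order" requirement.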

juanrh commented 9 years ago

See check() at https://github.com/rickynils/scalacheck/blob/1.12.2/src/main/scala/org/scalacheck/Test.scala, in particular:

        import concurrent._
        val tp = java.util.concurrent.Executors.newFixedThreadPool(workers)
        implicit val ec = ExecutionContext.fromExecutor(tp)
        try {
          val fs = List.range(0,workers) map (idx => Future {
            params.customClassLoader.map(
              Thread.currentThread.setContextClassLoader(_)
            )
            blocking { workerFun(idx) }
          })
          val zeroRes = Result(Passed,0,0,FreqMap.empty[Set[Any]],0)
          val res = Future.fold(fs)(zeroRes)(mergeResults)
          Await.result(res, concurrent.duration.Duration.Inf)

juanrh commented 9 years ago

We are now using par in the consume method of NextAnd and NextOr, which improved performance dramatically. Note that consume is invoked in DStreamProp.forAll inside a foreachRDD, and thus on the driver, which could imply parallel submission of Spark actions. Still to study: whether NextAnd and NextOr should store a parallel collection as a GenSeq, instead of sequences that are parallelized at each call to consume; and whether to use par in other NextFormula classes.
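
The two options could look roughly like the sketch below. It assumes Scala 2.11 or 2.12, where par is built into the standard collections; on Scala 2.13 it needs the separate scala-parallel-collections module and an import of scala.collection.parallel.CollectionConverters._. Formula, Count, AndEachCall and AndStored are hypothetical, simplified stand-ins and not the actual sscheck classes.

```scala
import scala.collection.parallel.ParSeq

// Hypothetical stand-in for a formula that consumes one batch at a time.
trait Formula[T] { def consume(batch: T): Formula[T] }

// Hypothetical leaf formula that just counts the batches it has seen.
case class Count[T](seen: Int) extends Formula[T] {
  def consume(batch: T): Formula[T] = Count(seen + 1)
}

// Option 1: keep a plain Seq and parallelize it at each call to consume.
case class AndEachCall[T](phis: Seq[Formula[T]]) extends Formula[T] {
  def consume(batch: T): Formula[T] =
    AndEachCall(phis.par.map(_.consume(batch)).seq)
}

// Option 2: store the parallel collection, so no conversion is needed per call.
case class AndStored[T](phis: ParSeq[Formula[T]]) extends Formula[T] {
  def consume(batch: T): Formula[T] =
    AndStored(phis.map(_.consume(batch)))
}
```

Both variants call consume on the subformulas in parallel; the second one only avoids converting between sequential and parallel collections on every batch, so whether it helps would need to be measured.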