juanrh / sscheck

ScalaCheck for Spark
Apache License 2.0

Compare performance of alternative implementation for Streaming test case generation based on sequential multiplexing #5

Open juanrh opened 9 years ago

juanrh commented 9 years ago

Try an alternative implementation based on a single worker for the Prop that multiplexes test cases with a list, as in es.ucm.fdi.sscheck.spark.streaming.StreamingContextActorReceiverTest in branch streamingDataSendExperiments (commit 5d7d7f55c828957a3673e5bf02dcfedef14e9468). We are already sequentializing the assertion checks and using ids for the test cases, so we could fail on the test case in which the failure was detected. Note that ScalaCheck will probably be running a later test case than the one that generated the counterexample, so the idea is to replace the message of the Specs2 Result generated by the Prop so that it reports the actual counterexample. Note that shrinking is disabled anyway in the current implementation.
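As a rough illustration of the multiplexing idea, here is a minimal sketch in plain Scala with no Spark, ScalaCheck or Specs2 dependencies. It assumes a test case is a sequence of batches, merges the i-th batch of every test case into one batch tagged with test case ids, and then reports the id of the first test case whose batch breaks an assertion. All names (`MultiplexSketch`, `multiplex`, `firstFailure`) are hypothetical and not taken from sscheck; the actual experiment in StreamingContextActorReceiverTest may be organized differently.

```scala
object MultiplexSketch {
  // Hypothetical aliases for illustration only, not sscheck types.
  type TestCaseId = Int
  type Batch[A] = Seq[A]
  type TestCase[A] = Seq[Batch[A]]

  /** Merge all test cases into a single sequence of batches that one worker
    * can send in order. Each record is tagged with the id (here, the index)
    * of the test case it belongs to. Shorter test cases contribute nothing
    * to the batches past their end. */
  def multiplex[A](testCases: Seq[TestCase[A]]): Seq[Batch[(TestCaseId, A)]] = {
    val numBatches = if (testCases.isEmpty) 0 else testCases.map(_.length).max
    (0 until numBatches).map { i =>
      testCases.zipWithIndex.flatMap { case (testCase, id) =>
        testCase.lift(i).getOrElse(Seq.empty).map(record => (id, record))
      }
    }
  }

  /** Check the assertion batch by batch, demultiplexing by test case id.
    * Returns the id of the first failing test case and the index of the
    * failing batch, so the caller can report that test case as the
    * counterexample. Test cases with no records in a batch are not
    * checked for that batch. */
  def firstFailure[A](multiplexed: Seq[Batch[(TestCaseId, A)]])
                     (assertion: Batch[A] => Boolean): Option[(TestCaseId, Int)] = {
    val failures = for {
      (batch, batchIndex) <- multiplexed.zipWithIndex
      (id, tagged) <- batch.groupBy(_._1).toSeq.sortBy(_._1)
      if !assertion(tagged.map(_._2))
    } yield (id, batchIndex)
    failures.headOption
  }
}
```

In a real implementation the pair returned by `firstFailure` would be used to look up the original test case and rewrite the message of the Specs2 Result; that step is omitted here because it depends on the existing sscheck classes.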

The memory consumption is more or less the same, as we are storing in one thread the test cases that are otherwise spread across several threads. The gain is the lack of synchronization, because there is no concurrency. Concurrency was not buying us anything here anyway, because all the test cases have to be sent to a single receiver, so we just perform one particular serialization of what is executed concurrently in the main branch. The concurrent approach would gain something if we had several Spark receivers accepting data in parallel, but it is not clear that this is worthwhile in a first approach focused on functional validation. The code becomes simpler because there is no concurrency, but takes on the complexity of handling the multiplexed test cases.

This should be implemented in an additional subtype of the original trait for using ScalaCheck in Spark Streaming, so that we can switch between implementations. This might require some refactoring of the original classes.
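A minimal sketch of how the switch could look, assuming the sending strategy is factored out into an abstract method. `TestCaseDispatcher`, `ConcurrentDispatch` and `SequentialDispatch` are hypothetical names used only for illustration; the real trait in sscheck has different members, and for brevity a test case here is a flat sequence of records.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object DispatchSketch {
  /** Hypothetical stand-in for the original trait: it only decides how the
    * records of each test case are handed to `send`, tagged with an id. */
  trait TestCaseDispatcher {
    def dispatch[A](testCases: Seq[Seq[A]])(send: (Int, A) => Unit): Unit
  }

  /** Roughly the current approach: one worker per test case, so `send`
    * must be thread-safe and the arrival order is not deterministic. */
  trait ConcurrentDispatch extends TestCaseDispatcher {
    override def dispatch[A](testCases: Seq[Seq[A]])(send: (Int, A) => Unit): Unit = {
      val workers = testCases.zipWithIndex.map { case (testCase, id) =>
        Future(testCase.foreach(record => send(id, record)))
      }
      // Block until every worker is done (the timeout is an arbitrary choice).
      Await.result(Future.sequence(workers), 10.seconds)
      ()
    }
  }

  /** The proposed alternative: a single worker that sends every record
    * itself, so no synchronization is needed around `send`. */
  trait SequentialDispatch extends TestCaseDispatcher {
    override def dispatch[A](testCases: Seq[Seq[A]])(send: (Int, A) => Unit): Unit =
      for ((testCase, id) <- testCases.zipWithIndex; record <- testCase) send(id, record)
  }
}
```

With this layout a test suite would pick an implementation by mixing in one subtype or the other, which also makes it straightforward to run the same property under both and compare timings.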