juanrh / sscheck

ScalaCheck for Spark
Apache License 2.0
63 stars 9 forks

PoC of raw ScalaCheck property for Spark Streaming #6

Closed juanrh closed 9 years ago

juanrh commented 9 years ago

See https://gist.github.com/juanrh/c719ca82f441f8dacd59#file-testcasedictinputdstream-scala for a preliminary version of a custom InputDStream that allows dynamic queueing of DStreams. This exploits the fact that the InputDStream runs at the driver, in the style of the Kafka direct API. On the other hand, note that this design is quite bad: it mixes the receiver with the test case manipulation, which results in unnecessary coupling and an InputDStream that is not very reusable. Also, tracking the origin of the test cases leads to using dictionaries instead of simpler Lists, even though Lists are more efficient for the full scans that we have to perform here.
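To make the trade-off concrete, here is a minimal Spark-free sketch of the bookkeeping described above. It is not the code from the gist: `TestCaseQueue`, `enqueue`, `nextBatch` and `TestCaseQueueDemo` are hypothetical names, and a test case is assumed to be a list of batches. The dictionary keeps each emitted batch tagged with the test case it came from, and every call to `nextBatch` scans all pending test cases.

```scala
import scala.collection.mutable

// Hypothetical model, not the gist's code: a driver-side store of test cases
// that can be extended while the stream is running.
final class TestCaseQueue[A] {
  // test case id -> batches not yet emitted for that test case
  private val pending = mutable.Map.empty[Int, List[List[A]]]
  private var nextId = 0

  /** Register a new test case (a sequence of batches) and return its id. */
  def enqueue(testCase: List[List[A]]): Int = synchronized {
    val id = nextId
    nextId += 1
    if (testCase.nonEmpty) pending(id) = testCase
    id
  }

  /** Meant to be called once per batch interval: pops the head batch of
    * every pending test case, keyed by the id of its test case. */
  def nextBatch(): Map[Int, List[A]] = synchronized {
    // Full scan over all pending test cases to collect their head batches.
    val heads = pending.map { case (id, batches) => id -> batches.head }.toMap
    // Second scan over a snapshot: drop finished test cases, advance the rest.
    for ((id, batches) <- pending.toList) {
      if (batches.tail.isEmpty) pending -= id else pending(id) = batches.tail
    }
    heads
  }
}

object TestCaseQueueDemo {
  /** Two overlapping test cases, the second enqueued after the first batch. */
  def run(): List[Map[Int, List[Int]]] = {
    val queue = new TestCaseQueue[Int]
    queue.enqueue(List(List(1, 2), List(3)))
    val first = queue.nextBatch()
    queue.enqueue(List(List(10)))
    val second = queue.nextBatch()
    val third = queue.nextBatch()
    List(first, second, third)
  }
}
```

In an actual InputDStream, something like `nextBatch` would presumably be called from `compute(validTime: Time): Option[RDD[T]]` at the driver, with the resulting records turned into an RDD. Keeping the queue in its own class, as in this sketch, is one way to avoid the coupling between the receiver and the test case manipulation.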

juanrh commented 9 years ago

The first PoC respecting batch limits is stored as a lightweight tag at https://github.com/juanrh/sscheck/releases/tag/directStreamingBatchesOkPoC