Closed juanrh closed 9 years ago
See https://gist.github.com/juanrh/c719ca82f441f8dacd59#file-testcasedictinputdstream-scala for a preliminary version of a custom InputDStream that allows dynamic queueing of DStreams. This exploits the fact that the InputDStream runs at the driver, in the style of the Kafka direct API. On the other hand, note that this design is quite bad: it mixes the receiver with the test case manipulation, which results in unnecessary coupling and an InputDStream that is not very reusable. Also, tracking the origin of the test cases leads to using dictionaries instead of simpler Lists, which are more efficient for the full scans that we have to perform here.
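As a rough illustration of the trade-off described above, the following Spark-free sketch keeps the test case bookkeeping in its own driver-side class, so that an InputDStream would only need to ask it for the next batch. This is not the code from the gist: `DynamicBatchQueue`, `enqueue` and `nextBatch` are hypothetical names, and each test case is assumed to be a finite sequence of batches. It keeps the dictionary keyed by test case id so the origin of each batch stays known, and shows that every batch interval still requires a full scan over all pending test cases.

```scala
import scala.collection.mutable

// Hypothetical driver-side queue of test cases (illustrative only, not from sscheck).
// Each test case is a finite sequence of batches; a batch is a Seq[A].
final class DynamicBatchQueue[A] {
  // Test case id -> remaining batches. The dictionary is only needed to
  // remember which test case each batch came from.
  private val pending = mutable.LinkedHashMap.empty[Int, List[Seq[A]]]
  private var nextId = 0

  // Register a new test case at any time and return its id.
  def enqueue(batches: Seq[Seq[A]]): Int = synchronized {
    val id = nextId
    nextId += 1
    if (batches.nonEmpty) pending(id) = batches.toList
    id
  }

  // Full scan: take the head batch of every pending test case, tagged with
  // its origin, and drop the test cases that are exhausted.
  def nextBatch(): Map[Int, Seq[A]] = synchronized {
    val heads = pending.iterator.map { case (id, bs) => id -> bs.head }.toMap
    val advanced = pending.toList.map { case (id, bs) => id -> bs.tail }
    pending.clear()
    advanced.foreach { case (id, rest) => if (rest.nonEmpty) pending(id) = rest }
    heads
  }

  def isEmpty: Boolean = synchronized(pending.isEmpty)
}
```

Under these assumptions, a custom InputDStream could call `nextBatch()` from its `compute(validTime)` method at the driver and turn the result into an RDD, without knowing anything about how test cases are generated or evaluated. If the origin of each batch did not have to be tracked, the `LinkedHashMap` could be replaced by a plain `List[List[Seq[A]]]` with the same scan.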
A first PoC that respects batch limits is stored as a lightweight tag at https://github.com/juanrh/sscheck/releases/tag/directStreamingBatchesOkPoC