juanrh / sscheck

ScalaCheck for Spark
Apache License 2.0

Support shrinking in generators for dstreams #10

Open juanrh opened 9 years ago

juanrh commented 9 years ago

Basic shrinking support is available for BatchGen.

juanrh commented 9 years ago

For PDStreamGen, a trivial shrinker like the following makes no sense for temporal generators:

implicit def shrinkDStream[A] : Shrink[PDStream[A]] = Shrink(pdstream => 
  // unwrap the underlying Seq, shrink the Seq, and rewrap
  shrink(pdstream.toSeq).map(seq => PDStream(seq: _*))
)

The problem is that, given a PDStream[A] value, we have no way of knowing which temporal operator was used to generate it, or with which timeout, so shrinking it as we would a Seq gives nonsensical behaviour. It could make sense to write a shrinker that preserves the occurrences of each distinct batch. The idea: given a PDStream ds in which a batch b occurs n times, say ds = C[b,b,b], where C is a 3-hole context and those three holes are the only occurrences of b, we might shrink b into b' and then shrink ds to ds' = C[b',b',b']. This can be done for each distinct batch, giving a shrunk version of ds with the same temporal structure. This reasoning should be double-checked.
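A minimal sketch of the C[b,b,b] => C[b',b',b'] idea, to make the proposal concrete. It does not use the sscheck or ScalaCheck APIs: `Batch` and `DStreamLike` are hypothetical stand-ins built from plain `Seq`, `shrinkBatch` is an illustrative batch shrinker that drops one element at a time, and occurrences of a batch are identified by value equality, which is itself an assumption (two equal batches need not have come from the same hole of the generator).

```scala
object StructurePreservingShrink {
  // Hypothetical stand-ins, not sscheck types:
  // a batch is a Seq[A] and a stream is a Seq of batches.
  type Batch[A] = Seq[A]
  type DStreamLike[A] = Seq[Batch[A]]

  // Illustrative batch shrinker: every candidate drops exactly one element.
  // An empty batch has no candidates.
  def shrinkBatch[A](b: Batch[A]): List[Batch[A]] =
    b.indices.toList.map(i => b.patch(i, Nil, 1))

  // For each distinct batch b in ds and each candidate b' of b, replace
  // every occurrence of b by b'. The number and order of batches, and
  // hence the positions at which a batch repeats, are left unchanged.
  def shrinkPreservingStructure[A](ds: DStreamLike[A]): List[DStreamLike[A]] =
    for {
      b       <- ds.distinct.toList
      bShrunk <- shrinkBatch(b)
    } yield ds.map(x => if (x == b) bShrunk else x)
}
```

For example, with `ds = Seq(Seq(1, 2), Seq(), Seq(1, 2))` the candidates are `Seq(Seq(2), Seq(), Seq(2))` and `Seq(Seq(1), Seq(), Seq(1))`: both occurrences of the batch shrink together and the empty batch in the middle stays in place. Wrapping this into a `Shrink[PDStream[A]]` is left out, since it depends on the PDStream API, and whether equality is the right notion of "same occurrence" is part of what needs double-checking.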