radanalyticsio / silex

something to help you spark
Apache License 2.0

Feature/iid: implement 'iid' sampling of feature vectors from RDDs #24

Closed. erikerlandson closed this 9 years ago

erikerlandson commented 9 years ago

Note: this is written on top of the outstanding PR #10 for feature-extraction combinators.

erikerlandson commented 9 years ago

Rebased onto the latest develop branch, including the Extractor PR.

erikerlandson commented 9 years ago

Possibly worth considering an alternative to 'iid': the synthetic data is independently distributed but, in general, won't be identically distributed. Not sure what a good replacement name would be.
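One way to read this point, assuming each synthetic vector x = (x_1, ..., x_d) is built by drawing feature j from its own observed marginal P_j (the symbols here are illustrative, not taken from the PR), is that the joint distribution of a synthetic vector factorizes into its per-feature marginals:

$$
P(x_1, \dots, x_d) \;=\; \prod_{j=1}^{d} P_j(x_j)
$$

The product form is what makes the features independent. Because the marginals P_j generally differ from one feature to the next, the features are not identically distributed, so the "identically" half of "iid" does not hold.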

erikerlandson commented 9 years ago

Maybe mdsFeatureSeqRDD, where "mds" stands for "Marginal Distribution Sampling".
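The thread does not show the PR's code. The following is a minimal sketch of marginal distribution sampling under stated assumptions: it works on an in-memory list of rows instead of a Spark RDD, and the name `mds_sample` is hypothetical, not part of silex. Each feature of each synthetic vector is drawn independently, with replacement, from the observed values of that feature's column.

```python
import random
from typing import List, Sequence


def mds_sample(rows: Sequence[Sequence[float]], n: int, seed: int = 0) -> List[List[float]]:
    """Draw n synthetic feature vectors by marginal distribution sampling.

    Hypothetical helper (not silex's API). Feature j of every synthetic
    vector is drawn with replacement from the observed values in column j.
    Columns are sampled independently of one another, so each column keeps
    its own empirical marginal while cross-feature correlations are broken.
    """
    if not rows:
        raise ValueError("need at least one input row")
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all rows must have the same number of features")
    # Transpose once so each column's empirical marginal is a plain list.
    columns = [[r[j] for r in rows] for j in range(width)]
    rng = random.Random(seed)
    # One independent draw per column, repeated for each synthetic vector.
    return [[rng.choice(col) for col in columns] for _ in range(n)]


data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
synthetic = mds_sample(data, n=5, seed=42)
print(synthetic)
```

Every synthetic value in column 0 comes from {1.0, 2.0, 3.0} and every value in column 1 from {10.0, 20.0, 30.0}, but a vector such as [1.0, 30.0] can appear even though no input row contains it. That is the intended effect of sampling the marginals separately.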