rajasekarv / vega

A new arguably faster implementation of Apache Spark from scratch in Rust
Apache License 2.0

RDD API - random_split() #106

Closed ugoa closed 4 years ago

ugoa commented 4 years ago
  1. Implement https://github.com/rajasekarv/native_spark/issues/88
  2. Add documentation for the implemented APIs.
  3. Minor refactoring of RDD::key_by().
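For readers unfamiliar with the API, here is a minimal sketch of the semantics that random_split() is expected to follow, modeled on Spark's randomSplit(weights, seed): the weights are normalized, turned into cumulative bounds, and each element goes to the split whose range contains a uniform random draw. This is not the code in this PR. It works on a plain slice rather than an RDD, and the names SplitMix64 and cumulative_bounds, as well as the signature, are assumptions made for illustration only.

```rust
/// Tiny deterministic PRNG (SplitMix64), used only so this sketch
/// needs no external crates. Not part of vega.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform draw in [0, 1), built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical helper: normalize the weights and return cumulative
/// bounds, e.g. [1, 1, 2] -> [0.0, 0.25, 0.5, 1.0].
fn cumulative_bounds(weights: &[f64]) -> Vec<f64> {
    let total: f64 = weights.iter().sum();
    let mut acc = 0.0;
    let mut bounds = vec![0.0];
    for w in weights {
        acc += w / total;
        bounds.push(acc);
    }
    bounds
}

/// Sketch of random_split over a slice (an assumption; the real API
/// operates on RDD partitions). Every element lands in exactly one split.
fn random_split<T: Clone>(data: &[T], weights: &[f64], seed: u64) -> Vec<Vec<T>> {
    assert!(!weights.is_empty() && weights.iter().all(|w| *w >= 0.0));
    assert!(weights.iter().sum::<f64>() > 0.0);

    let bounds = cumulative_bounds(weights);
    let mut rng = SplitMix64(seed);
    let mut splits = vec![Vec::new(); weights.len()];

    for item in data {
        let x = rng.next_f64();
        // Pick the first range whose upper bound exceeds the draw; fall
        // back to the last split if rounding leaves the final bound < 1.
        let idx = bounds[1..]
            .iter()
            .position(|hi| x < *hi)
            .unwrap_or(weights.len() - 1);
        splits[idx].push(item.clone());
    }
    splits
}
```

Calling random_split(&data, &[1.0, 1.0, 2.0], 42) returns three vectors holding roughly 25%, 25%, and 50% of the elements, and the same seed always reproduces the same partitioning.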

ugoa commented 4 years ago

Please don't merge this PR yet 😅. My implementation is not completely correct and needs some more work. I will change the status when it's done. Thanks.

ugoa commented 4 years ago

This API finally does what is expected, though the code is a bit messy. Reviews and feedback are welcome, thanks!

iduartgomez commented 4 years ago

The last changes look good, but I'm making one additional change so that we return an iterator from the sampler. Will do a PR to your branch ASAP.

ugoa commented 4 years ago

@iduartgomez Sure, appreciate it.

ugoa commented 4 years ago

@iduartgomez Thanks. I have merged in your changes. It should be safe to merge into master and close this PR now. Let me know if any further changes are required. Cheers.

rajasekarv commented 4 years ago

Thanks, @ugoa and @iduartgomez.