Open karlhigley opened 8 years ago
Once Spark 1.6 is released, it might be better to move directly to the Datasets API instead of transitioning twice (first from RDDs to DataFrames, then from DataFrames to Datasets). We will have to evaluate whether that API is feature-complete enough to support the required operations.
Using the DataFrames API instead of working with RDDs directly may improve speed, because DataFrame queries are planned by the Catalyst optimizer.