Use Spark's Dataframes API

karlhigley / lexrank-summarizer

A Spark-based LexRank extractive summarizer for text documents

MIT License


Use Spark's Dataframes API #42

Open karlhigley opened 8 years ago

karlhigley commented 8 years ago

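Using the DataFrame API instead of operating on RDDs directly may improve speed, because DataFrame queries are planned by the Catalyst optimizer, whereas the functions passed to RDD transformations are opaque to Spark and run exactly as written.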
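To make the difference concrete, here is a minimal sketch that computes per-term document frequencies both ways. It is not code from this repository: `DocFreqSketch`, the `(docId, text)` input shape, and the column names are made up for illustration, and it assumes a Spark 2.x or later build where `SparkSession` is the entry point (on 1.x, `SQLContext` would play that role).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, countDistinct, explode, split}

// Hypothetical example, not taken from lexrank-summarizer.
object DocFreqSketch {

  // RDD version: Spark cannot inspect the lambdas, so it executes them as written.
  def docFreqRdd(spark: SparkSession, sentences: Seq[(Long, String)]): Map[String, Long] =
    spark.sparkContext.parallelize(sentences)
      .flatMap { case (docId, text) => text.split(" ").map(term => (term, docId)) }
      .distinct()                              // one (term, docId) pair per document
      .map { case (term, _) => (term, 1L) }
      .reduceByKey(_ + _)                      // number of documents containing each term
      .collect()
      .toMap

  // DataFrame version: the same query expressed as column expressions,
  // which Catalyst can analyze and rewrite before execution.
  def docFreqDf(spark: SparkSession, sentences: Seq[(Long, String)]): Map[String, Long] = {
    import spark.implicits._
    val counts = sentences.toDF("docId", "text")
      .select(col("docId"), explode(split(col("text"), " ")).as("term"))
      .groupBy("term")
      .agg(countDistinct("docId").as("docFreq"))

    counts.explain(true)                       // prints the logical and physical plans
    counts.collect().map(row => row.getString(0) -> row.getLong(1)).toMap
  }
}
```

Both functions should return the same map for the same input. Whether the DataFrame version is actually faster for this project's workload would need to be measured; the output of `explain(true)` is a reasonable place to start looking.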

karlhigley commented 8 years ago

Once Spark 1.6 is released, it might be better to move directly to the Dataset API rather than transitioning twice (from RDDs to DataFrames, and then from DataFrames to Datasets). I will have to evaluate whether that API is feature-complete enough to support the required operations.
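For a sense of what the typed API looks like, here is a rough sketch of a per-term document-frequency count written against Datasets. Everything in it is an assumption made for illustration: the `Sentence` case class and `DatasetSketch` object are hypothetical, and the calls follow the Spark 2.x and later API (`SparkSession`, `groupByKey`), which differs in places from the experimental Dataset API that shipped in 1.6.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type; declared at the top level so Spark can derive an encoder for it.
final case class Sentence(docId: Long, text: String)

object DatasetSketch {

  // Counts, for each term, the number of distinct documents that contain it.
  def docFreq(spark: SparkSession, sentences: Seq[Sentence]): Map[String, Long] = {
    import spark.implicits._
    val ds: Dataset[Sentence] = sentences.toDS()

    ds.flatMap(s => s.text.split(" ").toSeq.map(term => (term, s.docId)))
      .distinct()            // one (term, docId) pair per document
      .groupByKey(_._1)      // typed grouping on the term
      .count()               // Dataset[(String, Long)]
      .collect()
      .toMap
  }
}
```

The code keeps the compile-time types of the RDD style, but the lambdas passed to `flatMap` and `groupByKey` are still opaque to the optimizer, so part of the evaluation would be checking how much of the pipeline can be expressed with column expressions instead.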