We should examine Spark's DataFrame API as an alternative to RDDs for representing our data. A DataFrame is a structured abstraction: Spark knows the schema before execution, so it can optimize both the query plan and the underlying binary representation of the data.
http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame
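As a rough illustration of what "Spark knows the schema before execution" means in practice, the sketch below builds a small DataFrame with an explicit schema. The sample rows, the column names, and the helpers `build_dataframe` and `demo` are hypothetical and not taken from our data; only `SparkSession` and `createDataFrame` are PySpark calls. It assumes a local PySpark installation.

```python
# Hypothetical sample records; the real data set is not shown in the text.
SAMPLE_ROWS = [
    ("alice", 34, 71.5),
    ("bob", 45, 88.0),
    ("carol", 29, 93.25),
]

# Schema as a DDL-style string: each column name followed by its Spark SQL type.
SCHEMA_DDL = "name STRING, age INT, score DOUBLE"


def build_dataframe(spark, rows=SAMPLE_ROWS, schema=SCHEMA_DDL):
    """Create a DataFrame with an explicit schema from an existing SparkSession."""
    return spark.createDataFrame(rows, schema=schema)


def demo():
    """Start a local SparkSession, build the DataFrame, and inspect it."""
    # Imported here so the definitions above load even where PySpark is absent.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]").appName("df-sketch").getOrCreate()
    )
    try:
        df = build_dataframe(spark)
        # The schema is available up front, before any job runs.
        df.printSchema()
        # Column expressions like this are visible to Spark's optimizer,
        # unlike an opaque Python lambda passed to an RDD transformation.
        df.filter(df.age > 30).select("name", "score").show()
    finally:
        spark.stop()
```

Calling `demo()` prints the declared schema and the filtered rows. The equivalent RDD code would filter with a Python function that Spark cannot inspect, which is the main reason the DataFrame route is worth evaluating.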