Using the same cluster resources, we need a graph showing the performance of our algorithm as the size of the underlying dataset increases. In theory, this should be a linear slowdown.
We could also do a sub-experiment here that varies the number of RDD partitions and observes the performance (holding the cluster resources and the data size constant).
Using the same cluster resources, we need a graph showing the performance of our algorithm as the size of the underlying dataset increases. In theory, this should be a linear slowdown.
We could also do a sub-experiment here that varies the number of RDD partitions and observes the performance (holding the cluster resources and the data size constant).